4,703 Matching Annotations
  1. Last 7 days
  2. docdrop.org docdrop.org
    1. The older you get, the worse it is

      I had not previously thought about how age affects the way you experience poverty, but I can see how this may be true. When we are little kids, we are still unaware of many of the facets of identity that set us apart, and we are more likely to be open-minded; we are still very impressionable. As we get older, though, and cliques form, there is a much clearer understanding of what teenagers deem cool or desirable and what is embarrassing or something to be ashamed of.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editor’s summary:

      This paper by Castello-Serrano et al. addresses the role of lipid rafts in trafficking in the secretory pathway. By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell. Their data suggest affinity for ordered domains influences the kinetics of exit from the Golgi. Additional microscopy data suggest that lipid-driven partitioning might segregate Golgi membranes into domains. However, the relationship between the partitioning of the synthetic membrane proteins into ordered domains visualised ex vivo in GPMVs, and the domains in the TGN, remains at best correlative. Additional experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of this phenomenon.

      The authors have used the RUSH system to study the traffic of model secretory proteins containing single-pass transmembrane domains that confer defined affinities for liquid ordered (lo) phases in Giant Plasma Membrane derived Vesicles (GPMVs), out of the ER and Golgi. A native protein termed LAT partitioned into these lo-domains, unlike a synthetic model protein termed LAT-allL, which had a substituted transmembrane domain. The authors' experiments provide support for the idea that ER exit relies on motifs in the cytosolic tails, but that accelerated Golgi exit is correlated with lo domain partitioning.

      Additional experiments provided evidence for segregation of Golgi membranes into coexisting lipid-driven domains that potentially concentrate different proteins. Their inference is that lipid rafts play an important role in Golgi exit. While this is an attractive idea, the experiments described in this manuscript do not provide a convincing argument one way or the other. It does however revive the discussion about the relationship between the potential for phase partitioning and its influence on membrane traffic.

      We thank the editors and scientific reviewers for thorough evaluation of our manuscript and for positive feedback. While we agree that our experimental findings present a correlation between trafficking rates and raft affinity, in our view, the synthetic, minimal nature of the transmembrane protein constructs in question makes a strong argument for involvement of membrane domains in their trafficking. These constructs have no known sorting determinants and are unlikely to interact directly with trafficking proteins in cells, since they contain almost no extramembrane amino acids. Yet, the LAT-TMD traffics through Golgi similarly to the full-length LAT protein, but quite differently from mutants with lower raft phase affinity. We suggest that these observations can be best rationalized by involvement of raft domains in the trafficking fates and rates of these constructs, providing strong evidence (beyond a simple correlation) for the existence and relevance of such domains.

      We have substantially revised the manuscript to address all reviewer comments, including several new experiments and analyses. These revisions have substantially improved the manuscript without changing any of the core conclusions and we are pleased to have this version considered as the “version of record” in eLife.

      Below is our point-by-point response to all reviewer comments.

      ER exit:

      The experiments conducted to identify an ER exit motif in the C-terminal domain of LAT are straightforward and convincing. This is also consistent with available literature. The authors should comment on whether the conservation of the putative COPII association motif (detailed in Fig. 2A) is significantly higher than that of other parts of the C-terminal domain.

      Thank you for this suggestion; this information has now been included as Supp Fig 2B. While there are other well-conserved residues of the LAT C-terminus, many regions have relatively low conservation. In contrast, the essential residues of the COPII association motif (P148 and A150) are completely conserved in LAT across all species analyzed.

      One cause of concern is that addition of a short cytoplasmic domain from LAT is sufficient to drive ER exit, and in its absence the synthetic constructs are all very slow. However, the argument presented that specific lo phase partitioning behaviour of the TMDs does not have a significant effect on exit from the ER is a little confusing. This is related to the choice of the allL-TMD as the 'non-lo domain' partitioning comparator. Previous data has shown that longer TMDs (23+) promote ER export (e.g. Munro 91, Munro 95, Sharpe 2005). The mechanism for this is not, to my knowledge, known. One could postulate that it has something to do with the very subject of this manuscript: lipid phase partitioning. If this is the case, then a TMD length of 22 might be a poor choice of comparison. A TMD 17 Ls long would be a more appropriate 'non-raft' cargo. It would be interesting to see a couple of experiments with a cargo like this.

      The basis for the claim that raft affinity has relatively minor influence on ER exit kinetics, especially in comparison to the effect of the putative COPII interaction motif, is in Fig 1G. We do observe some differences between constructs and they may be related to raft affinity; however, we considered these relatively minor compared to the nearly 4-fold increase in ER efflux induced by COPII motifs.

      We have modified the wording in the manuscript to avoid the impression that we have ruled out an effect of raft affinity on ER exit.

      We believe that our observations are broadly consistent with those of Munro and colleagues. In both their work and ours, long TMDs were able to exit the ER. In our experiments, this was true for several proteins with long TMDs, either as full-length or as TMD-only versions (see Fig 1G). We intentionally did not measure shorter synthetic TMDs because these would not have been comparable with the raft-preferring variants, which all require relatively long TMDs, as demonstrated in our previous work (1,2). Thus, because our manuscript does not make any claims about the influence of TMD length on trafficking, we did not feel that experiments with shorter non-raft constructs would substantively influence our conclusions.

      However, to address reviewer interest, we did complete one set of experiments to test the effect of shortening the TMD on ER exit. We truncated the native LAT TMD by removing 6 residues from the C-terminal end of the TMD (LAT-TMDd6aa). This construct exited the ER similarly to all others we measured, revealing that for this set of constructs, short TMDs did not accumulate in the ER. ER exit of the truncated variant was slightly slower than the full-length LAT-TMD, but somewhat faster than the allL-TMD. These effects are consistent with our previous measurements, which showed that this shortened construct has slightly lower raft phase partitioning than the LAT-TMD but higher than allL (2). While these are interesting observations, a more thorough exploration of the effect of TMD length would be required to make any strong conclusion, so we did not include these data in the final manuscript.

      Author response image 1.

      Golgi exit:

      For the LAT constructs, the kinetics of Golgi exit as shown in Fig. 3B are surprisingly slow. About half of the protein remains in the Golgi at 1 h after biotin addition. Most secretory cargo proteins would have almost completely exited the Golgi by that time, as illustrated by VSVG in Fig. S3. There is a concern that LAT may have some tendency to linger in the Golgi, presumably due to a factor independent of the transmembrane domain, and therefore cannot be viewed as a good model protein. For kinetic modeling in particular, the existence of such an additional factor would be far from ideal. A valuable control would be to examine the Golgi exit kinetics of at least one additional secretory cargo.

      We disagree that LAT is an unusual protein with respect to Golgi efflux kinetics. In our experiments, Golgi efflux of VSVG was similar to full-length LAT (t1/2 ~ 45 min), and both of these were similar to previously reported values (3). Especially for the truncated (i.e. TMD) constructs, it is very unlikely that some factor independent of their TMDs affects Golgi exit, as they contain almost no amino acids outside the membrane-embedded TMD.

      Practically, it has proven somewhat challenging to produce functional RUSH-Golgi constructs. We attempted the experiment suggested by the reviewer by constructing SBP-tagged versions of several model cargo proteins, but all failed to trap in the Golgi. We speculate that the Golgin84 hook is much more sensitive to the location of the SBP on the cargo, being an integral membrane protein rather than the lumenal KDEL-streptavidin hook. This limitation can likely be overcome by engineering the cargo, but we did not feel that another control cargo protein was essential for the conclusions we presented, thus we did not pursue this direction further.

      Comments about the trafficking model

      (1) In Figure 1E, the export of LAT-TMD from the ER is fitted to a single-exponential fit that the authors say is "well described". This is unclear and there is perhaps something more complex going on. It appears that there is an initial lag phase and then similar kinetics after that - perhaps the authors can comment on this?

      This is a good observation. This effect is explainable by the mechanics of the measurement: in Figs 1 and 2, we measure not ‘fraction of protein in ER’ but ‘fraction of cells positive for ER fluorescence’. This is because the very slow ER exit of the TMD-only constructs presents a major challenge for live-cell imaging, so ER exit was quantified on a population level, by fixing cells at various time points after biotin addition and quantifying the fraction of cells with observable ER localization (rather than tracking a single cell over time).

      For fitting to the kinetic model (which attempts to describe ‘fraction in ER/Golgi’) we re-measured all constructs by live-cell imaging (see Supp Fig 5) to directly quantify relative construct abundance in the ER or Golgi. These data did not have the plateau in Fig 1E, suggesting that this is an artifact of counting “ER positive cells”, which would be expected to have a longer lag than “fraction of protein in ER”. Notably, however, t1/2 measured by both methods was similar, suggesting that the population measurement agrees well with single-cell live imaging.
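
      The lag argument above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the analysis used in the manuscript: each simulated cell is assumed to lose ER signal as a single exponential, the per-cell rates are drawn from a hypothetical log-normal spread around a 60 min half-time, and a cell is scored as "ER positive" while its ER fraction exceeds an assumed detection cutoff of 0.5. The function name `simulate` and all parameter values are illustrative.

```python
import math
import random


def simulate(n_cells=2000, threshold=0.5, times=(0, 15, 30, 60, 120, 240), seed=0):
    """Compare two readouts of ER exit in a simulated cell population.

    Returns a list of (time, mean ER fraction, fraction of ER-positive cells).
    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    # Assumed median exit rate, corresponding to a 60 min half-time.
    k_median = math.log(2) / 60.0
    # Assumed log-normal cell-to-cell variability in the exit rate.
    rates = [k_median * math.exp(rng.gauss(0.0, 0.3)) for _ in range(n_cells)]
    rows = []
    for t in times:
        fractions = [math.exp(-k * t) for k in rates]
        mean_fraction = sum(fractions) / n_cells
        # A cell counts as "ER positive" while its ER fraction is above the cutoff.
        positive = sum(f > threshold for f in fractions) / n_cells
        rows.append((t, mean_fraction, positive))
    return rows


if __name__ == "__main__":
    for t, mean_fraction, positive in simulate():
        print(f"t={t:>3} min  mean ER fraction={mean_fraction:.2f}  ER-positive cells={positive:.2f}")
```

      In this toy model the fraction of ER-positive cells stays near 1 at early time points while the mean ER fraction is already falling, yet both readouts cross 0.5 at a similar time. That is the qualitative pattern described above: a longer apparent lag for the population count, with a similar t1/2.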

      We have included all these explanations and caveats in the manuscript. We have also changed the wording from “well described” to “reasonably approximated”.

      (2) The model for Golgi sorting is also complicated and controversial, and while the authors' intention not to overinterpret their data in this regard must be respected, this data is in support of the two-phase Golgi export model (Patterson et al PMID:18555781).

      The reviewers are correct: our observations and model are consistent with Patterson et al., and it was a major oversight that a reference to this foundational work was not included. We have now added a discussion regarding the “two phase model” of Patterson and Lippincott-Schwartz.

      Furthermore, contrary to the statement in lines 200-202, the kinetics of VSVG exit from the Golgi (Fig. S3) are roughly linear and so are NOT consistent with the previous report by Hirschberg et al.

      Regarding kinetics of VSVG, our intention was to claim that the timescale of VSVG efflux from the Golgi was similar to that previously reported by Hirschberg et al., i.e. t1/2 roughly between 30-60 minutes. We have clarified this in the text. Minor differences in the details between our observations and those of Hirschberg et al. are likely attributable to temperature, as those measurements were done at 32°C for the tsVSVG mutant.

      Moreover, the kinetics of LAT export from the Golgi (Fig. 3B) appear quite different, more closely approximating exponential decay of the signal. These points should be described accurately and discussed.

      Regarding linear versus exponential fits, we agree that the reality of Golgi sorting and efflux is far more complicated than accounted for by either the phenomenological curve fitting in Figs 1-3 or the modeling in Fig 4. In addition to the possibility of lateral domains within Golgi stacks, there is transport between stacks, retrograde traffic, etc. The fits in Figs 1-3 are not intended to model specifics of transport, but rather to be phenomenological descriptors that allowed us to describe efflux kinetics with one parameter (i.e. t1/2). In contrast, the more refined kinetic modeling presented in Figure 4 is designed to test a mechanistic hypothesis (i.e. coexisting membrane domains in Golgi) and describes well the key features of the trafficking data.
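
      As an illustration of what a one-parameter phenomenological descriptor looks like, the sketch below fits a single exponential decay to a time course and reports t1/2. It is a minimal example under stated assumptions, not the fitting procedure used for Figs 1-3: the decay is assumed to start from a fraction of 1 at t = 0, the fit is done by least squares on the logarithm of the signal, and the function name `fit_half_time` and the example data are hypothetical.

```python
import math


def fit_half_time(times, fractions):
    """Fit f(t) = exp(-k * t) and return (k, t_half).

    Uses least squares on ln(f) with the line forced through the origin,
    which assumes the fraction is 1 at t = 0. Points with t <= 0 or a
    non-positive fraction are skipped because they carry no information
    for this log-linear fit.
    """
    pairs = [(t, math.log(f)) for t, f in zip(times, fractions) if t > 0 and f > 0]
    if not pairs:
        raise ValueError("need at least one point with t > 0 and fraction > 0")
    k = -sum(t * y for t, y in pairs) / sum(t * t for t, _ in pairs)
    if k <= 0:
        raise ValueError("data do not describe a decay")
    return k, math.log(2) / k


if __name__ == "__main__":
    # Synthetic, noise-free example generated with a 45 min half-time.
    times = [15, 30, 45, 60, 90]
    fractions = [0.5 ** (t / 45) for t in times]
    k, t_half = fit_half_time(times, fractions)
    print(f"k = {k:.4f} per min, t1/2 = {t_half:.1f} min")
```

      A single parameter of this kind summarizes how fast a construct leaves a compartment without making any claim about the underlying transport steps, which is the sense in which such fits are phenomenological.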

      Relationship between membrane traffic and domain partitioning:

      (1) Phase segregation in the GPMV is dictated by thermodynamics given its composition and the measurement temperature (at low temperatures, 4°C). However, at physiological temperatures (32-37°C), at which membrane trafficking is taking place, these GPMVs are not phase separated. Hence it is difficult to argue that a sorting mechanism based solely on the partitioning of the synthetic LAT-TMD constructs into lo domains detected at low temperatures in GPMVs provides a basis (or its lack) for the differential kinetics of traffic out of the Golgi (or ER). The mechanism in a living cell to form any lipid-based sorting platforms naturally requires further elaboration, and by definition cannot resemble the lo domains generated in GPMVs at low temperatures.

      We thank the reviewers for bringing up this important point. GPMVs are a useful tool because they allow direct, quantitative measurements of protein partitioning between coexisting ordered and disordered phases in complex, cell-derived membranes. However, we entirely agree that GPMVs do not fully represent the native organization of the living cell plasma membrane, and we have previously discussed some of the relevant differences (4,5). Despite these caveats, many studies have supported the cellular relevance of phase separation in GPMVs and the partitioning of proteins to raft domains therein (6-9). Most notably, elegant experiments from several independent labs have shown that fluorescent lipid analogs that partition to Lo domains in GPMVs also show distinct diffusive behaviors in live cells (6,7), strongly suggesting the presence of nanoscopic Lo domains in live cells. Similarly, our recent collaborative work with the lab of Sarah Veatch showed excellent agreement between raft preference in GPMVs and protein organization in living immune cells imaged by super-resolution microscopy (10). Further, several labs (6,7), including ours (11), have reported nice correlations between raft partitioning in GPMVs and detergent resistance, which is a classical (though controversial) assay for raft association.

      Based on these points, we feel that GPMVs are a useful tool for quantifying protein preference for ordered (raft) membrane domains and that this preference is a useful proxy for the raft-associated behavior of these probes in living cells. We propose that this approach allows us to overcome a major reason for the historical controversy surrounding the raft field: nonquantitative and unreliable methodologies that prevented consistent definition of which proteins are supposed to be present in lipid rafts and why. Our work directly addresses this limitation by relating quantitative raft affinity measurements in a biological membrane with a relevant and measurable cellular outcome, specifically inter-organelle trafficking rates.

      Addressing the point about phase transition temperatures in GPMVs: this is the temperature at which macroscopic domains are observed. Based on physical models of phase separation, it has been proposed that macroscopic phase separation at lower temperatures is consistent with sub-microscopic, nanoscale domains at higher temperatures (8,12). These smaller domains can potentially be stabilized / functionalized by protein-protein interactions in cells (13) that may not be present in GPMVs (e.g. because of lack of ATP).

      (2) The lipid compositions of each of these membranes - PM, ER and Golgi are drastically different. Each is likely to phase separate at different phase transition temperatures (if at all). The transition temperature is probably even lower for Golgi and the ER membranes compared to the PM. Hence, if the reported compositions of these compartments are to be taken at face value, the propensity to form phase separated domains at a physiological temperature will be very low. Are ordered domains even formed at the Golgi at physiological temperatures?

      It is a good point that the membrane compositions and the resulting physical properties (including any potential phase behavior) will be very different in the PM, ER, and Golgi. Whether ordered domains are present in any of these membranes in living cells remains difficult to directly visualize, especially for non-PM membranes which are not easily accessible by probes, are nanoscopic, and have complex morphologies. However, the fact that raft-preferring probes / proteins share some trafficking characteristics, while very similar non-raft mutants behave differently argues that raft affinity plays a role in subcellular traffic.

      (3) The hypothesis of 'lipid rafts' is a very specific idea, related to functional segregation, and the underlying basis for domain formation has been also hotly debated. In this article the authors conflate thermodynamic phase separation mechanisms with the potential formation of functional sorting domains, further adding to the confusion in the literature. To conclude that this segregation is indeed based on lipid environments of varying degrees of lipid order, it would probably be best to look at the heterogeneity of the various membranes directly using probes designed to measure lipid packing, and then look for colocalization of domains of different cargo with these domains.

      This is a very good suggestion, and a direction we are currently following. Unfortunately, due to the dynamic nature and small size of putative lateral membrane domains, combined with the interior of a cell being filled with lipophilic environments that overlay each other, directly imaging domains in organellar membranes with lipid packing probes remains extremely difficult with current technology (or at least with the technology available to us). We argue that the TMD probes used in this manuscript are a reasonable alternative, as they are fluorescent probes with validated selectivity for membrane compartments with different physical properties.

      Ultimately, the features of membrane domains suggested by a variety of techniques – i.e. nanometric, dynamic, relatively similar in composition to the surrounding membrane, potentially diverse/heterogeneous – make them inherently difficult to microscopically visualize. This is one reason why we believe studies like ours, which use a natural model system to directly quantify raft-associated behaviors and relate them to cellular effects (in our case, protein sorting), are a useful direction for this field.

      We believe we have been careful in our manuscript to avoid confusing language surrounding lipid rafts, phase separation, etc. Our experiments clearly show that mammalian membranes have the capacity to phase separate, that some proteins preferentially interact with more ordered domains, and that this preference is related to the subcellular trafficking fates and rates of these proteins. We have edited the manuscript to emphasize these claims and avoid the historical controversies and confusions.

      (4) In the super-resolution experiments (by SIM, where the enhancement of resolution is around two-fold or less compared to optical), the authors are able to discern a segregation of the two types of Golgi-resident cargo that have different preferences for the lo-domains in GPMVs. It should be noted that TMD-allL and the LAT-allL end up in the late endosome after exit from the Golgi. Previous work from the Bonifacino laboratory (PMID: 28978644) has shown that proteins (such as M6PR) destined to go to the late endosome bud from a different part of the Golgi in vesicular carriers, while those that are destined for the cell surface first (including TfR) bud with tubular vesicular carriers. Thus at the resolution depicted in Fig 5, the segregation seen by the authors could be due to an alternative explanation, that these molecules are present in different areas of the Golgi for reasons different from phase partitioning. The relatively high colocalization of TfR with the GPI probe in Fig 5E is consistent with this explanation. TfR and GPI prefer different domains in the GPMV assays yet they show a high degree of colocalization and also traffic to the cell surface.

      This is a good point. Even at microscopic resolutions beyond the optical diffraction limit, we cannot make any strong claims that the segregation we observe is due to lateral lipid domains and not several reasonable alternatives, including separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids. We have explicitly included this point in the Discussion: “Our SIM imaging suggests segregation of raft from nonraft cargo in the Golgi shortly (5 min) after RUSH release (Fig 5B), but at this level of resolution, we can only report reduced colocalization, not intra-Golgi protein distributions. Moreover, segregation within a Golgi cisterna would be very difficult to distinguish from cargo moving between cisternae at different rates or exiting via Golgi-proximal vesicles.”

      We have also added a similar caveat in the Results section of the manuscript: “These observations support the hypothesis that proteins can segregate in Golgi based on their affinity for distinct membrane domains; however, it is important to emphasize that this segregation does not necessarily imply lateral lipid-driven domains within a Golgi cisterna. Reasonable alternative possibilities include separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids.”

      Finally, while probes with allL TMD do eventually end up in late endosomes (consistent with the Bonifacino lab’s findings which we include), they do so while initially transiting the PM (2,11).

      Minor concerns:

      (1) Generally, the quantitation is high quality from difficult experimental data. Although a lot appears to be manual, it appears appropriately performed and interpreted. There are some claims that are made based on this quantitation, however, where there are no statistics performed. For example, figure 1B. Any quantitation with an accompanying conclusion should be subject to a statistical test. I think this is particularly important for the quality of the model fits.

      We appreciate the thoughtful feedback; the quantifications and fits were not trivial, but we believe they are important. We have added statistical significance to Figure 1B and others where it was missing.

      (2) Modulation of lipid levels in Fig 4E shows a significant change for the trafficking rate for the LAT-TMD construct and a not so significant change for the allL-TMD construct. However, these data are not convincing and appear to depend on a singular data point that appears to lower the mean value. In general, the experiment with the MZA inhibitor (Fig. 4D-F) is hard to interpret because cells will likely be sick after inhibition of sphingolipid and cholesterol synthesis. Moreover, the difference in effects for LAT-TMD and allL-TMD is marginal.

      We disagree with this interpretation. Fig 4E shows the average of three experiments and demonstrates clearly that the inhibitors change the Golgi efflux rate of LAT-TMD but not allL-TMD. This is summarized in the t1/2 quantifications of Fig 4F, which show a statistically significant change for LAT-TMD but not allL-TMD. This is not an effect of a singular data point, but rather the trend across the dataset.

      Further, the inhibitor conditions were tuned carefully to avoid cells becoming “sick”: at higher concentrations, cells did adopt unusual morphologies and began to detach from the plates. We pursued only lower concentrations, which cells survived for at least 48 hrs and without major morphological changes.

      (3) Line 173: 146-AAPSA-152 should read either 146-AAPSA-150 or 146-AAPSAPA-152, depending on what the authors intended.

      Thanks for the careful reading; we intended the former and it has been fixed.

      (4) What is the actual statistical significance in Fig. 3C and Fig. 3E? There is a single asterisk in each panel of the figure but two asterisks in the legend.

      Apologies, a single asterisk representing p<0.05 was intended. It has been fixed.

      (5) The code used to calculate the model is not accessible. It is standard practice to host well-annotated code on GitHub or similar, and it would be good to have this publicly available.

      We have deposited the code on a public repository (doi: 10.5281/zenodo.10478607) and added a note to the Methods.

      (1) Lorent, J. H. et al. Structural determinants and functional consequences of protein affinity for membrane rafts. Nature communications 8, 1219 (2017). PMC5663905

      (2) Diaz-Rohrer, B. B., Levental, K. R., Simons, K. & Levental, I. Membrane raft association is a determinant of plasma membrane localization. Proc Natl Acad Sci U S A 111, 8500-8505 (2014). PMC4060687

      (3) Hirschberg, K. et al. Kinetic analysis of secretory protein traffic and characterization of golgi to plasma membrane transport intermediates in living cells. J Cell Biol 143, 1485-1503 (1998). PMC2132993

      (4) Levental, K. R. & Levental, I. Giant plasma membrane vesicles: models for understanding membrane organization. Current topics in membranes 75, 25-57 (2015).

      (5) Sezgin, E. et al. Elucidating membrane structure and protein behavior using giant plasma membrane vesicles. Nat Protoc 7, 1042-1051 (2012).

      (6) Komura, N. et al. Raft-based interactions of gangliosides with a GPI-anchored receptor. Nat Chem Biol 12, 402-410 (2016).

      (7) Kinoshita, M. et al. Raft-based sphingomyelin interactions revealed by new fluorescent sphingomyelin analogs. J Cell Biol 216, 1183-1204 (2017). PMC5379944

      (8) Stone, M. B., Shelby, S. A., Nunez, M. F., Wisser, K. & Veatch, S. L. Protein sorting by lipid phase-like domains supports emergent signaling function in B lymphocyte plasma membranes. eLife 6 (2017). PMC5373823

      (9) Machta, B. B. et al. Conditions that Stabilize Membrane Domains Also Antagonize n-Alcohol Anesthesia. Biophys J 111, 537-545 (2016).

      (10) Shelby, S. A., Castello-Serrano, I., Wisser, I., Levental, I. & S., V. Membrane phase separation drives protein organization at BCR clusters. Nat Chem Biol in press (2023).

      (11) Diaz-Rohrer, B. et al. Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120, e2207461120 (2023).

      (12) Veatch, S. L. et al. Critical fluctuations in plasma membrane vesicles. ACS Chem Biol 3, 287-293 (2008).

      (13) Wang, H. Y. et al. Coupling of protein condensates to ordered lipid domains determines functional membrane organization. Science advances 9, eadf6205 (2023). PMC10132753

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors):

      (1) On a few occasions, I found that the authors would introduce a concept, but provide evidence much later on. For example, in line 57, they introduced the idea that feedback timing modulates engagement of the hippocampus and striatum, but they provided the details much later on around line 99. There are a few instances like these, and the authors may want to go through the manuscript critically to bridge such gaps to improve the flow of reading.

      First, we thank the reviewer for acknowledging the contribution of our study and the methodological choices. We acknowledge the concern raised about the flow of information in the introduction. We have critically reviewed the manuscript, especially on writing style and overall structure, to ensure a smoother transition between the introduction of concepts and the provision of supporting evidence. In the case of the concept of feedback timing and memory systems, lines 46-58 first introduce the concept, supported by evidence regarding adults, and we then pick up the concept again around line 103 to relate it to children and their brain development to motivate our research question. To further improve readability, we have included an outline of what to expect in the introduction. Specifically, we added a sentence in lines 66-68 that provides an overview of the different paragraphs: “We will introduce the key parameters in reinforcement learning and then we review the existing literature on developmental trajectories in reinforcement learning as well as on hippocampus and striatum, our two brain regions of interest.”

      This should better prepare the reader for when to expect more evidence regarding the concepts introduced. We included similar “road-marker” outline sentences on the other occasions the reviewer commented on, to enhance consistency and readability.

      (2) I am curious as to how they think the 5-second delay condition maps onto real-life examples, for example in a classroom setting feedback after 5 seconds could easily be framed as immediate feedback.

      The authors may want to highlight a few illustrative examples.

      Thank you for asking about the practical implications of a 5-second delay condition, which may be very relevant to the reader. We have modified the introduction example in lines 39-41 towards the role of feedback timing in the classroom to point out its practical relevance early on: “For example, children must learn to raise their hand before speaking during class. The teacher may reinforce this behavior immediately or with a delay, which raises the question whether feedback timing modulates their learning”.

      We have also expanded the corresponding discussion point in lines 720-728 to pick up the classroom example and to illustrate how we think timescale differences may apply: “In scenarios such as in the classroom, a teacher may comment on a child’s behavior immediately after the action or some moments later, on par with our experimental manipulation of 1 second versus 5 seconds. Within such a short range of delay in teachers’ feedback, children’s learning ability during the first years of schooling may function equally well and depend on the striatal-dependent memory system. However, we anticipate that the reliance on the hippocampus will become even more pronounced when feedback is delayed for a longer time. Children’s capacity for learning over longer timescales relies on the hippocampal-dependent memory system, which is still under development. This knowledge could help to better structure learning according to their development.”

      (3) In the methods section, there are a few instances of task description discrepancies which make things a little bit confusing, for example, line 173 reward versus punishment, or reward versus null elsewhere e.g. line 229. In the same section, line 175, there are a few instances of typos.

      We appreciate your attention to detail in pointing out discrepancies in task descriptions and typos in the method section. We have revised the section, corrected typos, and now phrased the learning outcomes consistently as “reward” and “punishment”.

      (4) I wasn't very clear as to why the authors did not compute choice switch probability directly from raw data but implemented this as a model that makes use of a weight parameter. The former would be much easier and more straightforward for data plotting, especially for uninformed readers, i.e., people who do not have backgrounds in computational modelling.

      Thank you for asking for clarification on the calculation of switching behavior. Indeed, in the behavioral results, switching behavior was directly calculated from the raw data. We have now stressed this in the methods in lines 230-235, also by naming win-stay and lose-shift as “proportions” instead of as “probabilities”: “As a first step, we calculated learning outcomes directly from the raw data, which were learning accuracy, win-stay and lose-shift behavior as well as reaction time.

      Learning accuracy was defined as the proportion to choose the more rewarding option, while win-stay and lose-shift refer to the proportion of staying with the previously chosen option after a reward and switching to the alternative choice after receiving a punishment, respectively.”
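
      The raw-data measures defined above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names, the option labels, and the coding of outcomes as 1 (reward) and 0 (punishment) are assumptions made for the example.

```python
def switching_proportions(choices, outcomes):
    """Return (win_stay, lose_shift) proportions for one trial sequence.

    choices  -- chosen option label on each trial (hypothetical labels)
    outcomes -- 1 for reward, 0 for punishment (assumed coding)
    """
    wins = stays_after_win = 0
    losses = shifts_after_loss = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if outcomes[t - 1] == 1:
            # Previous trial was rewarded: count whether the choice was repeated.
            wins += 1
            stays_after_win += stayed
        else:
            # Previous trial was punished: count whether the choice changed.
            losses += 1
            shifts_after_loss += not stayed
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


def learning_accuracy(choices, better_option):
    """Proportion of trials on which the more rewarding option was chosen."""
    return sum(c == better_option for c in choices) / len(choices)
```

      Both measures are plain proportions over trials, which is why they can be plotted without any model fitting.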

      In contrast to the raw data switching behavior, the computational heuristic strategy model indeed uses a weight for a relative tendency of switching behavior. We have also stressed the advantage of the computational measure and its difference from the raw data switching behavior in lines 248-252 and believe that the reader can now clearly distinguish between the raw data and the computational results: “Note that these model-based outcomes are not identical to the win-stay and lose-shift behavior that were calculated from the raw data. The use of such a model-based measure offers the advantage of discerning the underlying hidden cognitive process with greater nuance, in contrast to classical approaches that directly use raw behavioral data.”

      (5) I agree with the authors' assertion that both inverse temperature and outcome sensitivity parameters may lead to non-identifiability issues, but I was not 100% convinced about their modelling approach exclusively assessing a different family of models (inv temperature versus outcome sensitivity). Here, I would like to make one mid-way recommendation. They may want to redefine the inverse temperature term in terms of reaction time, i.e., B = exp(s + g(RT - mean(RT))), where s and g are free parameters (see Webb, 2019), and keep the outcome sensitivity parameter in the model with bounds [0,2] so that the interpretation could be % increase or decrease in actual outcome. Personally, in tasks with binary outcomes i.e. [0,1: null vs reward] I do not think outcome sensitivity parameters higher than 2 are interpretable as these assign an inflated coefficient to outcomes.

      We appreciate the mid-way recommendation regarding the modeling approach for inverse temperature and outcome sensitivity parameters. We have carefully revised our analysis approach by considering alternative modeling choices. Regarding the suggestion to redefine the inverse temperature in terms of reaction time by B = exp(s + g(RT - mean(RT))), we unfortunately were not able to identify the reference Webb (2019), nor did we find references to the suggested modeling approach. Any further information that the reviewer could provide will be greatly appreciated. Regardless, we agree that including reaction times through the implementation of drift-diffusion modeling may be beneficial. However, changing the inverse temperature model in such a way would necessitate major changes in our modeling approach, which unfortunately would result in non-convergence issues in our MCMC pipeline using Rstan. Hence, this approach goes beyond the scope of the manuscript. Nonetheless, we have decided to mention the use of a drift-diffusion model, along with other methodological considerations, as a future recommendation for disentangling outcome sensitivity from inverse temperature in lines 711-712: “Future studies might shed new light by examining neural activations at both task phases, by additionally modeling reaction times using a drift-diffusion approach, or by choosing a task design that allows independent manipulations of these phases and associated model parameters, e.g., by using different reward magnitudes during reinforcement learning, or by studying outcome sensitivity without decision-making.”
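
      For readers who want to see the reviewer's suggested reparameterization written out, the following is a minimal sketch of the formula as stated in the comment. It was not implemented in the manuscript; the function name and the choice of seconds as the reaction-time unit are assumptions for illustration only.

```python
import math


def rt_dependent_inverse_temperature(rts, s, g):
    """Trial-wise inverse temperature B_t = exp(s + g * (RT_t - mean(RT))).

    rts -- reaction times for one participant (assumed to be in seconds)
    s   -- free parameter setting log(B) at the participant's mean RT
    g   -- free parameter scaling how B changes with deviations from the mean RT
    """
    mean_rt = sum(rts) / len(rts)
    return [math.exp(s + g * (rt - mean_rt)) for rt in rts]
```

      Under this form, a trial at the mean reaction time has an inverse temperature of exp(s), and a negative g would make slower responses more stochastic.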

      Regarding the upper bound of outcome sensitivity, we agree that traditionally, limiting the parameter values at 2 is the choice for the parameter to be best interpretable. During model fitting, we had experienced non-convergence issues and ceiling effects in the outcome sensitivity parameter when fixing the inverse temperature at 1. The non-convergence issue was not resolved when we fixed the inverse temperature at 15.47, which was the group mean of the winning inverse temperature family. Model convergence was only achieved after increasing the outcome sensitivity upper bound to 20, with inverse temperature again fixed at 1. Since this model also performed well during parameter and model recovery, we argue that the parameter is nevertheless meaningful, despite the more extreme trial-to-trial value fluctuations under higher outcome sensitivity. We described our choice for this model in the methods section in lines 282-288: “Even though outcome sensitivity is usually restricted to an upper bound of 2 to not inflate outcomes at value update, this configuration led to ceiling effects in outcome sensitivity and non-converging model results. Further, this issue was not resolved when we fixed the inverse temperature at the group mean of 15.47 of the winning inverse temperature family model. It may be that in children, individual differences in outcome sensitivity are more pronounced, leading to more extreme values. Therefore, we decided to extend the upper bound to 20, parallel to the inverse temperature, and all our models converged with Rhat < 1.1.”.
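
      The trade-off between inverse temperature and outcome sensitivity can be made concrete with a generic Rescorla-Wagner learner and a two-option softmax. This is a hedged sketch, not the Stan model used in the study: it assumes values start at zero, that outcome sensitivity (rho) multiplies the outcome in the value update, and that outcomes are coded as +1 and -1. Under those assumptions the values scale linearly with rho, so choice probabilities depend only on the product of beta and rho.

```python
import math


def choice_probabilities(trials, alpha, beta, rho):
    """Model probability of choosing option 0 before each trial.

    trials -- list of (choice, outcome) pairs; choice is 0 or 1
    alpha  -- learning rate
    beta   -- inverse temperature of the softmax
    rho    -- outcome sensitivity, scaling the outcome at value update
    """
    q = [0.0, 0.0]  # assumed initial values
    probs = []
    for choice, outcome in trials:
        # With two options, softmax reduces to a logistic of the value difference.
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        probs.append(p0)
        # Rescorla-Wagner update with the outcome scaled by rho.
        q[choice] += alpha * (rho * outcome - q[choice])
    return probs
```

      For example, fitting beta = 4 with rho = 1 predicts the same choices as beta = 1 with rho = 4, which is why one of the two parameters has to be fixed when the other is estimated.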

      (6) I think the authors reporting optimal parameters for the model is very important (line 464), but the learning rate they report under stable contingencies is much higher than LRs reported by for example Behrens et al 2007, LRs around 0.08 for the optimal learning behaviour. The authors may want to discuss why their task design calls for higher learning rates.

      Thank you for appreciating our optimal parameter analysis, and for the recommendation to discuss why optimal learning rates in our task design may call for higher learning rates compared to those reported in some other studies. As largely articulated in Zhang et al (2020; primer piece by one of our co-authors), the optimal parameter combination is determined by several factors, such as the reward schedule (e.g., 75:25 vs 80:20), task design (e.g., no reversal, one reversal, vs multiple reversals) and number of trials (e.g., 80 vs 100 vs 120). Notably, in these task-related regards, our task is different from Behrens et al. (2007), which hinders a quantitative comparison among the optimal parameters in the two tasks. We have now included more details in our discussion in lines 643-656: “However, the differences in learning rate across studies have to be interpreted with caution. The differences in the task and the analysis approach may limit their comparability. Task properties such as the trial number per condition differed across studies. Our study included 32 trials per cue in each condition, while in adult studies, the trials per condition ranged from 28 to 100. Optimal learning rates in a stable learning environment were at around 0.25 for 10 to 30 trials, another study reported a lower optimal learning rate of around 0.08 for 120 trials. This may partly explain why, in our case of 32 trials per condition and cue, the optimal learning rate was relatively high at 0.29, while in other studies, optimal learning rates may be lower. Regarding differences in the analysis approach, the hierarchical Bayesian estimation approach used in our study produces more reliable results in comparison to maximum likelihood estimation, which had been used in some of the previous adult studies and may have led to biased results towards extreme values. Taken together, our study underscores the importance of using longitudinal data to examine developmental change as well as the importance of simulation-based optimal parameters to interpret the direction of developmental change.”
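
      The idea of a simulation-based optimal learning rate can be sketched as a grid search over simulated agents. The sketch below is illustrative only and is not expected to reproduce the values reported in the manuscript: the 75:25 reward schedule, the +1/-1 outcome coding, the fixed inverse temperature, the number of simulated agents, and all function names are assumptions.

```python
import math
import random


def simulate_accuracy(alpha, beta, n_trials=32, p_good=0.75, n_sims=300, seed=0):
    """Mean proportion of better-option choices for a simulated learner."""
    rng = random.Random(seed)  # fixed seed so repeated calls are comparable
    correct = 0
    for _ in range(n_sims):
        q = [0.0, 0.0]  # option 0 is the more rewarding option
        for _ in range(n_trials):
            p_choose_good = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            choice = 0 if rng.random() < p_choose_good else 1
            p_reward = p_good if choice == 0 else 1.0 - p_good
            outcome = 1.0 if rng.random() < p_reward else -1.0
            q[choice] += alpha * (outcome - q[choice])
            correct += choice == 0
    return correct / (n_sims * n_trials)


def best_learning_rate(alphas, beta):
    """Learning rate on the grid that maximizes simulated accuracy."""
    return max(alphas, key=lambda a: simulate_accuracy(a, beta))
```

      Changing n_trials or p_good in such a simulation shifts which learning rate performs best, which is the reason optimal values are hard to compare across task designs.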

      (7) The authors may want to report degrees of freedom in t-tests so that it would be possible to infer the final sample size for a specific analysis, for example, line 546.

      We appreciate the recommendation to include degrees of freedom, which are now added in all t-test results, for example in line 579: “Episodic memory, as measured by individual corrected object recognition memory (hits - false alarms) of confident (“sure”) ratings, showed at trend level better memory for items shown in the delayed feedback condition (𝛽 = .009, SE = .005, t(df = 137) = 1.80, p = .074, see Figure 5A).”

      (8) I'm not sure why reductions in lose shift behaviour are framed as an improvement between 2 assessment points, e.g. line 578. It all depends on the strength of the contingency so a discussion around this point should be expanded.

      We acknowledge that a reduction in lose-shift behavior only reflects improvement under certain conditions where uncertainty is low and the learning contingencies are stable, which is the case in our task. We have added Supplementary Material 4 to illustrate the optimality of win-stay and lose-shift proportions from model simulation and to confirm that children’s longitudinal development was indeed towards more optimal switching behavior. In the manuscript, we refer to these results in lines 488-490: “We further found that the average longitudinal change in win-stay and lose-shift proportion also developed towards more optimal value-based learning (Supplementary Material 4).”

      (9) If I'm not mistaken, the authors reframe a trend-level association as weak evidence. I do not think this is an accurate framing considering the association is strictly non-significant, therefore should be omitted line 585.

      We thank the reviewer for the point regarding the interpretation of a trend-level association as weak evidence. We changed our interpretation, corrected in lines 581-585: “The inclusion of poor learners in the complete dataset may have weakened this effect because their hippocampal function was worse and was not involved in learning (nor encoding), regardless of feedback timing. To summarize, there was inconclusive support for enhanced episodic memory during delayed compared to immediate feedback, calling for future study to test the postulation of a selective association between hippocampal volume and delayed feedback learning.” as well as lines 622-623: “Contrary to our expectations, episodic memory performance was not enhanced under delayed feedback compared to immediate feedback.”

      Reviewer # 2 (Public Review):

      We thank the reviewer for acknowledging the strength of our study and pointing out its weaknesses.

      Weaknesses:

      There were a few things that I thought would be helpful to clarify. First, what exactly are the anatomical regions included in the striatum here?

      We appreciate the clarification question regarding the anatomical regions included in the striatum. The striatum included ventral and dorsal regions, i.e., accumbens, caudate and putamen. We have now specified the anatomical regions that were included in the striatum in lines 211-212: “We extracted the bilateral brain volumes for our regions of interest, which were striatum and hippocampus. The striatum regions included nucleus accumbens, caudate and putamen.”

      Second, it was mentioned that for the reduced dataset, object recognition memory focused on "sure" ratings. This seems like the appropriate way to do it, but it was not clear whether this was also the case for the full analyses in the main text.

      Thank you for pointing out that in the full dataset analysis, the use of “sure” ratings for object recognition memory was previously not mentioned. Including only “sure” ratings was used consistently across analyses. This detail is now described under methods in lines 332-333: “Only confident (“sure”) ratings were included in the analysis, which were 98.1 % of all given responses.”
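
      The corrected recognition score restricted to confident ratings can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' scoring script: the tuple layout, the "sure" label, and the choice to compute hit and false-alarm rates only over confidently rated trials are hypothetical.

```python
def corrected_recognition(trials):
    """Hit rate minus false-alarm rate over confident ("sure") ratings.

    trials -- iterable of (is_old, said_old, confidence) tuples, where
              is_old and said_old are booleans and confidence is a label
    """
    # Keep only confidently rated trials (assumed filtering step).
    sure = [t for t in trials if t[2] == "sure"]
    old = [t for t in sure if t[0]]
    new = [t for t in sure if not t[0]]
    if not old or not new:
        return float("nan")
    hit_rate = sum(said_old for _, said_old, _ in old) / len(old)
    false_alarm_rate = sum(said_old for _, said_old, _ in new) / len(new)
    return hit_rate - false_alarm_rate
```

      A score of 0 corresponds to no discrimination between old and new objects, and a score of 1 to perfect discrimination.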

      Third, the children's fitted parameters were far from optimal; is it known whether adults would be closer to optimal on the task?

      Thank you for your question on whether adult learning rates in the task have been reported to be more optimal than those of the children in our study. This indeed seems to be the case, and we added this point in our discussion in lines 639-643: “Adult studies that examined feedback timing during reinforcement learning reported average learning rates ranging from 0.12 to 0.34, which are much closer to the simulated optimal learning rate of 0.29 than children’s average learning rates of 0.02 and 0.05 at wave 1 and 2 in our study. Therefore, it is likely that individuals approach adult-like optimal learning rates later during adolescence.”

      The main thing I would find helpful is to better integrate the differences between the main results reported and the many additional results reported in the supplement, for example from the reduced dataset when excluding non-learners. I found it a bit challenging to keep track of all the differences with all the analyses and parameters. It might be helpful to report some results in tables side-by-side in the two different samples. And if relevant, discuss the differences or their implication in the Discussion. For example, if the patterns change when excluding the poor learners, in particular for the associations between delayed feedback and hippocampal volume, and those participants were also those less well fit by the value-based model, is that something to be concerned about and does that affect any interpretations? What was not clear to me is whether excluding the poor learners at one extreme simply weakens the general pattern, or whether there is a more qualitative difference between learners and non-learners. The discussion points to the relevance of deficits in hippocampal-dependent learning for psychopathology and understanding such a distinction may be relevant.

      We appreciate the feedback that it might seem challenging to keep track of differences between the analyses of the full and the reduced dataset. We have now gathered all the analyses for the reduced dataset in Supplementary Material 6, with side-by-side tables for comparison to the full dataset results. Whenever there were differences between the results, they were pointed out in the results section, see lines 557-560: “In the results of the reduced dataset, the hippocampal association to the delayed learning score was no longer significant, suggesting a weakened pattern when excluding poor learners (Supplementary Material 6). It is likely that the exclusion reduced the group variance for hippocampal volume and delayed learning score in the model.” and lines 579-581: “Note that in the reduced dataset, delayed feedback predicted enhanced item memory significantly (Supplementary Material 6).”

      The found differences were further included in our discussion in lines 737-740 in the context of deficits in hippocampal-dependent learning and psychopathology: “Interestingly, poor learners showed relatively less value-based learning in favor of stronger simple heuristic strategies, and excluding them modulated the hippocampal-dependent associations to learning and memory in our results. More studies are needed to further clarify the relationship between hippocampus and psychopathology during cognitive and brain development.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) There appears to be a flaw in the exploration of cortical inputs. the authors never show that HFS of cortical inputs has no effect in the absence of thalamic stimulation. It appears that there is a citation showing this, but I think it would be important to show this in this study as well.

      We understand that the reviewer would like us to induce an HFS protocol on cortical input and then test if there is any change in synaptic strength in thalamic input. We have done this experiment which shows that without a footshock, high-frequency stimulation (HFS) of the cortical inputs did not induce synaptic potentiation on the thalamic pathway (Extended Data Fig. 4d).

      (2) It is somewhat confusing that the authors refer to the cortical input as driving heterosynaptic LTP, but this is not shown until Figure 4J, that after non-associative conditioning (unpaired shock and tone) HFS of the cortex can drive freezing and heterosynaptic LTP of thalamic inputs.

      We agree with the reviewer that it is in Figure 4j and Figure 5b,c that we show electrophysiological evidence for cortical input driving heterosynaptic LTP. It is only to be consistent with our terminology that we initially used behavioral evidence as the proxy for heteroLTP (Figure 3c).

      …, the authors are 'surprised' by this outcome, which appears to be what they predict.

      We removed the phrase “To our surprise”.

      (3) 'Cortex' as a stimulation site is vague. The authors have coordinates they used, it is unclear why they are not using standard anatomical nomenclature.

      We replaced “cortex” with “auditory/associative cortex”.

      (4) The authors' repeated use of homoLTP and heteroLTP to define the input that is being stimulated makes it challenging to understand the experimental detail. While I appreciate this is part of the goal, more descriptive words such as 'thalamic' and 'cortical' would make this much easier to understand.

      We agree with the reviewer that a phrase such as “an LTP protocol on thalamic and cortical inputs” would be more descriptive. We chose the words “homoLTP” and “heteroLTP” only to clarify (for the readers) the physiological relevance of these protocols. We thought by using “thalamic” and “cortical” readers may miss this point. However, when for the first time we introduce the words “homoLTP” and “heteroLTP”, we describe which stimulated pathway each refers to.

      Reviewer #2 (Public Review):

      (1) …The experimental schemes in Figs. 1 and 3 (and Fig. 4e and extended data 4a,b) show that one group of animals was subjected to retrieval in the test context at 24 h, then received HFS, which was then followed by a second retrieval session. With this design, it remains unclear what the HFS impacts when it is delivered between these two 24 h memory retrieval sessions.

      We understand that the reviewer has raised the concern that the increase in freezing we observed after the HFS protocol (e.g., Fig. 1b, the bar labeled as Wth+24hHFSth) could be caused or modulated by the recall prior to the HFS (Fig. 1a, top branch). To address this concern, in a new group of mice, 24 hours after weak conditioning, we induced the HFS protocol, followed by testing (that is, no testing prior to the HFS protocol). We observed that homoLTP was as effective in mice that were tested prior to the induction protocol as in those that were not (Fig. 1b, Extended Data Fig. 1d,e).

      It would be nice to see these data parsed out in a clean experimental design for all experiments (in Figs 1, 3, and 4), that means 4 groups with different treatments that are all tested only once at 24 h, and the appropriate statistical tests (ANOVA). This would also avoid repeating data in different panels for different pairwise comparisons (Fig 1, Fig 3, Fig 4, and extended Fig 4).

      While we understand the benefit of the reviewer’s suggestion, the current presentation of the data was done to match the flow of the text and the delivery of the information throughout the manuscript. We think it is unlikely that the retrieval test prior to the HFS impacts its effectiveness, as confirmed by homosynaptic HFS data (Extended Data Fig. 1d,e). It is beyond the scope of the current manuscript to investigate the mechanisms and manipulations related to reconsolidation and retrieval effects.

      (2) … It would be critical to know if LFPs change over 24 h in animals in which memory is not altered by HFS, and to see correlations between memory performance and LFP changes, as two animals displayed low freezing levels. … They would suggest that thalamo-LA potentiation occurs directly after learning+HFS (which could be tested) and is maintained over 24 h.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (3) The statistical analyses need to be clarified. All statements should be supported with statistical testing (e.g. extended data 5c, pg 7 stats are missing). The specific tests should be clearly stated throughout. For ANOVAs, the post-hoc tests and their outcomes should be stated. In some cases, 2-way ANOVAs were performed, but it seems there is only one independent variable, calling for one-way ANOVA.

      All the statistical analyses have been revised and the post-hoc tests performed after the ANOVAs are mentioned in the relevant figure legends.

      Reviewer #2 (Recommendations For The Authors):

      The wording "transient" and "persistent" used here in the context of memory seems a bit misleading, as only one timepoint was assessed for memory recall (24 h), at which the memory strength (freezing levels) seem to change.

      As the reviewer mentioned, we have tested memory recall only at one time point. For this reason, throughout the text we used “transient” exclusively to refer to the experience (receiving footshock) and not to the memory. We replaced “persistence” with “stabilization” where it refers to a memory (“the induction of plasticity influences the stabilization of the memory”).

      For the procedures in which the CS and US were not paired, the term "unpairing" is used (which is probably the more adequate one), but the term "non-associative conditioning" appears in the text, which seems a bit misleading, as this term may have another connotation. There is also literature that an unpairing of CS and US could lead to the formation of a safety memory to the CS, that may be disrupted by HFS stimulation.

      We replaced "non-associative" with “unpaired”.

      Validation of viral injection sites for all experiments: Only representative examples are shown, it would be nice to see all viral expression sites.

      For this manuscript, we have used 155 mice. For this reason, including the injection sites for all the animals in the manuscript is not feasible. Except for the mice that have been excluded, (please see exclusion criteria added in the methods), the expression pattern we observed was consistent across animals and therefore the images shown are true representatives.

      Extended Data 1b: Please explain what N, U, W, and S behavioral groups mean. To what groups mentioned in the text (pg 2,3) do these correspond?

      The requested clarifications are implemented in the figure legend.

      Please elaborate on the following aspects of your methods and approaches:

      • Please explain if the protocol for HFS to manipulate behavior was the same as the one used for the LTP experiments (Fig 1d, Fig 4j) and was identical for homo/hetero inputs from thal and ctx?

      We used the same HFS protocol for all the HFS inductions. We included this information in the methods section.

      • Please state when the HFS was given in respect to the conditioning (what means immediately before and after?) and in which context it was given. Were animals subjected to HFS exposed to the context longer (either before or after the conditioning while receiving HFS) than the other groups? When the HFS was given in another context (for the 24 h group)- how was this controlled for?

      Requested information has been added to the methods section. The control and intervention groups were treated in the same way.

      • When were the footshocks given in the anesthetized recordings (Fig. 4j) and how was the temporal relationship to the HFS? Was the timing the same as for the HFS in the behavioral experiments?

      Requested information has been added to the methods section.

      • Please add information on how the LFP was stimulated and how the LFP- EPSP slope was determined in in vivo recordings, likewise for the whole cell recordings of EPSPs in Fig. 5d-f.

      Requested information has been added to the methods section.

      Here, the y-Axis in Fig. 5e should be corrected to EPSP slope rather than fEPSP slope if these are whole-cell recordings.

      This has been corrected.

      • Please include information if the viral injections and opto-manipulations were done bilateral or unilateral and if so in which hemisphere. Likewise, indicate where the LFP recordings were done.

      Requested information has been added to the methods section.

      • Were there any exclusion criteria for animals (e.g. insufficient viral targeting or placement of fibers and electrodes), other than the testing of the optical CS for adverse effects?

      Requested information has been added to the methods section.

      Statistics: In addition to clarifying analytical statistics, please clarify n-numbers for slice recordings (number of animals, number of slices, and number of cells if applicable).

      Requested information has been added to the methods section.

      It would be nice to scrutinize the results in extended data 4b. The freezing levels with U+24h HFS show a strong trend towards an increase, the effect size may be similar to immediate HFS Fig 4f and extended data 4a) if n was increased.

      We agree with the reviewer. To address this point, we added “HomoLTP protocol when delivered 24hrs later, produced an increase in freezing; however, the value was not statistically significant.” To show this point, we used the same scale for freezing in Extended Data Fig. 4a and b.

      In the final experiment (Fig. 5a-c), Fig. 5b seems to show results from only one animal, but behavioral results are from 4 animals (Fig 5c). It would be helpful to see the quantification of potentiation in each animal.

      The results (now with error bar) include all mice.

      Please spell out the abbreviation "STC".

      Now, it is spelled out.

      Page 8 last sentence of the discussion does not seem to fit there.

      The sentence has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors did not determine how WTh affects Th-LA synapses, as field EPSPs were recorded only after HFS. WTh was required for the effects of HFS, as HFS alone did not produce CR in naïve and/or unpaired controls. As such the effects of the WTh protocol on synaptic strength must be investigated.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (2) The authors provide some evidence that their dual opsin approach is feasible, particularly the use of sustained yellow light to block the effects of blue light on ChrimsonR. However, this validation was done using single pulses making it difficult to assess the effect of this protocol on Th input when HFS was used. Without strong evidence that the optogenetic methods used here are fault-proof, the main conclusions of this study are compromised. Why did the authors not use a protocol in which fibers were placed directly in the Ctx and Th while using soma-restricted opsins to avoid cross-contamination?

      We understand that the reviewer raises the possibility that our dual-opsin approach, although effective with single pulses, may fail in higher frequency stimulation protocols (10Hz and 85Hz). To address this concern, in a new group of mice we applied our approach to 10Hz and 85Hz stimulation protocols. We show that our approach is effective in single-pulse as well as in 10Hz and 85Hz stimulation protocols (Fig. 2d-h).

    1. Author response:

      The following is the authors’ response to the current reviews.

      We sincerely appreciate the reviewer’s dedication to evaluating our manuscript and raising essential considerations regarding the classification of the migration behavior we described. While the reviewer suggests that this behavior aligns with the concept of itinerancy, we contend that it represents a distinct phenomenon, albeit with similarities, as both involve the non-breeding movements of birds. We acknowledge that our manuscript did not adequately address this distinction and have considered the reviewer’s feedback. In our response, we clarify the difference between the described phenomenon and itinerancy. Our revised manuscript will include a new section in the Discussion to address this issue comprehensively.

      In the first part of the review, the reviewer emphasizes that the pattern we are describing is consistent with itinerancy. Regardless of the terminology used, we want to highlight the existence of two different types of migratory behavior, both of which involve movement in non-breeding areas.

      The first type, called itinerancy, was first described by Moreau in 1972 in “The Palaearctic-African Bird Migration Systems.” As noted by the reviewer, this behavior involves an alternation of stopovers and movements between different short-term non-breeding residency areas. They usually occur in response to food scarcity in one part of the non-breeding range, causing birds to move to another part of the same range. These movements typically cover distances of 10 to 100 kilometers but are neither continuous nor directional. Moreau (1972) defined itinerancy as prolonged stopovers, normally lasting several months, primarily in tropical regions. He noted observations of certain species disappearing from his study areas in sub-Saharan Africa in December and others appearing, suggesting they may have multiple home ranges during the non-breeding season. Subsequent research, as mentioned by the reviewer, has confirmed itinerancy in many species, particularly among Palaearctic-African migrants in sub-Saharan Africa. In particular, the Montagu’s Harrier has been extensively studied in this regard. The reviewer rightly points out that our study does not include recent findings on this species. In our revised version, we will include references to recent studies, such as those by Trierweiler et al. (2013, Journal of Animal Ecology, 82:107-120) and Schlaich et al. (2023, Ardea, 111:321-342), which show that Montagu’s Harrier has an average of 3-4 home ranges separated by approximately 200 kilometers. These studies suggest that the species spends approximately 1.5 months at each site, with the most extended period typically observed at the last site before migrating to the breeding grounds.

      In the second type, birds undertake a post-breeding migration, arrive in their non-breeding range, and then gradually move in a particular direction throughout the season. This continuous directional movement covers considerable distances and continues throughout the non-breeding period. In our study, this movement covered about 1000 km, comparable to the total migration distance of Rough-legged Buzzards of about 1500 km. As observed in our research, these movements are influenced by external factors such as snow cover. In such cases, the progression of snow cover in a south-westerly direction during winter can prevent birds from finding food, forcing them to continue migrating in the same direction. In essence, this movement represents a prolonged phase of the migration process but at a slower pace. Similar behavior has been documented in buzzards, as reported by Strandberg et al. (2009, Ibis 151:200-206). Although several transmitters in their study stopped working in mid-winter, the authors observed a phenomenon they termed ‘prolonged autumn migration.’

      In the second part of the review, the reviewer questions the need to distinguish between the two behaviors we have discussed. However, we believe these behaviors differ in their structure (with the first being intermittent and often non-directional, whereas the second is continuous and directional) and in their causes (with the first being driven by seasonal food resource cycles and the second by advancing snow cover). We therefore argue that it is worth distinguishing between them. To differentiate these forms of non-breeding movement, we propose to use ‘itinerancy’ for the first type, as described initially by Moreau in 1972, and introduce a separate term for the second behavior. Although ‘slow directional itinerancy’ could be considered, we find it too cumbersome.

      Moreover, ‘itinerancy’ in the literature refers not only to non-breeding movements but also to the use of different nesting sites, e.g., Lislevand et al. (2020, Journal of Avian Biology: e02595), reinforcing its association with movements between multiple sites within habitats. We, therefore, propose that the second behavior be given a distinct name. We acknowledge the reviewer’s point that we did not adequately address this distinction in the Discussion and plan to include a separate section in our paper’s revised version.

      In the third part of the review, the reviewer suggests an alternative title. Another reviewer, Dr Theunis Piersma, suggested the current title during the first round of reviewing, and we have chosen his version.

      In the fourth part of the review, the reviewer questions whether it is appropriate to discuss the conservation aspect of this study. This type of non-breeding movement raises concerns about accurately determining non-breeding ranges and population dynamics for species that exhibit this behavior. We believe that accurate determination of range and population dynamics is critical to conservation efforts. While this may be less important for species breeding in Europe and migrating to Africa, for which monitoring breeding territories is more feasible, it’s essential for Arctic and sub-Arctic breeding species. Large-scale surveys in these regions have historically been challenging and have become even more so with the end of Arctic cooperation following Russia’s war with Ukraine (Koivurova, Shibata, 2023). For North America and Europe, non-breeding abundance is typically estimated once per season in mid-winter. In North America, these are the so-called Christmas counts (which take place once at the end of December), and in Europe, they are the IWC counts mentioned by the reviewer (as follows from their official website - “The IWC requires a single count at each site, which should be repeated each year. The exact dates vary slightly from region to region, but take place in January or February”). Because of such a single count in mid-winter, non-breeding habitats occupied in autumn and spring will be listed as ‘uncommon’ at best, while south-western habitats where birds are only present in mid-winter will be listed as ‘common.’ However, the situation will be reversed if we consider the time birds spend in these habitats.

      The reviewer also highlights the introduction’s unconventional structure and information redundancy at the beginning. We have chosen this structure and provided basic explanations to improve readability for a wider audience, given eLife’s readership. At the same time, we will certainly take the reviewers’ feedback into account in the revised version. We plan to include the references to modern itinerancy research mentioned above and to add a section on itinerancy to the Discussion.

      We appreciate the reviewer’s input and sincerely thank them for their time and effort in reviewing our paper. While we may not fully agree on the classification of the behavior we describe, we value the opportunity to engage in discussion and believe that presenting arguments and counterarguments to the reader is beneficial to scientific progress.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I much enjoyed reading this manuscript, that is, once I understood what it is about. Titles like "Conserving bird populations in the Anthropocene: the significance of non-breeding movements" are a claim to so-called relevance, they have NOTHING to do with the content of the paper, so once I understood that this paper was about the "Quick quick slow: the foxtrot migration of rough-legged buzzards is a response to habitat and snow" (an alternative title), it was becoming very interesting. So the start of the abstract as well as the introduction is very tedious, as clearly much trouble is taken here to establish reputability. In my eyes this is unnecessary: eLife should be interested in publishing such a wonderful description of such a wonderful migrant in a study that comes to grips with limiting factors on a continental scale!

      We sincerely appreciate your time and effort in reviewing our manuscript. Thank you for your appreciation of our study.

      We agree that the focus of the article should be changed from conservation to migration patterns. We have rewritten the Introduction and Discussion as suggested. We now discuss applications of this pattern, including conservation, at the end of the Discussion and have completely revised Figure 5. We have also changed the title to the suggested one.

      Not sure that the first paragraph statements that seek to downplay what we know about wintering vs breeding areas are valid (although I see what purpose they serve). Migratory shorebirds have extensively been studied in the nonbreeding areas, for example, including movement aspects (see, as just one example, Verhoeven, M.A., Loonstra, A.H.J., McBride, A.D., Both, C., Senner, N.R. & Piersma, T. (2020) Migration route, stopping sites, and non breeding destinations of adult Black tailed Godwits breeding in southwest Fryslân, The Netherlands. Journal of Ornithology 162, 61-76) and there are very impressive studies on the winter biology of migrants across large scale (for example in Zwarts' Living on the Edge book on the Sahel wetlands). Think also about geese and swans and about seabirds!

      We have rewritten the first paragraph, which now discusses patterns of migratory behavior. We have also rewritten the second paragraph, which is now devoted to studies of movements in the non-breeding period. We explain how our pattern differs from those already studied and give references to the papers you mentioned.

      Directional movements in nonbreeding areas as a function of food (in this case locusts) have really beautifully been described by Almut Schlaich et al in JAnimEcol for Montagu's harriers.

      We have added Montagu's harrier example in the second paragraph of the Introduction and the Discussion. We have added a reference to Schlaich and to Garcia and Arroyo, who suggested that Montagu's harriers have long directional migrations during the non-breeding period.

      Once the paper starts talking buzzards, and the analyses of the wonderful data, all is fine. It is a very competent analysis with a description of a cool pattern.

      Thank you for your appreciation of our study. We hope the revised version is better and clearer.

      However, I would say that it is all a question of spatial scale. The buzzards here respond to changes in food availability, but there is not an animal that doesn't. The question is how far they have to move for an adequate response: in some birds movements of 100s of meters may be enough, and then anything to the scale of rough-legged buzzards.

      In the new version of the manuscript, we emphasize that this is a large distance (about 1000 km), comparable to the distance of the fall and spring migrations (about 1400 km) in lines 70-72 of the Introduction and 379-383 of the Discussion.

      And actually, several of the shorebirds I know best also do a foxtrot, such as red knots and bar-tailed godwits moulting in the Wadden Sea, then spending a few months in the UK estuaries, before returning to the Wadden Sea before the long migrations to Arctic breeding grounds. The publication of the rough-legged buzzard story may help researchers to summarize patterns such as this too. My problem with this paper is the framing. A story on the how and why of these continental movements in response to snow and other habitat features would be a grand contribution. Drop Anthropocene, and rethink whether foxtrot should be introduced as a hypothesis or a summary of cool descriptions. I prefer the latter, and recommend eLife to go with that too, rather than encourage "disconnected frames that seek 'respectability'". Good luck, Theunis Piersma

      We thank the reviewer again for his valuable comments and suggestions. We have changed the framing to the suggested one and removed the Anthropocene from the article.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and effort you have taken to review our manuscript. We have carefully considered all of your comments, including both public and author comments, and provided detailed responses to each of them below. In addition, we would like to address the most important public comments.

      We agree with the suggestion to shift the focus of the article from conservation to migration patterns. Accordingly, we have rewritten both the Introduction and Discussion sections to focus on migration behavior rather than conservation.

      However, we respectfully disagree with the suggestion that the migration patterns we describe are synonymous with itinerancy. We acknowledge that our original presentation may have been unclear and may have hindered full understanding. In the revised version, we provide a detailed analysis of migratory behavior in the Introduction that describes how our pattern differs from itinerancy. We also revisit this distinction in the Discussion section. We have also carefully revised Figure 1 to improve clarity and avoid potential misunderstandings.

      Regarding the applicability of the described migration pattern, we acknowledge that the Rough-legged Buzzard is not listed as an endangered species. However, we believe that our findings have practical implications. We have moved our discussion of this issue to the end of the Discussion section and have completely revised Figure 5. While the overall population of Rough-legged Buzzards is not declining, certain regions within its range are experiencing declines. We show that this decline does not warrant listing the species as endangered. Instead, it may represent a redistribution within the non-breeding range - a shift in range dynamics. We use the example of the Rough-legged Buzzard to illustrate this concept and emphasize the importance of considering such dynamics when assessing the conservation status of species in the future.

      We also acknowledge that the hypothesis of this form of behavior has been proposed previously for Montagu's Harrier, and we have included this information in the revised manuscript. In addition, we agree that the focus on the Anthropocene is unnecessary in this context and have therefore removed it.

      We believe that these revisions significantly improve the clarity and robustness of the manuscript, and we are grateful for your insightful comments and suggestions.

      As a general comment, please note that including line numbers (as it is the standard in any manuscript submission) would facilitate reviewers providing more detailed comments on the text.

      We apologize for this oversight and have added line numbers to our revised manuscript.

      Dataset: unclear what is the frequency of GPS transmissions. Furthermore, information on relative tag mass for the tracked individuals should be reported.

      We have included this information in our manuscript (L 157-163). We also refer to the study in which this dataset was first used and described in detail (L 164).

      Data pre-processing: more details are needed here. What data have been removed if the bird died? The entire track of the individual? Only the data classified in the last section of the track? The section also reports on an 'iterative procedure' for annotating tracks, which is only vaguely described. A piecewise regression is mentioned, but no details are provided, not even on what is the dependent variable (I assume it should be latitude?).

      Regarding the deaths: we removed only the data recorded after a bird had died. We have corrected the text to make this clear (L 170).

      Regarding the iterative procedure: we have added a detailed description on lines 175-188.
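
      For readers unfamiliar with the technique, the following is a generic sketch of a two-segment piecewise regression. It is not the procedure described in the manuscript: it assumes latitude as the dependent variable (as the reviewer guesses), fits a separate straight line on each side of every candidate breakpoint, and uses invented toy data. The function names `fit_line` and `best_breakpoint` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_breakpoint(days, lat, min_points=3):
    """Index k minimising the summed SSE of lines fitted to [0:k] and [k:]."""
    best_k, best_sse = None, float("inf")
    for k in range(min_points, len(days) - min_points + 1):
        sse = fit_line(days[:k], lat[:k])[2] + fit_line(days[k:], lat[k:])[2]
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k


# Toy track (invented): stationary at 60 degrees N for 10 days,
# then a southward movement of 0.5 degrees per day after departure.
days = list(range(20))
lat = [60.0] * 10 + [58.0 - 0.5 * (d - 10) for d in range(10, 20)]

k = best_breakpoint(days, lat)
print(days[k])  # first day assigned to the second segment
```

      An exhaustive search over breakpoints is used here only because it is easy to verify; it makes no claim about how the tracks in the study were actually annotated.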

      Data analysis: several potential issues here:

      (1) Unclear why sex was not included in all mixed models. I think it should be included.

      Our dataset contains 35 females and eight males. This ratio does not allow us to include sex in all models and adequately assess the influence of this factor. At the same time, because adult females disperse farther than males in some raptor species, we conducted a separate analysis of the dependence of migration distance on sex (Table S8) and found no evidence for this in our species. We have written a separate paragraph about this. This paragraph can be found on lines 356-360 of the new manuscript.

      (2) Unclear what is the rationale of describing habitat use during migration; is it only to show that it is a largely unsuitable habitat for the species? But is a formal analysis required then? Wouldn't be enough to simply describe this?

      Habitat use and snow cover determine the two main phases (quick and slow) of the pattern we describe. We believe that habitat analysis is appropriate in this case and that a simple description would be uninformative and would not support our conclusions.

      (3) Analysis of snow cover: such a 'what if' analysis is fine but it seems to be a rather indirect assessment of the effect of snow cover on movement patterns. Can a more direct test be envisaged relating e.g. daily movement patterns to concomitant snow cover? This should be rather straightforward. The effectiveness of this method rests on among-year differences in snow cover and timing of snowfall. A further possibility would be to demonstrate habitat selection within the entire non-breeding home range of an individual in relation to snow cover. Such an analysis would imply associating presence-absence of snow to every location within the non-breeding range and testing whether the proportion of locations with snow is lower than the proportion of snow of random locations within the entire non-breeding home range (95% KDE) for every individual (e.g. by setting a 1/10 ratio presence to random locations).

      The proposed analysis would allow us to assess whether the Rough-legged Buzzard selects areas with the lowest snow cover, but it would not capture the dynamics and would therefore give a misleading overall picture. This is especially true in the spring months. In March-April, Rough-legged Buzzards move northeast and occupy an area that is not the most snow-free part of the range. At this time, areas to the southwest have less snow (this can be seen in Figure 4b). If we performed the proposed analysis, the control points for this period would lie both to the north (where there is more snow) and to the south (where there is less snow) of the real locations, and the result would be no apparent difference in snow cover.
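
      The argument can be illustrated with a toy version of the used-versus-available comparison. The sketch below is an assumption-laden illustration, not an analysis of the tracking data: positions are reduced to a single southwest-to-northeast axis, snow is present only in the northeastern half, and all coordinates are invented. The names `snow_share` and `use_vs_available` are hypothetical.

```python
def snow_share(points, has_snow):
    """Proportion of points at which has_snow(point) is True."""
    return sum(1 for p in points if has_snow(p)) / len(points)


def use_vs_available(used, available, has_snow):
    """Return (used share, available share, used minus available)."""
    u = snow_share(used, has_snow)
    a = snow_share(available, has_snow)
    return u, a, u - a


# Toy spring snapshot on a 1-D axis from southwest (0 km) to northeast (1000 km).
# All numbers are invented for illustration; none come from the study.
def has_snow(x_km):
    return x_km > 500  # snow persists only in the northeastern half


used = [455 + 10 * i for i in range(10)]      # 10 bird fixes around the snow edge
available = [5 + 10 * i for i in range(100)]  # 100 control points (1:10 ratio)

result = use_vs_available(used, available, has_snow)
print(result)  # (0.5, 0.5, 0.0)
```

      Because the control points fall on both sides of the birds' positions, the used and available snow shares coincide in this toy case even though the birds sit exactly at the snow edge, which is the situation described for March-April.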

      A step-selection analysis could be used, as we did in our previous work (Curk et al. 2020, Sci Rep) with the same Rough-legged Buzzards (but during migration, not winter). But this would only give us a qualitative idea, not a quantitative one: that Rough-legged Buzzards move away from snow (in the fall) and follow the snowmelt progression (in the spring).

      At the same time, our analysis gives a complete picture of snow cover dynamics in different parts of the non-breeding range. This allows us to see that if Rough-legged Buzzards remained at their fall migration endpoint without moving southwest, they would encounter 14.4 percentage points more snow cover (99.5% vs. 85.1%). Although this difference may seem small, it is significant for rodent-hunting birds because it marks the distinction between complete and patchy snow cover. Conversely, if Rough-legged Buzzards flew to the southwest immediately and stayed there throughout the winter, they would experience 25.7 percentage points less snow cover (57.3% vs. 31.6%). Although this difference is greater than in the first case, it does not compel them to adopt this strategy, as it only separates different degrees of partially snow-free landscape.

      We write about this in the new manuscript on lines 385-394.
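
      The two counterfactual differences quoted above are differences between percentages, so they are best read as percentage points. A minimal check of the arithmetic follows; it uses only the four values given in this response, and the function name `snow_difference` is hypothetical.

```python
def snow_difference(observed_pct, scenario_pct):
    """Scenario minus observed snow cover, in percentage points (0.1 precision)."""
    return round(scenario_pct - observed_pct, 1)


# Scenario 1: birds stay at the fall migration endpoint all winter.
stay_at_endpoint = snow_difference(observed_pct=85.1, scenario_pct=99.5)

# Scenario 2: birds fly straight to the southwest and stay there all winter.
go_straight_southwest = snow_difference(observed_pct=57.3, scenario_pct=31.6)

print(stay_at_endpoint, go_straight_southwest)  # 14.4 -25.7
```

      The assignment of each value to "observed" or "scenario" follows the words "more" and "less" in the text rather than the order inside the parentheses.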

      Results: it is unclear whether the reported dispersion measures are SDs or SEs. Please provide details.

      For the date and coordinates of the start and end of the different phases of migration, we specified the mean, SD, and sample size. We wrote this in line 277. For the values of the parameters of the different phases of the migration (duration, distance, speed, and direction), we used the mean, the standard error of the mean, and the confidence interval (obtained using the ‘emmeans’ package). We have indicated this in lines 302-303 and the caption of Table 1 (L 315) and Figure 2 (L 293-294). For the values of habitat and snow cover experienced by the Rough-legged Buzzards, we used the mean and the error of the mean. We reported this on lines 322 and 337 and in Figures 3 (L 332-333) and 4 (L 355-356).

      Discussion: in general, it should be reshaped taking into account the comments. It is overlong, speculative and quite naive in several passages. Entire sections can be safely removed (I think it can be reduced by half without any loss of information). I provide some examples of the issues I have spotted below. For instance, the entire paragraph starting with 'Understanding....' is not clear to me. What do you mean by 'prohibited management' options? Without examples, this seems a rather general text, based on unclear premises when related to the specific of this study. Some statements are vague, derive from unsubstantiated claims, and unclear. E.g. "Despite their scarcity in these habitats, forests appear to hold significant importance for Rough-legged buzzards for nocturnal safety". I could not find any day-night analysis showing that they actually roost in forests during nighttime. Being a tundra species, it may well be possible that rough-legged buzzards perceive forests as very dangerous habitats and that they prefer instead to roost in open habitats. Analysing habitat use during day and night during the non-breeding period may be of help to clarify this. Furthermore, considering the fast migration periods, what is the flight speed during day and night above forests? Do these birds also migrate at night or do they roost during the night? Perhaps a figure visualizing day and night track segments could be of help (or an analysis of day vs. night flight speed) (there are several R packages to annotate tracks in relation to day and night). This is an example of another problematic statement: "The progression of snow cover in the wintering range of Rough-legged buzzards plays a significant role in their winter migration pattern." The manuscript does not contain any clear demonstration of this, as I wrote in my previous comments. Without such evidence, you must considerably tone down such assertions. 
      But since providing a direct link is certainly possible, I think that additional analyses would clearly strengthen your take-home message.

      The paragraph starting with "The quantification of environmental changes that could prove fatal to bird species presents yet another challenge for conservation efforts in an era of rapid global change." is quite odd. Take the following statement "For instance, the presence of small patches of woodland in the winter range might appear crucial to the survival of the Rough-legged buzzard. Elimination of these seemingly minor elements of vegetation cover through management actions could have dire consequences for the species.". It is based on the assumption that minor vegetation elements play a key role in the ecology of the species, without any evidence supporting this. Does it have any sense? I could safely say exactly the opposite and I would believe it might even be more substantiated.

      We agree with these comments.

      We have completely rewritten this section. As suggested, we have shortened it by removing statements that were not supported by the research. We have completely removed the statements about "prohibited management". We have also removed the statement that "forests appear to be of significant importance to Rough-legged buzzards for nocturnal safety" and everything associated with that statement, e.g. the statement about "small elements of vegetation cover", etc. We do believe that this statement is true in substance, but we also agree that it is not supported by the results and requires separate analysis. At the same time, we believe that this is a topic for a separate study and would be redundant here. Therefore, we leave it for a separate publication.

      Conclusion paragraph: I believe this severely overstates the conservation importance of this study. That the results have "crucial implications for conservation efforts in the Anthropocene, where rapidly changing environmental factors can severely impact bird migration" seems completely untenable to me. What is the evidence for such crucial implications? For instance, these results may suggest that climate change, because global warming is predicted to reduce snow cover in the non-breeding areas, might well be beneficial for populations of this species, by reducing non-breeding energy expenditure and improving non-breeding survival. I think statements like these are simply not necessary, and that the study should be more focused on the actual results and evidence provided.

      We have completely rewritten this section. We removed the reference to the Anthropocene and focused on migratory behavior and migration patterns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers are concerned that our lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that i) influence TM insertion (V276T) and ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V276T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V276T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V276T could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remains contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro, a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/or mechanical unfolding has also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/or mechanical stability should necessarily correspond to their resistance to cotranslational and/or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial. Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests that introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To test this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble.

      In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/ or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data required further discussion and documentation, which we have provided in the revised version of the manuscript as described below.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample; the WT immunostaining intensity was 9.5-fold over background by comparison. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We have included these and other signal-to-noise metrics for each experiment in the Results section of the revised manuscript.

      Beyond considerations related to intensity, we also previously noticed that the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If the signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form a diffuse scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R² = 0.95, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR. We have included these discussion points in the Results section as well as scatter plots for replicate variant intensities within all three genetic backgrounds in Figure S3 of the revised manuscript.

      Author response image 2.
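
      As an illustrative sketch of the replicate comparison described above (not the analysis code used for the manuscript), the snippet below computes a Pearson correlation coefficient and its square for two replicates. The function name and the intensity values are hypothetical placeholders, not measured data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-variant intensity scores from two replicates (placeholders)
replicate_1 = [120.0, 340.0, 95.0, 410.0, 205.0]
replicate_2 = [131.0, 322.0, 101.0, 398.0, 221.0]

r = pearson_r(replicate_1, replicate_2)
r_squared = r * r  # fraction of variance shared between replicates
print(f"Pearson R^2 = {r_squared:.3f}")
```

      A value of R² near 1 indicates that variant rankings are reproduced between replicates, which is the property the response relies on here.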

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results, we believe it is virtually certain that the secondary mutations characterized herein would form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work: stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as the initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation as W107A as a point of reference.

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree. Our findings suggest different mutations may not behave similarly, which we believe is a key finding of this work. We have emphasized this point in the Discussion section of the revised manuscript as follows:

      “These findings suggest the folding-mediated epistasis is likely to vary among different classes of destabilizing mutations in a manner that should also depend on folding efficiency and/ or the mechanism(s) of misfolding in the cell.”

      Some statistical aspects of the study could be improved:

      (1) It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and have included scatter plots for each genetic background, like the one shown above, in Figure S3 of the revised version of the manuscript.

      (2) The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We utilized paired Wilcoxon-Signed Rank Tests to evaluate the statistical significance of these observations and modified the description of these findings in the revised version of the results section as follows:

      “Variants bearing mutations within the C-terminal regions including ICL3, TMD6, and TMD7 fare consistently worse in the V276T background relative to WT (paired Wilcoxon-Signed Rank Test p-values of 0.0001, 0.02, and 0.005, respectively) (Fig. 4 B & E). Given that V276T perturbs the cotranslational membrane integration of TMD6 (Fig. S1, Table S1), this directional bias potentially suggests that the apparent interactions between these mutations manifest during the late stages of cotranslational folding. In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form ICL2, TMD4, and ECL2 (paired Wilcoxon-Signed Rank Test p-values of 0.0005, 0.0001, and 0.004, respectively) (Fig. 4 C & F).”

      (3) The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants" needs to be backed by relevant data and statistics, or at least a reference.

      We thank the reviewer for this reasonable suggestion. In the revised manuscript, we included the results of a Fisher’s Exact Test that confirms the statistical significance of this observation and modified the Results section to reflect this as follows:

      “Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD, Fisher’s Exact Test p = 0.0019). These findings suggest random mutations form epistatic interactions in the context of unstable mGnRHR variants in a manner that depends on the specific folding defect (V276T vs. W107A) and topological context.”
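
      For readers who wish to see how an enrichment of this kind can be tested, the following is a minimal sketch of a two-sided Fisher’s Exact Test written with the Python standard library. The 2×2 table is an assumption: its counts are back-calculated from the rounded percentages quoted above (about 64 of 98 and about 113 of 251 variants in TMDs), and the subset is compared against the remaining 153 variants. The table actually used for the manuscript is not given here, so the resulting p-value should not be expected to reproduce the reported one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    tol = 1 + 1e-9  # guard against floating-point ties
    p = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * tol)
    return min(1.0, p)

# Assumed counts reconstructed from rounded percentages (not from the source data)
tmd_in_subset, non_tmd_in_subset = 64, 34   # ~65% of 98 variants
tmd_in_rest, non_tmd_in_rest = 49, 104      # remaining 153 of 251 variants

p_value = fisher_exact_two_sided(
    tmd_in_subset, non_tmd_in_subset, tmd_in_rest, non_tmd_in_rest
)
print(f"two-sided p = {p_value:.3g}")
```

      Comparing the subset against its complement, rather than against the full set that contains it, keeps the two rows of the table independent; a different choice of table would give a different p-value.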

      Reviewer #1 (Recommendations for the Authors):

      As far as this reviewer is aware, the effect of the V276T variant on MP insertion has not been measured directly; its position corresponds to T277 in TMD6 of human GnRHR that has been measured for TM insertion, but given the clear lack of conservation (threonine vs valine) the mutation in TM6 could potentially have a different impact on the mouse homologue. Please clarify what the predicted delta TM for insertion is between human and mouse GnRHR. Moreover, I would argue that single TM insertion by tethering to Lep is insufficient to understand MP insertion/folding, as neighbouring TM helices could help to drive TM6 insertion. Have ER microsome experiments for mouse GnRHR also been carried out in the context of neighbouring helices?

      We included measurements (and predictions) of the impact of the V276T substitution on the translocon-mediated membrane integration of the mouse TMD6 in the context of a chimeric Lep protein (see Fig. S1 & Table S1). Our results reveal that this substitution decreases the efficiency of TMD6 membrane integration by ~10%. Though imperfect, this prevailing biochemical assay remains popular for a variety of theoretical and technical reasons. Importantly, extensive experimental testing of this system has shown that these measurements report apparent equilibrium constants that are well-described by two-state equilibrium partitioning models (see DOIs 10.1038/nature03216 and 10.1038/nature06387). This observation provides a reasonable rationale to interpret these measurements using energetic models as we have in this work (see Table S1). From a technical perspective, the Lep system is also advantageous because this protein is generally well expressed in the context of in vitro translation systems containing native membranes, which generally ensures a consistent signal-to-noise ratio and dynamic range for membrane integration measurements. Nevertheless, the reviewers are correct that membrane integration efficiencies are likely distinct in the context of the native mGnRHR protein. For these reasons, we attempted to develop a glycosylation-based topology reporter prior to the posting and submission of this manuscript. However, all GnRHR reporters we tested were poorly expressed in vitro and the resulting 35S-labeled proteins only generated faint smears on our phosphorimaging screens that could not be interpreted. For these reasons, we chose to rely on the Lep measurements for these investigations.
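
      To make the two-state interpretation above concrete, the sketch below converts a fractional membrane integration efficiency into an apparent equilibrium constant and an apparent free energy using ΔG_app = −RT ln K_app, with K_app = f / (1 − f). The function name, the temperature, and the two example efficiencies are assumptions chosen only for illustration; they are not the values reported in Fig. S1 or Table S1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal / (mol * K)

def apparent_dg(fraction_inserted, temperature_k=298.15):
    """Apparent free energy of membrane integration (kcal/mol).

    Assumes a two-state partition between inserted and non-inserted forms,
    so K_app = f / (1 - f) and dG_app = -RT ln(K_app).
    """
    if not 0.0 < fraction_inserted < 1.0:
        raise ValueError("fraction_inserted must lie strictly between 0 and 1")
    k_app = fraction_inserted / (1.0 - fraction_inserted)
    return -R_KCAL * temperature_k * math.log(k_app)

# Hypothetical efficiencies for a reference and a mutant segment (placeholders)
dg_reference = apparent_dg(0.60)
dg_mutant = apparent_dg(0.50)
ddg = dg_mutant - dg_reference  # positive values mean less favorable insertion

print(f"dG_app reference = {dg_reference:+.2f} kcal/mol")
print(f"dG_app mutant    = {dg_mutant:+.2f} kcal/mol")
print(f"ddG_app          = {ddg:+.2f} kcal/mol")
```

      Because the relationship is logarithmic, a modest change in the inserted fraction near 50% corresponds to a small ΔΔG_app, whereas the same absolute change near 0% or 100% corresponds to a much larger one.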

      The lack of a more relevant topological reporter is one of many challenges we faced in our investigations of this unstable, poorly behaved protein. We share the reviewer’s frustrations concerning the speculative aspects of this work. Nevertheless, there is increasing appreciation for the fact that our perspectives on protein biophysics have been skewed by our continuing choice to focus on the relatively small set of model proteins that are compatible with our favored methodologies (doi: 10.1016/j.tibs.2013.05.001). We humbly suggest this work represents an example of how we can gain a deeper understanding of the limits of biochemical systems when we instead choose to study the unsavory bits of cellular proteomes. But this choice requires a willingness to make some reasonable assumptions and to lean on energetic/ structural modeling from time to time. Despite this limitation, we believe there is still tremendous value in this compromise.

      What is the experimental evidence the W107A variant affects the protein structure? Has its melting temperature with and without inverse agonist binding for WT vs the W107A variant been measured, for example? Even heat-FSEC of detergent-solubilised membranes would be informative to know how unstable the W107A variant is. If it is very unstable in detergent, then it could be that recovery mutants are going to be unlikely as you are already starting with a poor construct showing poor folding/localisation.

      We again understand the rationale for this concern, but do not believe that thermal melting measurements are likely to report the same sorts of conformational transitions involved in cellular misfolding. Heating up a protein to the point at which membranes (or micelles) are disrupted and the proteins begin to form insoluble aggregates is a distinct physical process from those that occur during co- and post-translational folding within intact ER membranes at physiological temperatures (discussed further in the Response to the Reviews). Indeed, as the reviewer points out below, there seems to be little evidence that secretion is linked to thermal stability or various other metrics that others have attempted to optimize for the sake of purification and/ or structural characterization. Thus, we believe it would be just as speculative to suggest thermal aggregation represents a relevant metric for the propensity of membrane proteins to fold in the cell. The physical interpretation of membrane protein misfolding reactions remains contentious in our field due to the key fact that the denatured states of helical membrane proteins remain highly structured in a manner that is hard to generalize beyond the fact that the denatured states retain α-helical secondary structure (doi: 10.1146/annurev-biophys-051013-022926). This is in stark contrast to soluble proteins, where random coil reference states have proven to be generally useful for energetic interpretations of protein stability. For reference, our lab is currently working to leverage epistatic measurements like this to map the prevailing physiological denatured states of an integral membrane protein. Our current findings suggest that non-native electrostatic interactions form in the context of misfolded states. We hope that more information on the structural aspects of these states will help us to develop and interpret meaningful folding measurements within the membrane.

      For reference, even in cases when quantitative folding measurements can be achieved, their relevance remains actively debated. For example, the corresponding author of this work previously worked on the stability and misfolding of another human α-helical membrane protein (PMP22). Like GnRHR, PMP22 is prone to misfolding in the secretory pathway and is associated with dozens of pathogenic mutations that cause protein misfolding. To understand how the thermodynamic stability of this protein is linked to secretion, the corresponding author purified PMP22, reconstituted it into n-Dodecyl-phosphocholine (DPC) micelles, and measured its resistance to denaturation by an anionic denaturing detergent (Lauryl Sarcosine, LS). The results were initially perplexing due to the fact that equilibrium unfolding curves manifested as an exponential decay (rather than a sigmoid) and relaxation kinetics appeared to be dominated by the rate constant for unfolding (doi: 10.1021/bi301635f). Unfortunately, these data could not be fit with existing folding models due to the lack of a folded protein baseline and the absence of a folding arm in the chevron plot. We eventually found that a full sigmoidal unfolding transition and refolding kinetics could be measured upon addition of 15% (v/v) glycerol. Our measurements revealed that the free energy of unfolding in DPC micelles was 0 kcal/ mol (without glycerol). This shocking lack of WT stability made it impossible to directly measure the effects of destabilizing mutations that enhance misfolding; you can’t measure the unfolding of a protein that is already unfolded. We ultimately had to instead infer the energetic effects of such mutations from the thermodynamic coupling between cofactor binding and folding (doi: 10.1021/jacs.5b03743). Finally, after demonstrating the resulting ΔΔGs correlated with both cellular trafficking and disease phenotype, we still faced justified scrutiny about the relevance of these measurements due to the fact that they were carried out in micelles. For these reasons, we do not feel that additional biophysical measurements will add much to this work until more is understood about the nature of misfolding reactions in the membrane and how to effectively recapitulate them in vitro. We also note that PMP22 is secreted with 20% efficiency in mammalian cell lines, which is 20-fold more efficient than human GnRHR under similar conditions (doi: 10.1016/j.celrep.2021.110046). Thus, we suspect equilibrium unfolding measurements are likely out of reach using previously described methods.

      Our strongest evidence that W107A destabilizes the protein is that it deletes a highly conserved structural contact and that this structural modification kills its secretion. The fact that this mutation clearly reduces the escape of GnRHR from ER quality control is a classic indicator of misfolding that represents the cell’s way of telling us that the mutation compromises the folding of the nascent protein in some way or another. Precisely how this mutation remodels the conformational ensemble of nascent GnRHR and how this relates to the free energy difference between the native and non-native portions of its conformational ensemble under cellular conditions is a much more challenging question that lies beyond the scope of this investigation (and likely beyond the scope of what’s currently possible). Indeed, there is an entire field dedicated to understanding such questions. Nevertheless, the difference in the epistatic interactions formed by W107A and V276T is at the very least consistent with our speculative interpretation that these two mutations vary in their misfolding mechanism and/ or in the extent to which they destabilize the protein. For these reasons, we feel the main conclusions of this manuscript are well-justified.

      Please clarify if the protein is glycosylated or not and, if it is, how would this requirement affect the conclusions of your analysis?

      As we noted in the Response to the Reviewers, which also constitutes a published portion of the final manuscript, this protein is indeed glycosylated. We have been well aware of this aspect of the protein since the inception of this project and do not think this changes our interpretation at all. Most membrane proteins are glycosylated, and several groups have demonstrated in various ways that the secretion efficiency of glycoproteins is proportional to certain stability metrics for secreted soluble proteins and membrane proteins alike. Generally, mutations that enhance misfolding do not change the propensity of the nascent chain to undergo N-linked glycosylation, which occurs during translation before protein synthesis and/ or folding is complete. Misfolded proteins typically carry lower weight glycans, which reflects their failure to advance from the ER to the Golgi, where N-linked glycans are modified and O-linked glycans are added. From our perspective, glycosyl modifications just ensure that nascent proteins are engaged by calnexin and other lectin chaperones involved in QC. They do not decouple folding from secretion efficiency. In the case of PMP22 (described above), we found that removal of its glycosylation site allows the nascent protein to bypass the lectin chaperones in a manner that enhances its plasma membrane expression eight-fold (doi: 10.1016/j.jbc.2021.100719). Similar to WT, the expression of several misfolded PMP22 variants also significantly increases upon removal of the glycosylation site. Nevertheless, their expression is still significantly lower than that of the un-glycosylated WT protein, and the expression patterns of the mutants relative to WT were quite similar across this panel of un-glycosylated proteins. Thus, while glycosylation certainly impacts secretion, it does not change its dependence on folding efficiency within the ER. There are many layers of partially redundant QC within the ER, and it seems that folding imposes a key bottleneck to secretion regardless of which QC proteins are involved. For these reasons, we do not think glycosylation (or other PTMs) should factor into our interpretation of these results.

      One caveat with the study is that there is a poor understanding of the factors that decide if the protein should be trafficked to the PM or not. Even secretory proteins not going through the calnexin/reticulum cycle (as they have no N-linked glycans), might still get stuck in the ER, despite the fact they are functional. Could this be a technical issue of heterologous expression overloading the Sec system?

      While we agree that there is much to be learned about this topic, we disagree with the notion that our understanding of folding and secretion is insufficient to generally interpret the molecular basis of the observed trends. In collaboration with various other groups, the corresponding author of this paper has shown for several other proteins that the stability of the native topology and the native tertiary structure can constrain secretion efficiency (see dois: 10.1021/jacs.8b08243, 10.1021/jacs.5b03743, and 10.1016/j.jbc.2021.100423). Moreover, the Balch and Kelly groups demonstrated many years ago that relatively simple models for the coupling between folding and chaperone binding can recapitulate the observed effects of mutations on the secretion efficiency of various proteins (doi: 10.1016/j.cell.2007.10.025). Given a wide body of prevailing knowledge in this area, we believe it is entirely reasonable to assume that the conformational effects of these mutations have a dominant effect on plasma membrane expression.

      Whether or not some of the proteins retained in the ER are folded and/ or functional is an interesting question, but is outside the scope of this work. Various lines of evidence concerning approaches to rescue misfolded membrane proteins suggest many of these variants are likely to retain residual function once they escape the ER, which may suggest there are pockets of foldable/ folded proteins within the ER. But it seems generally clear that the efficiency of folding in the ER bottlenecks secretion regardless of whether or not the ER contains some fraction of folded/ functional protein. We note that it is certainly possible, if not likely, that secretion efficiency is higher at lower expression levels (doi: 10.1074/jbc.AC120.014940). However, the mutational scanning platform used in this work was designed such that all variants are expressed from an identical promoter at the same location within the genome. Thus, for the purposes of these investigations, we believe it is entirely fair to draw “apples-to-apples” comparisons of their relative effects on plasma membrane expression.

      Please see Frances Arnold's paper on this point and their mutagenesis library of the channelrhodopsin (https://www.pnas.org/doi/10.1073/pnas.1700269114), which further found that 20% of mutations improved WT trafficking. Some general comparisons to this paper might be informative.

      We agree that it may be interesting to compare the results from this paper to those in our own. Indeed, we find that 20% of the point mutations characterized herein also enhance the expression of WT mGnRHR, as mentioned in the Results section. However, we think it might be a bit premature to suggest this is a more general trend in light of the fact that the channelrhodopsins engineered in those studies were not of eukaryotic origin and have likely resulted from distinct evolutionary constraints. We ultimately decided against adding more on this to our already lengthy discussion in order to maintain focus on the mechanisms of epistasis.

      Chris Tate and others have shown that there is a high frequency of finding stabilising point mutations in GPCRs and this is the premise of the StaR technology used to thermostabilise GPCRs in the presence of different ligands, i.e. agonists vs inverse agonists. As far as I am aware, there is a poor correlation between expression levels and thermostability (measured by ligand binding to detergent-solubilised membranes). As such, it is possible that some of the mutants might be more stable than WT even though they have lower levels of PME.

      We believe the disconnect between thermostability and expression speaks precisely to our main point about the suitability of current membrane protein folding assays for the questions we address herein. The degradative activity of ER quality control has not necessarily selected for proteins that are resistant to thermal degradation and/ or are suitable for macromolecular crystallography. For this reason, it is often not so difficult to engineer proteins with enhanced thermal stability. We do not believe this disconnect signals that quality control is insensitive to protein folding and stability, but rather that it is more likely to recognize conformational defects that are distinct from those involved in thermal degradation and/ or aggregation. Indeed, recent work from the Fluman group, which builds on a wider body of previous observations, has shown that the exposure of polar groups within the membrane is a key factor that recruits degradation machinery (doi: 10.1101/2023.12.12.571171). It is hard to imagine that these sorts of conformational defects are the same as those involved in thermal aggregation.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe that by focusing more on the epistasis with V276T, and less on W107A, the paper could be strengthened significantly.

      We appreciate this sentiment. However, we believe the comparison of these two mutants really drives home the point that destabilizing mutations are not equivalent with respect to the epistatic interactions they form.

      (2) In the abstract - please define the term epistasis in a simple way, to make it accessible to a general audience. For example - negative epistasis means that... this should be explicitly explained.

      We thank the reviewer for this suggestion. To meet eLife formatting, we had to cut down the abstract significantly. We simplified this as best we could in the following statement:

      “Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations.”

      We also define positive and negative epistasis in the results section as follows:

      “Positive Ɛ values denote double mutants that have greater PME than would be expected based on the effects of single mutants. Negative Ɛ values denote double mutants that have lower PME than would be expected based on the effects of single mutants. Pairs of mutations with Ɛ values near zero have additive effects on PME.”

      (3) The title is quite complex and might deter readers from outside the protein evolution field. Consider simplifying it.

      We thank the reviewer for this suggestion. We have simplified the title to the following:

      “Divergent Folding-Mediated Epistasis Among Unstable Membrane Protein Variants”

      (4) The paper could benefit from a simple figure explaining the different stages of membrane protein folding (stages 1+2) to make it more accessible to readers from outside the membrane protein field.

      This is a great suggestion. We incorporated a new schematic in the revised manuscript that outlines the nature of these processes (see Fig. 1A in the revised manuscript).

      (5) For the FACS-Seq experiment - it was not clear to me if and when all cells are pulled together. For example - are the 3 libraries mixed together already at the point of transfection, or are the transfected cells pulled together at any point before sorting? This could have some implications on batch effects and should, therefore, be explicitly mentioned in the main text.

      We thank the reviewer for this suggestion. We modified the description of the DNA library assembly to emphasize that the mutations were generated in the context of three mixed plasmid pools, which were then transfected into the cells and sorted independently:

      “We then generated a mixed array of mutagenic oligonucleotides that collectively encode this series of substitutions (Table S3) and used nicking mutagenesis to introduce these mutations into the V276T, W107A, and WT mGnRHR cDNAs (Medina-Cucurella et al., 2019), which produced three mixed plasmid pools.”

      (6) The following description in the text is quite confusing. It would be better to simplify it considerably or remove it: "scores (Ɛ) were then determined by taking the log of the double mutant fitness value divided by the difference between the single mutant fitness values (see Methods)."

      We thank the reviewer for this valuable feedback and have simplified the text as follows:

      “To compare epistatic trends in these libraries, we calculated epistasis scores (Ɛ) for the interactions that these 251 mutations form with V276T and W107A by comparing their relative effects on PME of the WT, V276T, and W107A variants using a previously described epistasis model (product model, see Methods) (Olson et al. 2014).”
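
      The product model referred to above can be illustrated with a minimal sketch. It assumes that fitness is PME normalized to WT and that the score is the natural log of the observed double-mutant fitness over the product of the single-mutant fitnesses; the function name, the normalization, and the log base are assumptions made for illustration rather than details taken from the Methods.

```python
import math

def epistasis_score(pme_wt, pme_a, pme_b, pme_ab):
    """Product-model epistasis score for a pair of mutations.

    Inputs are plasma membrane expression (PME) values for WT, each single
    mutant, and the double mutant. Fitness is taken as PME relative to WT
    (an assumption made for this sketch).
    """
    w_a = pme_a / pme_wt
    w_b = pme_b / pme_wt
    w_ab = pme_ab / pme_wt
    expected = w_a * w_b  # product model: single-mutant effects multiply
    return math.log(w_ab / expected)
```

      Under these assumptions, a double mutant whose PME equals the product of the single-mutant effects scores zero, a double mutant that expresses better than expected scores positive, and one that expresses worse scores negative.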

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the observed higher average firing rates could be influenced by these factors. Considering these points, our mean firing rates (3.2 Hz) are reasonable estimations compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, and spike width (see Abdelfattah 2019, Science, Supplementary Figure 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage image experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulation. The sensitivity and specificity of the algorithm exceed 98% at the signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead, occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons simultaneously recording compared to these present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior- again, not agreeing with much of the previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for the constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for the constructive suggestion. We did demonstrate a frequency shift to a lower frequency in the synchrony-associated theta during immobility than during locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for the constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Lines 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that 446 synchronous ensembles during immobility do not co-occur with contralateral ripples, and the remaining 313 ensembles during locomotion are not associated with ripples, as ripples rarely occur during locomotion. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by the data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Lines 121-132) is designed to control for the variation of spike counts. Importantly, while the jittered spike trains contain the same spike count variations, we found 7.8-fold more synchronous events in our data compared to jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
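
      The logic of a jitter control can be sketched as follows. This is a minimal illustration, not the procedure described in the manuscript: the bin width, the minimum number of co-active cells, the jitter window, and all function names are hypothetical choices. Each spike is displaced by an independent random offset, which preserves every neuron's spike count while destroying fine-timescale co-firing, and the number of synchronous events in the data is compared with the mean over jittered surrogates.

```python
import random

def count_synchronous_events(spike_trains, bin_width, min_cells):
    """Count time bins in which at least `min_cells` distinct neurons spike."""
    cells_per_bin = {}
    for cell, spikes in enumerate(spike_trains):
        for t in spikes:
            cells_per_bin.setdefault(int(t // bin_width), set()).add(cell)
    return sum(1 for cells in cells_per_bin.values() if len(cells) >= min_cells)

def jitter(spike_trains, max_shift, rng):
    """Shift every spike by an independent uniform offset in [-max_shift, max_shift]."""
    return [[t + rng.uniform(-max_shift, max_shift) for t in spikes]
            for spikes in spike_trains]

def jitter_control(spike_trains, bin_width, min_cells, max_shift,
                   n_shuffles=200, seed=0):
    """Return the observed event count and the mean count over jittered surrogates."""
    rng = random.Random(seed)
    observed = count_synchronous_events(spike_trains, bin_width, min_cells)
    null_counts = [
        count_synchronous_events(jitter(spike_trains, max_shift, rng),
                                 bin_width, min_cells)
        for _ in range(n_shuffles)
    ]
    return observed, sum(null_counts) / len(null_counts)
```

      Because every surrogate keeps the same number of spikes per neuron, an excess of observed events over the surrogate mean cannot be attributed to spike count differences alone.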

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we have performed additional analysis to remove all latter spikes in bursts and only count the single and the first spikes of bursts. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.

      Author response image 3.

      Population synchrony remains after the removal of spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without latter spikes in bursts. The gray line represents the mean grand average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without latter spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals theta phase locking alone cannot account for the population synchrony.

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01
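
      The cycle-randomization control described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code: theta cycles are assumed to be given as a sorted list of cycle boundary times, phase is assumed to advance linearly within each cycle, and the function names are hypothetical.

```python
import math
import random
from bisect import bisect_right

def spike_phase(t, cycle_starts):
    """Return (cycle index, phase in radians) of a spike at time t.

    cycle_starts holds the start time of each theta cycle plus the end of the
    last one; t must lie in [cycle_starts[0], cycle_starts[-1]).
    """
    c = bisect_right(cycle_starts, t) - 1
    start, end = cycle_starts[c], cycle_starts[c + 1]
    return c, 2 * math.pi * (t - start) / (end - start)

def shuffle_theta_cycles(spike_times, cycle_starts, rng):
    """Move each spike to a randomly chosen cycle while keeping its theta phase."""
    n_cycles = len(cycle_starts) - 1
    shuffled = []
    for t in spike_times:
        _, phase = spike_phase(t, cycle_starts)
        c = rng.randrange(n_cycles)
        start, end = cycle_starts[c], cycle_starts[c + 1]
        shuffled.append(start + phase / (2 * math.pi) * (end - start))
    return sorted(shuffled)
```

      Applying the shuffle independently to each neuron leaves every spike's phase preference intact but breaks the alignment of spikes to the same cycles across neurons, so any synchrony that survives reflects phase locking alone.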

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Connelly and colleagues provide convincing genetic evidence that importation from mainland Tanzania is a major source of Plasmodium falciparum lineages currently circulating in Zanzibar. This study also reveals ongoing local malaria transmission and occasional near-clonal outbreaks in Zanzibar. Overall, this research highlights the role of human movements in maintaining residual malaria transmission in an area targeted for intensive control interventions over the past decades and provides valuable information for epidemiologists and public health professionals.

      Reviewer #1 (Public Review):

      Zanzibar archipelago is close to achieving malaria elimination, but despite the implementation of effective control measures, there is still a low-level seasonal malaria transmission. This could be due to the frequent importation of malaria from mainland Tanzania and Kenya, reservoirs of asymptomatic infections, and competent vectors. To investigate population structure and gene flow of P. falciparum in Zanzibar and mainland Tanzania, they used 178 samples from mainland Tanzania and 213 from Zanzibar that were previously sequenced using molecular inversion probes (MIPs) panels targeting single nucleotide polymorphisms (SNPs). They performed Principal Component Analysis (PCA) and identity by descent (IBD) analysis to assess genetic relatedness between isolates. Parasites from coastal mainland Tanzania contribute to the genetic diversity in the parasite population in Zanzibar. Despite this, there is a pattern of isolation by distance and microstructure within the archipelago, and evidence of local sharing of highly related strains sustaining malaria transmission in Zanzibar that are important targets for interventions such as mass drug administration and vector control, in addition to measures against imported malaria.

      Strengths:

      This study presents important samples to understand population structure and gene flow between mainland Tanzania and Zanzibar, especially from the rural Bagamoyo District, where malaria transmission persists and there is a major port of entry to Zanzibar. In addition, this study includes a larger set of SNPs, providing more robustness for analyses such as PCA and IBD. Therefore, the conclusions of this paper are well supported by data.

      Weaknesses:

      Some points need to be clarified:

      (1) SNPs in linkage disequilibrium (LD) can introduce bias in PCA and IBD analysis. Were SNPs in LD filtered out prior to these analyses?

      Thank you for this point. We did not filter SNPs in LD prior to this analysis. In the PCA analysis in Figure 1, we did restrict to a single isolate among those that were clonal (high IBD values) to prevent bias in the PCA. In general, in the absence of selective forces, linkage disequilibrium extends only over short distances (<5-10 kb). This is much less than the average spacing of the markers in the panel. With minimal LD, the conclusions drawn on relative levels and connections at high IBD are unlikely to be confounded by any effects of disequilibrium.

      (2) Many IBD algorithms do not handle polyclonal infections well, despite an increasing number of algorithms that are able to handle polyclonal infections and multiallelic SNPs. How were polyclonal samples handled for IBD analysis?

      Thank you for this point. We added lines 157-161 to clarify. This section now reads:

      “To investigate genetic relatedness of parasites across regions, identity by descent (IBD) estimates were assessed using the within sample major alleles (coercing samples to monoclonal by calling the dominant allele at each locus) and estimated utilizing a maximum likelihood approach using the inbreeding_mle function from the MIPanalyzer package (Verity et al., 2020). This approach has previously been validated as a conservative estimate of IBD (Verity et al., 2020).”

      Please see the supplement in (Verity et al., 2020) for an extensive simulation study that validates this approach.
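
      The step of coercing a sample to monoclonal can be illustrated with a short sketch. It assumes biallelic loci with per-locus reference and alternate read counts; the function name, the depth cutoff, and the tie-breaking rule are hypothetical and are not taken from MIPanalyzer.

```python
def call_major_alleles(ref_counts, alt_counts, min_depth=1):
    """Call the within-sample dominant allele at each locus.

    Returns 0 for reference, 1 for alternate, and None where read depth is
    below `min_depth`. Ties are resolved toward the reference allele, which
    is an arbitrary choice made for this sketch.
    """
    calls = []
    for ref, alt in zip(ref_counts, alt_counts):
        if ref + alt < min_depth:
            calls.append(None)  # no usable data at this locus
        else:
            calls.append(0 if ref >= alt else 1)
    return calls
```

      The resulting per-sample vectors of major-allele calls are the kind of input on which a pairwise relatedness estimate would then be computed.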

      Reviewer #1 (Recommendations For The Authors):

      (3) I think Supplementary Figures 8 and 9 are more visually informative than Figure 2.

      Thank you for your response. We performed the analysis in Figure 2 to show how IBD varies between different regions and is higher within a region than between.

      Reviewer #2 (Public Review):

      This manuscript describes P. falciparum population structure in Zanzibar and mainland Tanzania. 282 samples were typed using molecular inversion probes. The manuscript is overall well-written and shows a clear population structure. It follows a similar manuscript published earlier this year, which typed a similar number of samples collected mostly in the same sites around the same time. The current manuscript extends this work by including a large number of samples from coastal Tanzania, and by including clinical samples, allowing for a comparison with asymptomatic samples.

      The two studies made overall very similar findings, including strong small-scale population structure, related infections on Zanzibar and the mainland, near-clonal expansion on Pemba, and frequency of markers of drug resistance. Despite these similarities, the previous study is mentioned a single time in the discussion (in contrast, the previous research from the authors of the current study is more thoroughly discussed). The authors missed an opportunity here to highlight the similar findings of the two studies.

      Thank you for your insights. We appreciated the level of detail of your review and it strengthened our work. We have added sentences at lines 292-295, which now read:

      “A recent study investigating population structure in Zanzibar also found local population microstructure in Pemba (Holzschuh et al., 2023). Further, both studies found near-clonal parasites within the same district, Micheweni, and found population microstructure over Zanzibar.”

      Strengths:

      The overall results show a clear pattern of population structure. The finding of highly related infections detected in close proximity shows local transmission and can possibly be leveraged for targeted control.

      Weaknesses:

      A number of points need clarification:

      (1) It is overall quite challenging to keep track of the number of samples analyzed. I believe the number of samples used to study population structure was 282 (line 141), thus this number should be included in the abstract rather than 391. It is unclear where the number 232 on line 205 comes from, I failed to deduce this number from supplementary table 1.

      Thank you for this point. We have included 282 instead of 391 in the abstract. We added a statement in the results at lines 203-205 to clarify this point, which now reads:

      “PCA analysis of 232 coastal Tanzanian and Zanzibari isolates, after pruning 51 samples with an IBD of greater than 0.9 to one representative sample, demonstrates little population differentiation (Figure 1A).”
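
      One way to reduce highly related samples to a single representative can be sketched as follows. The sketch assumes that samples linked by pairwise IBD above the threshold are grouped by single linkage and that the first sample of each group in input order is kept; both choices, and the function name, are assumptions made for illustration and may differ from the procedure actually used.

```python
def prune_clonal(samples, ibd, threshold=0.9):
    """Keep one representative per group of samples linked by IBD > threshold.

    `ibd` maps (sample_a, sample_b) pairs to pairwise IBD estimates.
    """
    parent = {s: s for s in samples}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), value in ibd.items():
        if value > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    seen, kept = set(), []
    for s in samples:
        root = find(s)
        if root not in seen:
            seen.add(root)
            kept.append(s)
    return kept
```

      For example, with samples A, B, C, and D, where A-B and B-C exceed the threshold, the three linked samples collapse to A and the function returns A and D.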

      (2) Also, Table 1 and Supplementary Table 1 should be swapped. It is more important for the reader to know the number of samples included in the analysis (as given in Supplementary Table 1) than the number collected. Possibly, the two tables could be combined in a clever way.

      Thank you for this advice. Rather than switch to another table altogether, we appended two columns to the original table to better portray the information (see Table 1).

      Methods

      (3) The authors took the somewhat unusual decision to apply K-means clustering to GPS coordinates to determine how to combine their data into a cluster. There is an obvious cluster on Pemba islands and three clusters on Unguja. Based on the map, I assume that one of these three clusters is mostly urban, while the other two are more rural. It would be helpful to have a bit more information about that in the methods. See also comments on maps in Figures 1 and 2 below.

      Cluster 3 is a mix of rural/urban while the clusters 2, 4 and 5 are mostly rural. This analysis was performed to see how IBD changes in relation to local context within different regions in Zanzibar, showing that there is higher IBD within locale than between locale.

      (4) Following this point, in Supplemental Figure 5 I fail to see an inflection point at K=4. If there is one, it will be so weak that it is hardly informative. I think selecting 4 clusters in Zanzibar is fine, but the justification based on this figure is unclear.

      K-means clustering was used to partition the continuous space of geographic coordinates so that genetic relatedness could be compared between regions. We selected the inflection point based on the elbow plot and chose a number of clusters that yields enough subregions of Zanzibar to compare genetic relatedness. This point has been added to the methods at lines 174-178, which now reads:

      “The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected K = 4 as the inflection point based on the elbow plot (Supplemental Figure 5) and based the number to obtain sufficient subsections of Zanzibar to compare genetic relatedness.”
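
      For readers unfamiliar with elbow plots, the sketch below shows how the within-cluster sum of squares can be computed for a range of K values. It is an illustrative pure-Python version of Lloyd's algorithm with random restarts, not the authors' code; the function names are hypothetical, and it treats coordinates as planar, which is only an approximation for latitude and longitude.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans_inertia(points, k, n_init=10, n_iter=100, seed=0):
    """Lowest within-cluster sum of squares found over n_init random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Initialise centroids at k distinct input points.
        centroids = rng.sample(points, k)
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
                clusters[nearest].append(p)
            # Empty clusters keep their previous centroid.
            updated = [
                mean_point(c) if c else centroids[i]
                for i, c in enumerate(clusters)
            ]
            if updated == centroids:
                break
            centroids = updated
        inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        if best is None or inertia < best:
            best = inertia
    return best


def elbow_curve(points, max_k):
    """Map each K in 1..max_k to its best inertia; plot this to look for an elbow."""
    return {k: kmeans_inertia(points, k) for k in range(1, max_k + 1)}


# Two points: K = 1 puts the centroid midway, K = 2 fits each point exactly.
curve = elbow_curve([(0.0, 0.0), (2.0, 0.0)], 2)
```

      Plotting the returned values against K gives the elbow plot; when the curve bends only weakly, the choice of K rests mainly on practical considerations such as having enough subregions to compare.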

      (5) For the drug resistance loci, it is stated that "we further removed SNPs with less than 0.005 population frequency." Was the denominator for this analysis the entire population, or were Zanzibar and mainland samples assessed separately? If the latter, as for all markers <200 samples were typed per site, there could not be a meaningful way of applying this threshold. Given data were available for 200-300 samples for each marker, does this simply mean that each SNP needed to be present twice?

      Population frequency is calculated as the average within-sample allele frequency across all individuals in the population, which is an unbiased estimator. Within-sample allele frequency can range from 0 to 1. Thus, if only one sample out of 100 carries an allele, at a within-sample frequency of 0.1, the population allele frequency would be 0.1/100 = 0.001. This allele is removed even though it would have corresponded to a prevalence of 0.01. This filtering occurs prior to any final summary frequency or prevalence calculations (see MIP variant Calling and Filtering section in the methods) and protects against errors occurring only at low frequency.
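
      The arithmetic in this reply can be reproduced with a short sketch. It assumes a population of 100 samples, as implied by the division by 100, and uses hypothetical function names; the 0.005 cutoff is the threshold quoted by the reviewer.

```python
MIN_POPULATION_FREQUENCY = 0.005  # threshold quoted in the manuscript


def population_frequency(within_sample_freqs):
    """Mean within-sample allele frequency across all samples."""
    return sum(within_sample_freqs) / len(within_sample_freqs)


def prevalence(within_sample_freqs):
    """Fraction of samples in which the allele is detected at all."""
    carriers = sum(1 for f in within_sample_freqs if f > 0)
    return carriers / len(within_sample_freqs)


# One of 100 samples carries the allele at a within-sample frequency of 0.1.
wsaf = [0.1] + [0.0] * 99

pop_freq = population_frequency(wsaf)  # 0.1 / 100
prev = prevalence(wsaf)                # 1 / 100
is_kept = pop_freq >= MIN_POPULATION_FREQUENCY
```

      The allele fails the filter because its population frequency is below 0.005, even though one sample in 100 carries it.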

      Discussion:

      (6) I was a bit surprised to read the following statement, given Zanzibar is one of the few places that has an effective reactive case detection program in place: "Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020)." I think the current RACD program should be mentioned and referenced. A number of studies have investigated this program.

      Thank you for this point. We have added additional context and clarification on lines 275-280, which now reads:

      “Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020). Currently, a reactive case detection program within index case households is being implemented, but local transmission continues and further investigation into how best to control this is warranted (Mkali et al. 2023).”

      (7) The discussion states that "In Zanzibar, we see this both within and between shehias, suggesting that parasite gene flow occurs over both short and long distances." I think the term 'long distances' should be better defined. Figure 4 shows that highly related infections rarely span beyond 20-30 km. In many epidemiological studies, this would still be considered short distances.

      Thank you for this point. We have edited the text at lines 287-288 to indicate that highly related parasites mainly occur at the range of 20-30km, which now reads:

      “In Zanzibar, highly related parasites mainly occur at the range of 20-30km.”

      (8) Lines 330-331: "Polymorphisms associated with artemisinin resistance did not appear in this population." Do you refer to background mutations here? Otherwise, the sentence seems to repeat lines 324. Please clarify.

      We are referring to the list of Pfk13 polymorphisms stated in the Methods from lines 146-148. We added clarifying text on lines 326-329:

      “Although polymorphisms associated with artemisinin resistance did not appear in this population, continued surveillance is warranted given emergence of these mutations in East Africa and reports of rare resistance mutations on the coast consistent with spread of emerging Pfk13 mutations (Moser et al., 2021).”

      (9) Line 344: The opinion paper by Bousema et al. in 2012 was followed by a field trial in Kenya (Bousema et al, 2016) that found that targeting hotspots did NOT have an impact beyond the actual hotspot. This (and other) more recent finding needs to be considered when arguing for hotspot-targeted interventions in Zanzibar.

      We added a clarification on this point on lines 335-345, which now reads:

      “A recent study identified “hotspot” shehias, defined as areas with comparatively higher malaria transmission than other shehias, near the port of Zanzibar town and in northern Pemba (Bisanzio et al., 2023). These regions overlapped with shehias in this study with high levels of IBD, especially in northern Pemba (Figure 4). These areas of substructure represent parasites that differentiated in relative isolation and are thus important locales to target intervention to interrupt local transmission (Bousema et al., 2012). While a field cluster-randomized control trial in Kenya targeting these hotspots did not confer much reduction of malaria outside of the hotspot (Bousema et al. 2016), if areas are isolated pockets, which genetic differentiation can help determine, targeted interventions in these areas are likely needed, potentially through both mass drug administration and vector control (Morris et al., 2018; Okell et al., 2011). Such strategies and measures preventing imported malaria could accelerate progress towards zero malaria in Zanzibar.”

      Figures and Tables:

      (10) Table 2: Why not enter '0' if a mutation was not detected? 'ND' is somewhat confusing, as the prevalence is indeed 0%.

      Thank you for this point. We have put zero and also given CI to provide better detail.

      (11) Figure 1: Panel A is very hard to read. I don't think there is a meaningful way to display a 3D-panel in 2D. Two panels showing PC1 vs. PC2 and PC1 vs. PC3 would be better. I also believe the legend 'PC2' is placed in the wrong position (along the Y-axis of panel 2).

      Supplementary Figure 2B suffers from the same issue.

      Thank you for your comment. A revised Figure 1 and Supplemental Figure 2 are included, where there are separate plots for PC1 vs. PC2 and PC1 vs. PC3.

      (12) The maps for Figures 1 and 2 don't correspond. Assuming Kati represents cluster 4 in Figure 2, the name is put in the wrong position. If the grouping of shehias is different between the Figures, please add an explanation of why this is.

      Thank you for this point. The districts with at least 5 samples present are plotted in the map in Figure 1B. In Figure 2, a totally separate analysis was performed, where all shehias were clustered into separate groups with k-means and the IBD values were compared between these clusters. These maps are not supposed to match, as they are separate analyses. Figure 1B is at the district level and Figure 2 is clustering shehias throughout Zanzibar.

      The figure legend of Figure 1B on lines 410-414 now reads:

      “B) A Discriminant Analysis of Principal Components (DAPC) was performed utilizing isolates with unique pseudohaplotypes, pruning highly related isolates to a single representative infection. Districts were included with at least 5 isolates remaining to have sufficient samples for the DAPC. For plotting the inset map, the district coordinates (e.g. Mainland, Kati, etc.) are calculated from the averages of the shehia centroids within each district.”

      The figure legend of Figure 2 on lines 417-425 now reads:

      “Figure 2. Coastal Tanzania and Zanzibari parasites have more highly related pairs within their given region than between regions. K-means clustering of shehia coordinates was performed using the geographic coordinates of all shehias present in the sample population to generate 5 clusters (colored boxes). All shehias were included to assay pairwise IBD between differences throughout Zanzibar. Pairwise comparisons of within cluster IBD (column 1 of IBD distribution plots) and between cluster IBD (column 2-5 of IBD distribution plots) were done for all clusters. In general, within cluster IBD had more pairwise comparisons containing high IBD identity.”

      (13) Figure 2: In the main panel, please clarify what the lines indicate (median and quartiles?). It is very difficult to see anything except the outliers. I wonder whether another way of displaying these data would be clearer. Maybe a table with medians and confidence intervals would be better (or that data could be added to the plots). The current plots might be misleading as they are dominated by outliers.

      Thank you for this point; it greatly improved the figure. We now use a beeswarm plot, which shows all pairwise IBD values within each comparison group.

      (14) In the insert, the cluster number should not only be given as a color code but also added to the map. The current version will be impossible to read for people with color vision impairment, and it is confusing for any reader as the numbers don't appear to follow any logic (e.g. north to south).

      Thank you very much for these considerations. We changed the color coding to a color blind friendly palette and renamed the clusters to more informative names; Pemba, Unguja North (Unguja_N), Unguja Central (Unguja_C), Unguja South (Unguja_S) and mainland Tanzania (Mainland).

      (15) The legend for Figure 3 is difficult to follow. I do not understand what the difference in binning was in panels A and B compared to C.

      Thank you for this point. We have edited the legend to reflect these changes. The legend for Figure 3 on lines 427-433 now reads:

      “Figure 3. Isolation by distance is shown between all Zanzibari parasites (A), only Unguja parasites (B) and only Pemba parasites (C). Samples were analyzed based on geographic location, Zanzibar (N=136) (A), Unguja (N=105) (B) or Pemba (N=31) (C) and great circle (GC) distances between pairs of parasite isolates were calculated based on shehia centroid coordinates. These distances were binned at 4km increments out to 12 km. IBD beyond 12km is shown in Supplemental Figure 8. The maximum GC distance for all of Zanzibar was 135km, 58km on Unguja and 12km on Pemba. The mean IBD and 95% CI is plotted for each bin.”
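
      The distance calculation and binning described in this legend can be sketched as follows. This is a hedged illustration only: it assumes the haversine formula on a spherical Earth with a mean radius of 6371 km, the function names are hypothetical, and the confidence intervals shown in the figure are omitted.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_ibd_by_bin(pairs, bin_km=4.0, max_km=12.0):
    """Mean IBD per distance bin.

    pairs: iterable of (distance_km, ibd) for each pair of isolates.
    Pairs farther apart than max_km are skipped; a pair at exactly
    max_km falls in the last bin.
    """
    n_bins = int(max_km // bin_km)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for dist_km, ibd in pairs:
        if dist_km > max_km:
            continue
        idx = min(int(dist_km // bin_km), n_bins - 1)
        sums[idx] += ibd
        counts[idx] += 1
    return {
        (i * bin_km, (i + 1) * bin_km): sums[i] / counts[i]
        for i in range(n_bins)
        if counts[i]
    }


# Toy pairs: two in the 0-4 km bin, one in the 4-8 km bin, one beyond 12 km.
binned = mean_ibd_by_bin([(1.0, 0.5), (3.0, 0.3), (5.0, 0.2), (20.0, 0.9)])
```

      In practice each distance would come from `great_circle_km` applied to the centroid coordinates of the two shehias in a pair.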

      (16) Font sizes for panel C differ, and it is not aligned with the other panels.

      Thank you for pointing this out. Figure 3 and Supplemental Figure 10 are adjusted with matching formatting for each plot.

      (17) Why is Kusini included in Supplemental Figure 4, but not in Figure 1?

      In Supplemental Figure 4, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection. That is why there are additional isolates in Kusini. The legend for Supplemental Figure 4 now reads:

      “Supplemental Figure 4. PCA with highly related samples shows population stratification radiating from coastal Mainland to Zanzibar. PCA of 282 total samples was performed using whole sample allele frequency (A) and DAPC was performed after retaining samples with unique pseudohaplotypes in districts that had 5 or more samples present (B). As opposed to Figure 1, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection.”

      (18) Supplemental Figures 6 and 7: What does the width of the line indicate?

      The sentence below was added to the figure legends of Supplemental Figures 6 and 7 and the legends of each network plot were increased in size:

      “The width of each line represents higher magnitudes of IBD between pairs.”

      (19) What was the motivation not to put these lines on the map, as in Figure 4A? This might make it easier to interpret the data.

      Thank you for this comment. For Supplemental Figures 8 and 9, we omitted the lines representing lower pairwise IBD in order to draw the reader's attention to the highly related pairs between and within shehias.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a rather long paragraph (lines 300-323) on COI of asymptomatic infections and their genetic structure. Given that the current study did not investigate most of the hypotheses raised there (e.g. immunity, expression of variant genes), and the overall limited number of asymptomatic samples typed, this part of the discussion feels long and often speculative.

      Thank you for your perspective. The key sections highlighted in this comment, regarding immunity and expression of variant genes, were shortened. This section on lines 300-303 now reads:

      “Asymptomatic parasitemia has been shown to be common in falciparum malaria around the globe and has been shown to have increasing importance in Zanzibar (Lindblade et al., 2013; Morris et al., 2015). What underlies the biology and prevalence of asymptomatic parasitemia in very low transmission settings where anti-parasite immunity is not expected to be prevalent remains unclear (Björkman & Morris, 2020).”

      (2) As a detail, line 304 mentions "few previous studies" but only one is cited. Are there studies that investigated this and found opposite results?

      Thank you for this comment. We added additional studies that did not find an association between clinical disease and COI. These changes are on lines 303-308, which now reads:

      “Similar to a few previous studies, we found that asymptomatic infections had a higher COI than symptomatic infections across both the coastal mainland and Zanzibar parasite populations (Collins et al., 2022; Kimenyi et al., 2022; Sarah-Matio et al., 2022). Other studies have found lower COI in severe vs. mild malaria cases (Robert et al., 1996) or no significant difference between COI based on clinical status (Earland et al. 2019; Lagnika et al. 2022; Conway et al. 1991; Kun et al. 1998; Tanabe et al. 2015)”

      (3) Table 2: Percentages need to be checked. To take one of several examples, for Pfk13-K189N a frequency of 0.019 for the mutant allele is given among 137 samples. 2/137 equals to 0.015, and 3/137 to 0.022. 0.019 cannot be achieved. The same is true for several other markers. Possibly, it can be explained by the presence of polyclonal infections. If so, it should be clarified what the total of clones sequenced was, and whether the prevalence is calculated with the number of samples or number of clones as the denominator.

      Thank you for this point. We mistakenly reported allele frequency instead of prevalence. An updated Table 2 is now in the manuscript. The method for calculating the prevalence is now at lines 148-151:

      “Prevalence was calculated separately in Zanzibar or mainland Tanzania for each polymorphism by the number of samples with alternative genotype calls for this polymorphism over the total number of samples genotyped and an exact 95% confidence interval was calculated using the Clopper-Pearson method for each prevalence.”
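
      To make the prevalence calculation concrete, the sketch below computes a prevalence and an exact (Clopper-Pearson) binomial confidence interval using only the Python standard library. The bounds are found by bisection on the binomial tail probabilities. The function names are hypothetical and this is not the authors' implementation; statistical packages provide equivalent routines.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, tol=1e-10):
    """Find p in [0, 1] with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lower = _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    if k == n:
        upper = 1.0
    else:
        upper = _solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Example with the counts raised by the reviewer: 2 mutant calls among 137 samples.
n_mutant, n_genotyped = 2, 137
prevalence = n_mutant / n_genotyped
ci_low, ci_high = clopper_pearson(n_mutant, n_genotyped)
```

      With sample counts as the denominator, the reported prevalence can only take values of the form k/n, which is the check the reviewer applied to Table 2.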

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents useful findings regarding the role of formin-like 2 in mouse oocyte meiosis. The submitted data are supported by incomplete analyses, and in some cases, the conclusions are overstated. If these concerns are addressed, this paper would be of interest to reproductive biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The presented study focuses on the role of formin-like 2 (FMNL2) in oocyte meiosis. The authors assessed FMNL2 expression and localization in different meiotic stages and subsequently, by using siRNA, investigated the role of FMNL2 in spindle migration, polar body extrusion, and distribution of mitochondria and endoplasmic reticulum (ER) in mouse oocytes.

      Strengths:

      Novelty in assessing the role of formin-like 2 in oocyte meiosis.

      Weaknesses:

      Methods are not properly described.

      Overstating presented data.

      It is not clear what statistical tests were used.

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section are not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have included essential details such as mouse strains, culture media, oocyte stages and statistical methods in the materials and methods section. Please find our detailed responses in the “Recommendations for the authors” part.

      Reviewer #2 (Public Review):

      Summary:

      This research involves conducting experiments to determine the role of Fmnl2 during oocyte meiosis I.

      Strengths:

      Identifying the role of Fmnl2 during oocyte meiosis I is significant.

      Weaknesses:

      The quantitative analysis and the used approach to perturb FMNL2 function are currently incomplete and would benefit from more confirmatory approaches and rigorous analysis.

      (1) Most of the results are expected. The new finding here is that FMNL2 regulates cytoplasmic F-actin in mouse oocytes, which is also expected given the role of FMNL2 in other cell types. Given that FMNL2 regulates cytoplasmic F-actin, it is very expected to see all the observed phenotypes. It is already established that F-actin is required for spindle migration to the oocyte cortex, extruding a small polar body and normal organelle distribution and functions.

      Thank you for your comment. In recent decades, the Arp2/3 complex (Nat Cell Biol 2011), Formin2 (Nat Cell Biol 2002, Nat Commun 2020), and Spire (Curr Biol 2011) were reported to be three key factors involved in this process. These factors regulate actin filaments in different ways. However, how they interact with each other during these subcellular events is still not fully clear. Our current study identified that FMNL2 plays a critical role in coordinating these molecules for actin assembly in oocytes. Our findings demonstrate that FMNL2 interacts with both the Arp2/3 complex and Formin2 to facilitate actin-based meiotic spindle migration. Additionally, we discovered a novel role for FMNL2 in determining the distribution and function of the endoplasmic reticulum and mitochondria, which may in turn influence meiotic spindle migration in oocytes. Our results not only uncover novel functions of FMNL2-mediated actin in organelle distribution, but also extend our understanding of the molecular basis of the unique meiotic spindle migration in oocyte meiosis.

      (2) The authors used Fmnl2 cRNA to rescue the effect of siRNA-mediated knockdown of Fmnl2. It is not clear how this works. It is expected that the siRNA will also target the exogenous cRNA construct (which should have the same sequence as endogenous Fmnl2) especially when both of them were injected at the same time. Is this construct mutated to be resistant to the siRNA?

      Thank you for your question. We regret any misunderstanding caused by the imprecise description in our manuscript. In the rescue experiments, we first injected FMNL2 siRNA into oocytes and then microinjected FMNL2 mRNA 18-20 hours later. In our previous experiments, we verified by Western blotting that endogenous FMNL2 is effectively suppressed 18-20 hours after microinjection of FMNL2 siRNA. Additionally, we observed a significant increase in exogenous FMNL2 protein expression 2 hours after injection of FMNL2 mRNA. We believe that the exogenous FMNL2 can compensate for the decrease caused by FMNL2 knockdown, and this approach has been adopted in many oocyte studies.

      (3) The authors used only one approach to knockdown FMNL2 which is by siRNA. Using an additional approach to inhibit FMNL2 would be beneficial to confirm that the effect of siRNA-mediated knockdown of FMNL2 is specific.

      Thank you for your question. Yes, specificity is always a concern for siRNA or morpholino microinjection because of off-target effects. Owing to practical limitations, we could not generate a knockout model, and there are no known inhibitors that specifically target FMNL2. To address this, we performed the rescue study with exogenous mRNA to confirm the effective knockdown of FMNL2. These measures provide reassurance regarding the credibility of the experimental outcomes, and this is also the general way to control for off-target effects of siRNA or morpholino.

      Reviewer #3 (Public Review):

      Summary:

      The authors focus on the role of formin-like protein 2 in the mouse oocyte, which could play an important role in actin filament dynamics. The cytoskeleton is known to influence a number of cellular processes from transcription to cytokinesis. The results show that downregulation of FMNL2 affects spindle migration with resulting abnormalities in cytokinesis in oocyte meiosis I.

      Weaknesses:

      The overall description of methods and figures is overall dismissively poor. The description of the sample types and number of replicate experiments is impossible to interpret throughout, and the quantitative analysis methods are not adequately described. The number of data points presented is unconvincing and unlikely to support the conclusions. On the basis of the data presented, the conclusions appear to be preliminary, overstated, and therefore unconvincing.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have incorporated your suggestions for modification, particularly regarding the Materials and Methods section. Please see the detailed revision and responses in the “Recommendations for the authors” part.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section is not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage. My specific comments are listed below.

      (1) Information about statistical tests that were used needs to be provided for all quantification experiments.

      Thank you for your suggestion. Based on your suggestions, we revised the statistical analysis description in the Materials and Methods section. Additionally, we also included a description of the statistical methods in the legends of the relevant result figures.

      (2) I recommend replacing the plunger plots, used in most quantification data, with alternatives allowing evaluation of the distribution of the data (dot plots, box plots, whisker plots).

      Thank you for your suggestion. Following your suggestion, we replaced the plunger plots in Fig 2C, D, H, I and Fig3 B, C with dot plots.

      (3) Can the authors provide information about particular time points when were individual oocyte stages (GVBD, meiosis I, and meiosis II) harvested/used for immunofluorescence protein detection, western blotting, microinjection, and ER and mitochondria staining? Were the time points always the same in all presented experiments and experimental vs control group? If not, this needs to be clarified.

      Thank you for your suggestion. We used oocytes in the metaphase I (MI) stage for the statistical analysis of spindle migration, actin filament aggregation, endoplasmic reticulum localization, and mitochondrial localization. In the Western blot analysis, GV stage oocytes were utilized to evaluate the efficiency of knockdown and rescue experiments. The protein expression levels of Arp2, Formin2, INF2, Cofilin, Grp78, and Chop in different treatment groups were detected using MI-stage oocytes. In the revised version, we provided all the detailed information about the stages.

      (4) Figure 1B: Can the authors comment on why there is a missing representative image of MII oocyte FMNL2-Ab? I recommend including this in the figure to have a complete view of comparing overexpressed and endogenous FMNL2 localization in oocyte meiosis.

      Thank you for your suggestion. In the revised manuscript, we added immunostaining images of FMNL2 antibody in MII stage oocytes.

      (5) Figure 1C: The figure legend says, "FMNL2 and actin overlapped in cortex and spindle surrounding". In MI oocytes, there is usually no accumulated actin signal around the spindle, which is also true in the presented images, so there cannot be overlapping with the FMNL2 signal. The interpretation should be changed.

      We apologize for this inappropriate description that was used, and we deleted this sentence.

      (6) Figure 2B: What were the parameters of the "large" and "normal" polar bodies for performing the analysis?

      Thank you for your question. In order to assess the size of the polar body, we conducted a comparison between the diameter of the polar body and that of the oocyte. If the diameter of the polar body was found to be less than 1/3 of the oocyte's diameter, we categorized it as normal-sized polar body. Conversely, if the polar body's diameter exceeded 1/3 of the oocyte's diameter, we categorized it as a large polar body. We have included these details in the Results section of the manuscript.

      (7) Figure 2F: Can the authors comment on what can be the second band in the rescue group?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, compared to endogenous FMNL2, the protein size increased due to the addition of the EGFP tag, approximately 27 kDa. Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We have provided annotations in the revised Figure 2F to clarify this.

      (8) Can the authors comment on the variability of PBE between 2C and 2H in the FMNL2-KD groups? In panel C, the PBE in the KD group was 59.5 ± 2.82%; in panel H, the PBE in the KD group was 48.34 ± 4.2%, and in the rescue group, the PBE was 62.62 ± 3.6%. The rescue group has a similar PBE rate as the KD group in panel C. How consistent was the FMNL2 knockdown across individual replicates? Can the authors provide more details on how the rescue experiment was performed?

      Thank you for your question. We believe that the difference in PBE observed in Figure 2C and 2H of the FMNL2-KD group was due to the microinjection times and the duration of in vitro arrest. The results shown in Figure 2C depict the outcome of a single injection of FMNL2 siRNA into GV stage oocytes, followed by 18 hours of in vitro arrest; the results shown in Figure 2H contain a subsequent additional injection of FMNL2-EGFP mRNA with another 2 hours of arrest. The two rounds of microinjection and the extended period of in vitro arrest both affect oocyte maturation rates.

      (9) Figure 2J and K: What groups were compared together? The used statistic needs to be properly described.

      Thank you for your question. The FMNL2-KD, FMNL3-KD, and FMNL2+3-KD groups were each compared to the Control group; therefore, a t-test was used for analysis. We have provided explanations in the revised manuscript.

      (10) Figure 4B and C: Can the authors provide representative images without oversaturated actin signal?

      Thank you for your question. For the analysis of oocyte F-actin, the signal is divided into cortical actin and cytoplasmic actin. Because the strong cortical actin signal hinders detection of cytoplasmic actin at normal contrast, it is necessary to increase the scanning index, which overexposes the cortical actin signal. This allows better observation of the cytoplasmic signal.

      (11) Figure 4G + 5H: Can the authors comment on why they used as a housekeeping gene actin instead of tubulin, which was used in the rest of the WB experiments?

      Thank you for your question. In most of the western blot experiments conducted in this study, we used tubulin as a housekeeping gene. However, because of antibody delivery delays, we also used GAPDH and actin for some experiments. These housekeeping genes were all valid for the study.

      (12) Based on what parameters was ER considered normally or abnormally distributed, and what stages of oocytes were assessed?

      Thank you for your question. In this study, we used oocytes at the MI stage for the analysis of ER localization. At the MI stage, the ER localizes around the spindle, which is regarded as the typical localization pattern. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as having aberrant ER positioning. We included relevant descriptions in the revised version of the manuscript.

      (13) Figure 5H: As a housekeeping gene was used actin - the quantification is labeled as a Grp78 to tubulin ratio.

      Thank you for pointing out the error. This was a labeling mistake, and we have corrected it.

      (14) Information about how JC-1 staining was done needs to be provided.

      Thank you for your careful reading. We included a description of JC-1 staining in the Materials and Methods section.

      (15) Line 231-232: "As shown in Figure 4A" - the text doesn't correspond to the figure.

      Thank you for pointing out the error. We fixed this mistake in the revised manuscript by correcting "Fig3A" to "Fig4A."

      (16) Line 265: there is probably a missing word "Formin2".

      Thank you. We corrected the error and made the necessary changes in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Quantification and analysis:

      • Fig. 3B: The rate of spindle migration should be quantified based on the distance from the spindle to the cortex. Also, the orientation of the spindle (Z-position) needs to be taken into consideration.

      • Fig. 5C, D: It is unclear how the rate of ER distribution was calculated.

      • Western blot: In many experiments (such as Fig. 5H), the bands are saturated which will prevent accurate intensity measurements and quantifications.

      For spindle migration, we specifically focused on spindles exhibiting a distinctive spindle-like shape with clear bipolarity to eliminate any statistical discrepancies potentially caused by variations in Z-axis alignment. Our criterion for successful migration was contact between the spindle pole and the cortical region of the oocyte. Therefore, we think that the rate reflects the phenotype better than the distance does.

      Reviewer 1 also raised the issue of ER localization. We used oocytes at the MI stage in this study. At the MI stage, the ER localizes around the spindle. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as having aberrant ER positioning. We included relevant descriptions in the revised version of the manuscript.

      Regarding the western blot bands, we typically capture multiple images at different exposure levels (3-5 images) during the experimental procedure. In the revised manuscript, we have replaced the inappropriate images with more suitable ones.

      (2) Given that all Immunoprecipitation experiments in this manuscript were performed on the whole ovary which contains more somatic cells than oocytes, the results do not necessarily reflect meiotic oocytes. Please consider this possibility during the interpretation.

      Thank you for your suggestion. Yes, we agree with you. In the revised manuscript, we made appropriate modifications to the relevant descriptions.

      (3) 351-365: The conclusion that Arp2/3 compensates for the decreased formin 2 in FMNL2 knockdown oocytes is a bit unconvincing. 1- In mouse oocytes, it is already known that Arp2/3 and formin 2 regulate different pools of F-actin nucleation. 2- The authors found an increase in Arp2/3 in FMNL2 knockdown oocytes compared to control oocytes without any change in cortical F-actin. Given that Arp2/3 is primarily promoting cortical F-actin, it is expected to see an increase in cortical F-actin in FMNL2 knockdown oocytes, which was not the case.

      Thank you for your question. Yes, previous studies showed that formin2 localizes to the cytoplasm of oocytes and accumulates around the spindle, where it facilitates cytoplasmic actin assembly, whereas Arp2/3 is primarily responsible for actin assembly at the cortex of oocytes. In invasive cells, FMNL2 is mainly localized at the leading edge of the cell and at lamellipodia and filopodia tips, where it improves cell migration in an actin-dependent manner (Curr Biol 2012). We showed that FMNL2 localized at both the spindle periphery and the cortex, but depletion of FMNL2 did not affect cortex actin intensity. We think that FMNL2 and Arp2/3 both contribute to cortex actin dynamics: when FMNL2 decreased, ARP2 increased to compensate, which maintained the cortex actin level. In the revised manuscript, we have made modifications to avoid excessive extrapolation from our results, ensuring that our conclusions are presented in a more objective manner.

      (4) Lines 195-197: The spindle is initially formed soon after the GVBD, so there is no spindle during GVBD. Also, I can't see oocytes at anaphase I or telophase I in this figure. Please revise.

      Thank you for your suggestion. We apologize for the inappropriate descriptions that were used. In the revised manuscript, we have made modifications to the respective descriptions in the Results part.

      (5) Fig. 2E: It seems that the control oocyte is abnormal with mild cytokinesis defects. Please replace or delete it since this information is already included in Fig. 3A.

      Thank you for your suggestion. Based on our observations, during the extrusion of the first polar body in oocytes, there is a temporary occurrence of cellular morphological fragmentation due to cortical reorganization (11h in control oocyte from Fig 2E). However, after the extrusion of the first polar body, the oocyte morphology returns to normal. Figure 2E illustrates the meiotic division process of oocytes, while Figure 3A primarily focuses on the process of oocyte spindle migration. We think that it is better to retain both to present our results.

      Reviewer #3 (Recommendations for The Authors):

      In the case of the observed phenotype, the stage of GV is important. The phenotypes presented also occur in meiotic or developmentally incompetent oocytes. In addition, the images of GV oocytes appear as NSN, which also show the KD phenotype in Figs. 2 and 3.

      Thank you for your concern. As the oocyte grows, the proportion of SN-type oocytes gradually increases. When the oocyte diameter reaches 70-80 μm, the proportion of SN oocytes is approximately 52.7% (Mol Reprod Dev. 1995). In our study, oocytes in both the control and knockdown groups were collected at a diameter of around 80 μm, which is considered fully grown and predominantly in the SN phase. Since the collection period and size of the oocytes were consistent, we are confident that the phenotypic differences observed between the control and knockdown groups are solid and reliable.

      MII is absent in Fig. 1B.

      In the revised manuscript, we added immunostaining images of FMNL2 in MII stage oocytes.

      The result of KD is not convincing. Also, discuss whether the heterozygous effect of Fmnl2 deletion affects reproductive fitness.

      Thank you for your concern. Because we were limited in setting up a knockout model, we employed siRNA to knock down FMNL2 expression. To address the risk of off-target effects, we performed a rescue experiment with exogenous mRNA, which we believe resolves this issue. When designing the siRNA sequences, we ensured that they bind specifically to FMNL2 mRNA, and we assessed the levels of FMNL2 and FMNL3 mRNA in oocytes after injection of FMNL2 siRNA. The results showed that, compared to the control group, the expression of FMNL2 mRNA decreased by approximately 70% at 18 hours after FMNL2 siRNA injection, while the level of FMNL3 mRNA did not decrease.

      Fig. 2F rescue experiment with double bands. What bands are seen here? Did the authors inject tagged or untagged FMNL2? Or does endogenous FMNL2 appear higher in the sample after KD?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, compared to endogenous FMNL2, the protein size increased by approximately 27 kDa due to the addition of the EGFP tag. Hence, in the Western blot bands of the rescue group, the upper band represents exogenous FMNL2-EGFP, while the lower band corresponds to endogenous FMNL2. We provided annotations in the revised Figure 2F to clarify this.

      Variability in mitochondria and ER distribution patterns is also known in healthy and developing oocytes, although the authors described only a single phenotype.

      Thank you for your concern. Yes, mitochondria and the ER show dynamic localization at different stages of oocyte maturation. However, in this study we used MI-stage oocytes for the analysis of ER and mitochondria localization, and at the MI stage both the ER and mitochondria localize around the spindle. This pattern is considered the normal localization. Several studies have shown that dispersed or clustered localization contributes to maturation defects. We included relevant descriptions in the revised manuscript.

      What exactly is meant by input in the IP experiments? Why is the target missing in the input sample?

      Thank you for your question. When we ran the input samples alone on a single lane, all the analyzed proteins showed normal expression, confirming the viability of the input sample. However, upon simultaneous exposure with the IP samples, we observed a lack of clear signal for certain proteins in the input group. This is because the excessive signal intensity resulting from protein enrichment in the IP group led to underexposure of the proteins in the input group.

      Explain the rationale for using actin or tubulin as loading or normalization controls in the study focusing on the cytoskeleton.

      Thank you for your question. Actin and tubulin are both widely used as controls due to their stable expression. Actin exists as α-actin and β-actin isoforms. Formins and the Arp2/3 complex regulate the polymerization of α-actin and β-actin into F-actin, not the expression of these isoforms. In our study, F-actin (the functional form) was examined. Similarly, α-tubulin and β-tubulin are two subtypes of tubulin that interact with each other to form stable α/β-tubulin heterodimers. Changes in cytoskeleton dynamics do not change the expression of α/β-tubulin. Therefore, β-actin and α-tubulin could be used as normalization controls.

      Fig. 6E shows only , but the legend says *.

      Thank you for pointing out the error. We corrected the mistake in the revised manuscript.

      Spindle positioning appears to differ between control and KD. Does this affect the quantification of Fig. 6F? Adequate nomenclature should be used here.

      Thank you for your question. Yes, spindle positioning was affected by FMNL2 depletion. However, oocytes with either a central or a cortical spindle are all at the MI stage, and the JC-1 signal is not related to this difference. To avoid misunderstanding, we replaced the representative images and the corresponding description in Figure 6F.

      The description of the methods and legends should be significantly improved.

      Thank you for your suggestion. Reviewers 1 and 2 raised similar concerns. We expanded the description of the methods and legends in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors collected genomic information from public sources covering 423 eukaryote genomes and around 650 prokaryote genomes. Based on pre-computed CDS annotation, they estimated the frequency of alternative splicing (AS) as a single average measure for each genome and computed correlations with this measure and other genomic properties such as genome size, percentage of coding DNA, gene and intergenic span, etc. They conclude that AS frequency increases with genome complexity in a somewhat directional trend from "lower" organisms to "higher" organisms.

      Strengths:

      The study covers a wide range of taxonomic groups, both in prokaryotes and eukaryotes.

      Weaknesses:

      The study is weak both methodologically and conceptually. Current high throughput sequencing technologies, coupled with highly heterogeneous annotation methods, can observe cases of AS with great sensitivity, and one should be extremely cautious of the biases and rates of false positives associated with these methods. These issues are not addressed in the manuscript.

      We are aware of the bias that may exist in annotation files. Since the source of noise can be highly variable, we have assumed that most of the data has a similar bias. However, we agree with the reviewer that we could perform some analysis to test for these biases and their association with different methodologies. Thus, we will measure the uncertainty present in the data. On the one hand, we will be more explicit about the data limitations and the biases they can generate in the results. On the other hand, while analyzing the false positives in the data is out of our scope, we will perform a statistical test to detect possible biases regarding different methods of sequencing and annotation, and types of organisms (model or non-model organisms). If such biases are detected, we will proceed, as far as possible, to normalize the data or to estimate a confidence interval.

      Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      Beyond taking into account the differential bias that may exist in the data, we do not consider our AS measure to be problematic. The NCBI database is one of the most reliable databases available to date and is continuously updated by the whole scientific community. So, the use of this data and the corresponding procedures for deriving the AS measure are perfectly acceptable for a comparative analysis on such a huge global scale. Furthermore, the proposal of a new genome-level measure of AS that allows species spanning the whole tree of life to be compared is part of the novelty of the study. We understand that small-scale studies require high specificity about the molecular processes involved. However, that is not the case here, where we are dealing with a large-scale problem. On the other hand, as we have previously mentioned, we agree with the reviewer that we should analyze the degree of uncertainty in the data to better interpret the results.

      There is no mention of the possibility that AS could be largely caused by random splicing errors, a possibility that could very well fit with the manuscript's data. Instead, the authors adopt early on the view that AS is regulated and functional, generally citing outdated literature.

      There is no question that some AS events are functional, as evidenced by strongly supported studies. However, whether all AS events are functional is questionable, and the relative fractions of functional and non-functional AS are unknown. With this in mind, the authors should be more cautious in interpreting their data.

      Many studies suggest that most of the AS events observed are the result of splicing errors and are therefore neither functional nor conserved. However, we still have limited knowledge about the functionality of AS. Just because we don’t have a complete understanding of its functionality, doesn’t mean there isn’t a fundamental cause behind these events. AS is a highly dynamic process that can be associated with processes of a stochastic nature that are fundamental for phenotypic diversity and innovation. This is one of the reasons why we do not get into a discussion about the functionality of AS and consider it as a potential measure of biological innovation. Nevertheless, we agree with the reviewer’s comments, so we will add a discussion about this issue with updated literature and look at any possible misinterpretation of the results.

      The "complexity" of organisms also correlates well (negatively) with effective population size. The power of selection to eliminate (slightly) deleterious mutations or errors decreases with effective population size. The correlation observed by the authors could thus easily be explained by a non-adaptive interpretation based on simple population genetics principles.

      We appreciate the reviewer's observation. We are well aware of M. Lynch’s theory on the role of the effective population size and its eventual correlation with genomic parameters, but we want to emphasize that our objective is not to find an adaptive or non-adaptive explanation of the evolution of AS, but rather to reveal it. Nevertheless, as the reviewer suggests, we will look at the correlation between AS and the effective population size and discuss a possible non-adaptive interpretation.

      The manuscript contains evidence that the authors might benefit from adopting a more modern view of how evolution proceeds. Sentences such as "... suggests that only sophisticated organisms optimize alternative splicing by increasing..." (L113), or "especially in highly evolved groups such as mammals" (L130), or the repeated use of "higher" and "lower" organisms need revising.

      As the reviewer suggests, we will proceed with the corresponding linguistic corrections.

      Because of the lack of controls mentioned above, and because of the absence of discussion regarding an alternative non-adaptive interpretation, the analyses presented in the manuscript are of very limited use to other researchers in the field. In conclusion, the study does not present solid conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF. When correlating the degree of alternative splicing with these three variables, they find that the different taxonomic groups have a different correlation coefficient, and identify a "progressive pattern" among metazoan groups, namely that the correlation coefficient mostly increases when moving from flowering plants to arthropods, fish, birds, and finally mammals. They conclude that therefore the amount of splicing that is performed by an organismal group could be used as a measure of its complexity.

      Weaknesses:

      While I find the analysis of alternative splicing interesting, I also find that it is a very imperfect measure of organismal complexity and that the manuscript as a whole is filled with unsupported statements. First, I think it is clear to anyone studying evolution over the tree of life that it is the complexity of gene regulation that is at the origin of much of organismal structural and behavioral complexity. Arguably, creating different isoforms out of a single ORF is just one example of complex gene regulation. However, the complexity of gene regulation is barely mentioned by the authors.

      We disagree with the reviewer that our measure of AS is imperfect. As in our response to the first reviewer, we will quantify the uncertainty in the data and correct for differential biases caused by annotation and sequencing methods. Thus, beyond correcting relevant biases in the data, we consider that our measure is adequate for a comparative analysis at a global scale. A novelty of our study is the proposal of a genome-level measure of AS that takes into account data from the entire scientific community.

      We also want to emphasize that we assume from the beginning that AS may reflect some kind of biological complexity; it is not a conclusion drawn from the results. An argument in favor of such an assumption is that AS is associated with stochastic processes that are fundamental for phenotypic diversity and innovation. Of course, we agree with the reviewer that it is not the only mechanism behind biological complexity, so we will emphasize this in the manuscript. In addition, we will be more explicit about the assumptions and objectives, and will correct any unsupported statement.

      Further, it is clear that none of their correlation coefficients actually show a simple trend (see Table 3). According to these coefficients, birds are more complex than mammals for 3 out of 4 measures.

      An evolutionary trend is broadly defined as the gradual change in some characteristic of organisms as they evolve or adapt to a specific environment. In our context, we define an evolutionary trend as the gradual change in genome composition and its association with AS across the main taxonomic groups. If we look at Figure 4 and Table 3, we can conclude that there is a progressive trend. We will be more precise about how we define an evolutionary trend and correct any possible misinterpretation of the results. Moreover, we do not assume that mammals should be more complex than birds. First, we will emphasize that our results show that birds have the highest values of such a trend. Second, after reading the reviewer’s comments, we have decided to perform an additional analysis to correct for differences in the taxonomic group sizes, which will allow us to have more confidence in the results.

      It is also not clear why the correlation coefficient between alternative splicing ratio and genome length, gene content, and coding percentage should display such a trend, rather than the absolute value. There are only vague mechanistic arguments.

      The study analyzes the relationship of AS with genomic composition for the large taxonomic groups. We assume that significant differences in these relationships are indicators of the presence of different mechanisms of genome evolution. However, we agree with the reviewer that a correlation does not imply a causal relation, so we will be more cautious when interpreting the results.

      To quantify the relationships, we use correlation coefficients, the slopes of such correlations, and the relation of variability. Although the absolute values of AS are also shown in Table 4, we consider them less informative on their own than when we include how AS relates to the genomic composition. For example, we observe that plants have a different genome composition and relation with AS compared to animals, which suggests that they follow different mechanisms of genome evolution. On the other hand, we observe a trend in animals, where high values of AS are associated with a large percentage of introns and a percentage of intergenic DNA of about 50% of the genome.

      Much more troubling, however, is the statement that the data supports "lineage-specific trends" (lines 299-300). Either this is just an ambiguous formulation, or the authors claim that you can see trends within lineages.

      We agree with the reviewer that this statement is not correct, so we will proceed to correct it.

      The latter is clearly not the case. In fact, within each lineage, there is a tremendous amount of variation, to such an extent that many of the coefficients given in Table 3 are close to meaningless. Note that no error bars or p-values are presented for the values shown in Table 3. Figure 2 shows the actual correlation, and the coefficient for flowering plants there is given as 0.151, with a p-value of 0.193. Table 3 seems to quote r=0.174 instead. It should be clear that a correlation within a lineage or species is not a sign of a trend.

      The reviewer has not correctly understood the results in Table 3. It is precisely the variation of the genome variables that we are measuring. After standardizing these values by the mean values, we compared the variability between groups, which is the result shown in Table 3. In this case, there are no associated error bars or p-values. On the other hand, we agree that a correlation is not a sign of a trend. But the relations of variability, together with the results obtained in Figure 3, are indicators of a trend. As we mentioned before, we will proceed to analyze whether the variation in the group sizes is causing a bias in the results.

      There are several wrong or unsupported statements in the manuscript. Early on, the authors state that the alternative splicing ratio (a number greater or equal to one that can be roughly understood as the number of different isoforms per ORF) "quantifies the number of different isoforms that can be transcribed using the same amount of information" (lines 51-52). But in many cases, this is incorrect, because the same sequence can represent different amounts of information depending on the context. So, if a changed context gives rise to a different alternative splice, it is because the genetic sequence has a different meaning in the changed context: the information has changed.

      We agree that some statements are not well supported, so we will proceed to revise them.

      In line 149, the authors state that "the energetic cost of having large genomes is high". No citation is given, and while such a statement seems logical, it does not have very solid support.

      We will also revise the bibliography and support our statements with updated references.

      If there was indeed a strong selective force to reduce genome size, we would not see the stunning diversity of genome sizes even within lineages. This statement is repeated (without support) several times in the manuscript, apparently in support of the idea that mammals had "no choice" to increase complexity via alternative splicing because they can't increase it by having longer genomes. I don't think this reasoning can be supported.

      We agree with the reviewer on this issue, so we will carefully revise the statements that indirectly (or directly) assume the action of selective forces on the genome composition.

      Even more problematic is the statement that "the amount of protein-coding DNA seems to be limited to a size of about 10MB" (line 219). There is no evidence whatsoever for this statement.

      In Figure 1A we observe a one-to-one relationship between the genome size and the amount of coding DNA. However, in multicellular organisms, although the genome size increases, we observe that the amount of coding DNA does not increase by more than 10Mb, which suggests the presence of some genomic limitation. Of course, this is not an absolute or general statement, but rather a suggestion. We are only describing our results.

      The reference that is cited (Choi et al 2020) suggests that there is a maximum of 150GB in total genome size due to physiological constraints. In lines 257-258, the authors write that "plants are less restricted in terms of storing DNA sequences compared to animals" (without providing evidence or a citation).

      We will revise the bibliography and add updated references.

      I believe this statement is made due to the observation that plants tend to have large intergenic regions. But without examining the functionality of these intergenic regions (they might host long non-coding RNA stretches that are used to regulate the expression of other genes, for example) it is quite adventurous to use such a simple measure as being evidence that plants "are less restricted in terms of storing DNA sequences", whatever that even means. I do not think the authors mean that plants have better access to -80 freezers. The authors conclude that "plant's primary mechanism of genome evolution is by expanding their genome". This statement itself is empty: we know that plants are prone to whole genome duplication, but this duplication is not, as far as we understand, contributing to complexity. It is not a "primary mechanism of genome evolution".

      We will revise these statements.

      In lines 293-294, the authors claim that "alternative splicing is maximized in mammalian genomes". There is no evidence that this ratio cannot be increased. So, to conclude (on lines 302-303) that alternative splicing ratios are "a potential candidate to quantify organismal complexity" seems, based on this evidence, both far-fetched and weak at the same time.

      Our results show the highest values of AS in mammals, but we understand that the results are limited by the availability and accuracy of data, which we will emphasize in the manuscript. As we previously mentioned, we will also proceed to analyze the uncertainty in the data and carry out the appropriate corrections.

      I am also not very comfortable with the data analysis. The authors, for example, say that they have eliminated from their analysis a number of "outlier species". They mention one: Emmer wheat because it has a genome size of 900 Mb (line 367). Since 900MB does not appear to be extreme, perhaps the authors meant to write 900 Gb. When I consulted the paper that sequenced Triticum dicoccoides, they noted that 14 chromosomes are about 10GB. Even a tetraploid species would then not be near 900Gb. But more importantly, such a study needs to state precisely which species were left out, and what the criteria are for leaving out data, lest they be accused of selecting data to fit their hypothesis.

      The reviewer is right. We wanted to say 900Mb, which is approximately 7.2Gb; we made a mistake of nomenclature. This value is extreme compared to the typical values, so it generates large deviations when applying measures of central tendency and dispersion. We want to obtain mean values that are representative of most of the species composing the taxonomic groups, so we find it appropriate to exclude all outlier values from the study. Nevertheless, we will rigorously specify the criteria that we have used to select the data.

      I understand that Methods are often put at the end of a manuscript, but the measures discussed here are so fundamental to the analysis that a brief description of what the different measures are (in particular, the "alternative splicing ratio") should be in the main text, even when the mathematical definition can remain in the Methods.

      We agree with the reviewer, so we will add a brief description of the genomic variables at the beginning of the Results section.

      Finally, a few words on presentation. I understand that the following comments might read differently after the authors change their presentation. This manuscript was at the border of being comprehensible. In many cases, I could discern the meaning of words and sentences in contexts but sometimes even that failed (as an example above, about "species-specific trends", illustrates). The authors introduced jargon that does not have any meaning in the English language, and they do this over and over again.

      Note that I completely agree with all the comments by the other reviewer, who alerted me to problems I did not catch, including the possible correlation with effective population size: a possible non-adaptive explanation for the results.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We understand that this is a unique situation (or at least to us it is) to have a reviewer resign and we are now down to one reviewer. In light of this, would either of you be willing to let us know your thoughts and suggestions on our manuscript? We think this would help direct us as to how to proceed.

      In light of our work being confirmed by a different laboratory, as well as the numerous improvements and additions we provided in response to the requirements of both reviewers, we were surprised and disappointed that the eLife assessment of the manuscript did not change.

      In more detail, please note and consider that:

      (1) Our results have already been confirmed, i.e., replicated, by a different, independent team (Gonzalez-Palomares, E., Boulanger-Bertolus, J., Dupin, M. et al. Amplitude modulation pattern of rat distress vocalisations during fear conditioning. Sci Rep 13, 11173 (2023). https://doi.org/10.1038/s41598-023-38051-7), and these authors cite our eLife preprint.

      They observed very comparable 44-kHz vocalizations during a similar fear conditioning protocol – with 10 foot shocks; 44-kHz vocalizations were described as occurring during later conditioning trials, i.e., 8-10, in contrast to trials 1-3 (Fig. S4) – all of which confirms our findings. Before our eLife report, it was difficult or discouraging to report 44-kHz vocalizations since they did not fit the established definitions of types/subtypes of rat vocalizations.

      We are confused that our findings gained no merit or reliability despite the replication.

      (2) We have addressed all the concerns and comments of both reviewers and made substantial additions. In particular, Reviewer #1 suggested fatigue as a possible cause of increased emissions of 44-kHz vocalizations. We followed this hypothesis in good faith and performed several analyses, which required several months of work and manual inspection of numerous vocalizations and video recordings. In the end, these results concerning the duration and mean power of the calls, which do not support the fatigue idea, were included. However, this addition was not mentioned in the reviews or the assessment.

      We are not sure what to do with regard to some recommendations (e.g., concerning “behavioral response to call playback” or aversive calls which sometimes “can occur at frequencies above 22 kHz”) that were repeated in the second review and that we have already addressed.

      Also, we are unsure of the basis of some of the Reviewer’s concerns, in particular: that 44-kHz calls are not a separate group but “merely represent expected variation in vocalization parameters” or a problem with referring to 44-kHz calls as “a new group” (is it because they have likely been emitted before, in different experiments, though undetected, or because they are closely related to 22-kHz calls, i.e., constitute the same group?).

      (3) We would be happy to make additional improvements and changes (e.g., changing “Rats” to “Male rats” within the title) but it would be really helpful to have some confirmation as to which ones of these changes are crucial according to the journal Editors. For example, we are now running new analyses to show that 44-kHz calls occur in groups, i.e., they usually happen one after the other – but again we are not sure if this is the kind of analysis required at this stage of the manuscript submission; and whether it addresses the concerns of Reviewer #1.


      The following is the authors’ response to the original reviews.

      We would like to express our gratitude to the reviewers for their suggestions and critiques as we continually strive to enhance the quality of the manuscript. We improved it by incorporating the reviewers’ suggestions, changing the content and numbering of figures (Figs 1, 3S1 were edited; 4 figures were moved to supplemental materials), and adding several analyses suggested by the reviewers along with accompanying figures (1S2, 1S3) and tables (1 and 2). These analyses include investigating the link between freezing behavior and 44-kHz calls as well as their sound mean power and duration. Also, we have introduced detailed information regarding the experiments performed and expanded the description and discussion of the results. Finally, we added information about 44-kHz calls reported by another group – which was inspired by our findings.

      Below is the point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Olszyński and colleagues present data showing variability from canonical "aversive calls", typically described as long 22 kHz calls rodents emit in aversive situations. Similarly long but higher-frequency (44 kHz) calls are presented as a distinct call type, including analyses both of their acoustic properties and animals' responses to hearing playback of these calls. While this work adds an intriguing and important reminder, namely that animal behavior is often more variable and complex than perhaps we would like it to be, there is some caution warranted in the interpretation of these data. The authors also do not provide adequate justification for the use of solely male rodents. With several reported sex differences in rat vocal behaviors this means caution should be exercised when generalizing from these findings.

      We fully agree that our data should be interpreted with caution and we followed the Reviewer’s suggestions along these lines (see below). Also, we appreciate the suggestion to explore the prevalence of 44-kHz calls in female subjects, which would indeed represent an important and intriguing extension of our research. However, due to present financial constraints, we can only plan such experiments. To address the comment, we have added the sentence: “Here we are showing introductory evidence that 44-kHz vocalizations are a separate and behaviorally-relevant group of rat ultrasonic calls. These results require further confirmations and additional experiments, also in form of repetition, including research on female rat subjects.”

      It is important to note that the data presented in the current manuscript originates primarily from previously conducted experiments. These earlier experiments employed male subjects only; this was due to established evidence indicating that the female estrus cycle significantly influences ultrasonic vocalization (Matochik et al., 1992). Controlling for the estrus cycle would require a greater number of female subjects than males, which would not only increase animal suffering but also increase the demands on human labor and financial costs.

      Firstly, the authors argue that the shift to higher-frequency aversive calls is due to an increase in arousal (caused by the animals having received multiple aversive foot shocks towards the end of the protocols). However, it cannot be ruled out that this shift would be due to factors such as the passage of time and increase in fatigue of the animals as they make vocalizations (and other responses) for extended periods of time. In fact the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day in testing is in line with this.

      Answer: We would like to point out that the “increased-arousal” hypothesis, declared in the manuscript, is only a hypothesis – as reflected by the wording used. However, we changed the beginning of the sentence in question from “It could be argued” to “We would like to propose a hypothesis” to emphasize the speculative aspect of the proposed explanation behind the increase of 44-kHz ultrasonic emissions.

      Also, we do agree that other factors could contribute to the increased emission of 44-kHz calls. These factors could include: heightened fear, stress/anxiety, annoyance/anger, disgust/boredom, grief/sadness, despair/helplessness, and weariness/fatigue. We list these potential factors in the discussion. Also, we added: “It is not possible, at this stage, to determine which factors played a decisive role. Please note that the potential contribution of these factors is not mutually exclusive”. However, we propose a list of arguments supporting the idea that 44-kHz vocalizations communicate an increased negative emotional state. Among these arguments were the conclusions drawn from additional analyses – mostly inspired by the fatigue hypothesis proposed by Reviewer #1. In particular, we investigated changes in the sound mean power and duration of 22-kHz and 44-kHz calls. Specifically, we showed that the mean power of 44-kHz vocalizations did not change, and was higher than that of 22-kHz vocalizations (Fig. 1S2EF).

      Finally, Reviewer #1 listed “the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day” as arguments for the fatigue hypothesis. We do not agree that the “increase” should be interpreted as a sign of fatigue: producing and maintaining higher-frequency calls requires greater effort from the vocalizer, as we elaborated in the manuscript. Also, we are not sure what “drop in 44 kHz calls” the Reviewer is referring to. We assume it refers to fewer 44-kHz calls during testing vs. training; we suppose that the levels of arousal are lower in the test due to the shorter session time and the lack of shocks, which additionally contributes to fear extinction.

      Secondly, regarding the analysis where calls were sorted using DBSCAN based on peak frequency and duration, it is not surprising that the calls cluster based on frequency and duration, i.e. the features that are used to define the 44 kHz calls in the first place. Thus presenting this clustering as evidence of them being truly distinct call types comes across as a circular argument.

      Answer: The DBSCAN sorting results were meant to convey that, as the clustering ε value (which sets the degree of cluster separation) was changed, the 44-kHz vocalizations remained distinct from the 22-kHz and various short-call clusters, which merged. In other words: 44-kHz calls remained separate from long 22-kHz, short 22-kHz and 50-kHz vocalizations, which all consolidated into one common cluster. As a result, in this mathematical analysis, 44-kHz vocalizations remained distinct without applying human biases. Additionally, frequency and duration are the two most common features used to define all types of calls (Barker et al., 2010; Silkstone & Brudzynski, 2019a, 2019b; Willey & Spear, 2013). In summary, we did not expect the analysis to isolate the 44-kHz calls, and we were surprised by this result.
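
      The effect of ε described in this answer can be illustrated with a toy example. The following is a minimal sketch, not the authors' analysis: it is a plain DBSCAN written from the textbook definition, the points are hypothetical values (peak frequency in kHz, duration in tenths of a second), and the feature scaling, `eps` and `min_pts` settings are assumptions chosen only to make the behavior visible.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:    # only core points grow the cluster
                queue.extend(nb)
    return labels

def block(freq, dur):
    # Four hypothetical calls packed around (freq kHz, dur tenths of a second).
    return [(freq, dur), (freq + 1, dur), (freq, dur + 1), (freq + 1, dur + 1)]

# Hypothetical toy groups, NOT measured data.
short_22 = block(22, 1)
long_22 = block(22, 4)
long_44 = block(44, 4)
points = short_22 + long_22 + long_44

labels_small = dbscan(points, eps=1.5, min_pts=3)  # tight neighborhood
labels_large = dbscan(points, eps=2.5, min_pts=3)  # looser neighborhood
print(labels_small)
print(labels_large)
```

      In this toy setting, widening ε lets the two 22-kHz groups join through their nearest members, while the 44-kHz group is too far away in frequency to be absorbed. Whether real recordings behave this way depends on how the features are scaled, which this sketch does not attempt to reproduce.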

      The sparsity of calls in the 30-40 kHz range (shown in the individual animal panels in Figure 2C) could in theory be explained by some bioacoustics properties of rat vocal cords, without necessarily the calls below and above that range being ethologically distinct.

      Answer: We respectfully disagree with the argument regarding sparsity. It is important to note that, during prolonged fear conditioning experiments, we observed an increased incidence of 44-kHz calls (Fig. 1E-G) of up to >19% (Fig. 1S2AB) of the total ultrasonic vocalizations during specific inter-trial intervals. Also, it is possible that, in the observed experimental circumstances, almost every fifth call could be attributed to the vocal apparatus as an artifact of its functioning (assuming we are interpreting the Reviewer’s argument correctly). While we do not believe this to be the case, we acknowledge the importance of considering such a hypothesis.

      The behavioral response to call playback is intriguing, although again more in line with the hypothesis that these are not a distinct type of call but merely represent expected variation in vocalization parameters. Across the board animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls. This does raise interesting questions about how, ethologically, animals may interpret such variation and integrate this interpretation in their responses. However, the categorical approach employed here does not address these questions fully.

      Answer: We are unsure of the Reviewer’s critique in this paragraph and will attempt to address it to the best of our understanding. Our finding that up to >19% of calls were long, seemingly aversive 44-kHz calls, at a frequency in the defined appetitive ultrasonic range (usually >32 kHz), is unexpected rather than “expected”. We would agree that aversive call variation is expected, but not in the appetitive frequency range.

      Kindly note the findings by Saito et al. (2019), which claim that frequency band plays the main role in rat ultrasonic perception. It is possible that the higher peak frequency of 44-kHz calls may be a strong factor in their perception by rats, which is, however, modified by the longer duration and the lack of modulation.

      Also, in our experience, it is quite challenging to demonstrate different behavioral responses of naïve rats to pre-recorded 22-kHz (aversive) vs. 50-kHz (appetitive) vocalizations. Therefore, we would expect demonstrating a difference in response to two distinct, potentially aversive calls, i.e., 22-kHz vs. 44-kHz calls, to be even more difficult (to our knowledge, a comparable experiment comparing short vs. long 22-kHz ultrasonic vocalizations has not been done before).

      Therefore, we do not take lightly the surprising and interesting finding that “animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls”. We would rather put this description in analogous words: “the rats responded similarly to hearing 44-kHz calls as they did to hearing aversive 22-kHz calls, especially regarding heartrate change, despite the 44-kHz calls occupying the frequency band of appetitive 50-kHz vocalizations” and “other responses to 44-kHz calls were intermediate, they fell between response levels to appetitive vs. aversive playback” – which we added to the Discussion.

      Finally, we acknowledge that our findings do not present a finite and complete picture of the discussed aspects of behavioral responses to the presented ultrasonic stimuli (44-kHz vocalizations). Therefore, we have incorporated the Reviewer’s suggestion in the discussion. The added sentence reads: “Overall, these initial results raise further questions about how, ethologically, animals may interpret the variation in hearing 22-kHz vs. 44-kHz calls and integrate this interpretation in their responses.”

      In sum, rather than describing the 44kHz long calls as a new call type, it may be more accurate to say that sometimes aversive calls can occur at frequencies above 22 kHz. Individual and situational variability in vocalization parameters seems to be expected, much more so than all members of a species strictly adhering to extremely non-variable behavioral outputs.

      Answer: The surprising fact that there are presumably aversive calls that are beyond the commonly applied thresholds, i.e. >32 kHz, while sharing some characteristics with 22-kHz calls, is the main finding of the current publication. Whether they are ultimately assigned as a new type or subtype, i.e. a separate category, or form a supergroup of aversive calls together with 22-kHz vocalizations, is of secondary importance and remains to be discussed with other researchers in the field.

      However, we would argue – by showing a comparison – that 22-kHz calls occur at durations of <300 ms and also >300 ms, and are usually referred to in the literature as short and long 22-kHz vocalizations, respectively (not introduced with a description that “sometimes 22-kHz calls can occur at durations below 300 ms”). These are then regarded and investigated as separate groups or classes, usually referred to as two different “types” (e.g., Barker et al., 2010) or “subtypes” (e.g., Brudzynski, 2015). Analogously, 44-kHz vocalizations can also be regarded as a separate type or a subtype of 22-kHz calls. The problem with the latter is that 22-kHz vocalizations are traditionally and predominantly defined by an 18–32 kHz frequency band (Araya et al., 2020; Barroso et al., 2019; Browning et al., 2011; Brudzynski et al., 1993; Hinchcliffe et al., 2022; Willey & Spear, 2013).

      Reviewer #2 (Public Review):

      Olszyński et al. claim that they identified a "new-type" ultrasonic vocalization around 44 kHz that occurs in response to prolonged fear conditioning (using foot-shocks of relatively high intensity, i.e. 1 mA) in rats. Typically, negative 22-kHz calls and positive 50-kHz calls are distinguished in rats, commonly by using a frequency threshold of 30 or 32 kHz. Olszyński et al. now observed so-called "44-kHz" calls in a substantial number of subjects exposed to 10 tone-shock pairings, yet call emission rate was low (according to Fig. 1G around 15%, according to the result text around 7.5%).

      Answer: We are thankful for the praise of the strengths. Please note that Figure 1G referred to 10-trial Wistar rats during the delay fear conditioning session, in which 44-kHz calls constituted 14.1% of ultrasonic vocalizations. The 7.5% figure in the Results refers to the total of vocalizations analyzed across all animal groups used in fear conditioning experiments. These values have been updated in the current version of the manuscript. Also, please note that 44-kHz calls constituted up to 19.4% of calls, on average, in one of the ITIs during the fear conditioning session. However, the prevalence of aversive calls, and of 44-kHz vocalizations in particular, varied between individual rats; we added the text: “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.” See also below for the description of the array of experiments analyzed and the prevalence/percentage of 44-kHz calls encountered (Tab. 1, Fig. 1S3).

      Weaknesses: I see a number of major weaknesses.

      While the descriptive approach applied is useful, the findings have only focused importance and scope, given the low prevalence of "44 kHz" calls and limited attempts made to systematically manipulate factors that lead to their emission. In fact, the data presented appear to be derived from reanalyses of previously conducted studies in most cases and the main claims are only partially supported. While reading the manuscript, I got the impression that the data presented here are linked to two or three previously published studies (Olszyński et al., 2020, 2021, 2023). This is important to emphasize for two reasons:

      (1) It is often difficult (if not impossible) to link the reported data to the different experiments conducted before (and the individual experimental conditions therein). While reanalyzing previously collected data can lead to important insight, it is important to describe in a clear and transparent manner what data were obtained in what experiment (and more specifically, in what exact experimental condition) to allow appropriate interpretation of the data. For example, it is said that in the "trace fear conditioning experiment" both single- and group-housed rats were included, yet I was not able to tell what data were obtained in single- versus group-housed rats. This may sound like a side aspect, however, in my view this is not a side aspect given the fact that ultrasonic vocalizations are used for communication and communication is affected by the social housing conditions.

      Answer: In preparing the current manuscript, we indeed used data collected during fear conditioning experiments which were described previously (Olszyński et al., 2021; Olszyński et al., 2022). Please note, however, that vocalization behavior during the fear conditioning itself was not the main subject of these publications. Our previous publications (Olszyński et al., 2020; Olszyński et al., 2021; Olszyński et al., 2022) present primarily ultrasonic-vocalization data from the playback part of the experiments, whereas here we analyze recordings obtained during fear conditioning experiments; thus, we are analyzing new, i.e., not yet analyzed, parts of previously published studies. Also, we have performed additional experiments.

      In the first version of the current manuscript, we did not attempt to demonstrate exactly which calls were recorded in which conditions as the focus was to demonstrate that 44-kHz calls were emitted in several different fear-conditioning experiments. Also, as the experiments were not performed simultaneously and are results from different experimental situations, we would prefer to not compare these results directly.

      However, in the current version of the manuscript, we have introduced an additional reference system, based on Tab. 1, to more clearly indicate which rats have been employed in each analysis, e.g. the group of “Wistar rats that undergone 10 trials of fear conditioning” are described as “Tab. 1/Exp. 1-3/#2,4,8,13; n = 46”, i.e., these are the rats listed in rows 2, 4, 8, and 13 of Tab. 1.

      We have also tried to unify the analyses, in terms of rats used, as much as possible. Finally, we have also introduced Fig. 1S3 to demonstrate the prevalence of 44-kHz calls in all experiments analyzed with the note that “the experiments were not performed in parallel”.

      Regarding the Reviewer’s concerns about analyzing single- and pair-housed rats together, we have examined the ultrasonic vocalizations emitted and the freezing behavior in these two groups.

      • Ultrasonic vocalizations. When comparing the number of vocalizations, their duration, peak frequency and latency to first occurrence, both for all types of calls pooled and divided into types (short 22-kHz, long 22-kHz, 44-kHz, 50-kHz), the only difference observed was in the peak frequency of 50-kHz vocalizations (50.7 ± 2.8 kHz for paired vs. 61.8 ± 3.1 kHz for single rats; p = 0.0280, Mann-Whitney). Since 50-kHz calls are not the subject of the current publication, we did not investigate this difference further. Also, this difference was not observed during playback experiments (Olszyński et al., 2020, Tab. 1).

      • Freezing. There were no differences between single- and pair-housed groups in freezing behavior, both in the time before first shock presentation and during fear conditioning training (Mann-Whitney).

      In summary, since the two groups did not differ in relevant ultrasonic features and freezing, we decided to present the results obtained from these rats together. However, we agree with the Reviewer, and it is possible that social housing conditions may in fact affect the emission of 44-kHz vocalizations, which could be a subject of another project – involving, e.g., larger experimental groups observed under hypothesis-oriented and defined conditions.

      (2) In at least two of the previously published manuscripts (Olszyński et al., 2021, 2023), emission of ultrasonic vocalizations was analyzed (Figure S1 in Olszyński et al., 2021, and Fig. 1 in Olszyński et al., 2023). This includes detailed spectrographic analyses covering the frequency range between 20 and 100 kHz, i.e. including the frequency range, where the "new-type" ultrasonic vocalization, now named "44 kHz" call, occurs, as reflected in the examples provided in Fig. 1 of Olszyński et al. (2023). In the materials and methods there, it was said: "USV were assigned to one of three categories: 50-kHz (mean peak frequency, MPF >32 kHz), short 22-kHz (MPF of 18-32 kHz, <0.3 s duration), long 22-kHz (MPF of 18-32 kHz, >0.3 s duration)". Does that mean that the "44 kHz" calls were previously included in the count for 50-kHz calls? Or were 44 kHz calls (intentionally?) left out? What does that mean for the interpretation of the previously published data? What does that mean for the current data set? In my view, there is a lack of transparency here.
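
      The three-category rule quoted in this comment can be written out as a small function, which makes the question concrete. This is a minimal sketch of a literal reading of the quoted rule, not the pipeline used in the earlier publications; the function name, the "unclassified" fallback, and the handling of a call lasting exactly 0.3 s (treated here as long) are assumptions, since the quoted text does not specify them.

```python
def classify_usv(mean_peak_khz, duration_s):
    """Literal reading of the quoted three-category rule (hypothetical helper)."""
    if mean_peak_khz > 32:
        return "50-kHz"
    if 18 <= mean_peak_khz <= 32:
        # Assumption: a call of exactly 0.3 s is counted as long.
        return "short 22-kHz" if duration_s < 0.3 else "long 22-kHz"
    return "unclassified"  # assumption: MPF below 18 kHz is outside the quoted rule

# A long call with a 44 kHz peak has MPF > 32 kHz, so duration is never consulted.
print(classify_usv(44, 1.0))
print(classify_usv(22, 1.0))
print(classify_usv(22, 0.1))
```

      Read this way, duration plays no role above 32 kHz, so a long, flat 44 kHz call would be counted with the 50-kHz calls unless it was handled separately.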

      Answer: As mentioned above, we indeed used data collected during fear conditioning experiments which were described previously (Olszyński et al., 2021; Olszyński et al., 2022). However, in these publications, ultrasonic vocalizations emitted during playback experiments were the main subject, while the ultrasonic calls emitted during fear conditioning (performed before the playback) were only analyzed in a preliminary way. As a result, the 44-kHz vocalizations analyzed in the current manuscript were not included in the previous analyses. In particular, in Olszyński et al. (2021), we counted the overall number of ultrasonic vocalizations before the fear conditioning session to determine the basal ultrasonic emissions (Fig. S1). Then, in our next article (Olszyński et al., 2022), we again analyzed the number of all ultrasonic vocalizations before fear conditioning (Fig. S1) and restricted the analysis of vocalizations during fear conditioning to 22-kHz calls (Tab. S1 and S2).

      Also, we re-reviewed all the data used in our previous playback publications. Overall, 44-kHz calls were extremely rare in the playback parts of the experiments. There were no 44-kHz calls in the playback data used in Olszyński et al. (2022) and Olszyński et al. (2020). In Olszyński et al. (2021), one rat produced eight 44-kHz calls. These 44-kHz calls constituted 0.03% of all vocalizations analyzed in the experiment (8/24888) and were included in the total number of calls analyzed (but not in the 50-kHz group); they were not described in further detail in that publication.

      Moreover, whether the newly identified call type is indeed novel is questionable, as also mentioned by the authors in their discussion section. While they wrote in the introduction that "high-pitch (>32 kHz), long and monotonous ultrasonic vocalizations have not yet been described", they wrote in the discussion that "long (or not that long (Biały et al., 2019)), frequency-stable high-pitch vocalizations have been reported before (e.g. Sales, 1979; Shimoju et al., 2020), notably as caused by intense cholinergic stimulation (Brudzynski and Bihari, 1990) or higher shock-dose fear conditioning (Wöhr et al., 2005)" (and I wish to add that to my knowledge this list provided by the authors is incomplete). Therefore, I believe, the strong claims made in abstract ("we are the first to describe a new-type..."), introduction ("have not yet been described"), and results ("new calls") are not justified.

      Answer: We would argue that 44-kHz vocalizations were indeed reported but not described. As far as we are concerned, an in-depth analysis of the properties and experimental circumstance of emission of long, high-frequency calls has not yet been performed. These researchers have observed, at least to a degree, similar calls to the ones we observed – as we mentioned in the discussion section. However, since these reported 44-kHz vocalizations were not fully described, we can only guess that they may be similar to ours. We speculate that perhaps like us, these researchers unknowingly recorded 44-kHz calls in their experiments and may also be able to describe them more extensively when re-analyzing their data as we have done here.

      Possibly, it was difficult to find reports on vocalizations, similar to the 44-kHz calls that we observed, because of the canonical and accepted definitions of ultrasonic vocalization types. Biały et al. (2019) allocated them as a part of 22-kHz group, perhaps because their calls were often of a step variation having both low and high components. Shimoju et al. (2020) grouped them along with 50-kHz vocalizations because they appeared during stroking rats held vertically; this procedure was compared to tickling which usually elicits appetitive calls.

      Reviewer #2 states that there are other publications that would complete the list. We are aware of other articles authored by the same team as Shimoju et al. (2020), with different first authors. However, they report findings similar to those in the cited article. Otherwise, we would gladly cite a more complete list of publications showing atypical, long, monotonous high-frequency vocalizations similar to those observed in our experiments. Therefore, we would argue that ultrasonic vocalizations which were long, flat, high in frequency, and repeatedly occurring in a defined behavioral situation, have not been reported before. However, concerning the strong claims of novelty of our finding, we toned them down where we found this was warranted.

      In general, the manuscript is not well written/ not well organized, the description of the methods is insufficient, and it is often difficult (if not impossible) to link the reported data to the experiments/ experimental conditions described in the materials and methods section.

      Answer: The description of the methods has been adjusted and expanded. We added the requested link to each particular experiment in the form “Tab. 1/Exp. nos./# nos.”, which shows, each time, which experiments and experimental groups were analyzed. The list of the experiments and groups is found in Tab. 1.

      For example, I miss a clear presentation of basic information: 1) How many rats emitted "44 kHz" calls (in total, per experiment, and importantly, also per experimental condition, i.e. single- versus group-housed)?

      Answer: We now clearly show which experiments were performed and how many animals were tested in each condition (Tab. 1), while the prevalence of 44-kHz calls amongst experimental conditions and animal groups is shown in Fig. 1S3. Also, we included information regarding the number of animals and treatment of each group of rats when reporting results. For example, we are stating that:

      (1a) “53 of all 84 conditioned Wistar rats (Tab. 1/Exp. 1-3/#2,4,6-8,13, Figs 1B, 1E, 1S1BC) displayed” 44-kHz vocalizations – as a general assessment; these numbers differ from those in the first version of the manuscript, where we mentioned only Wistar rats conditioned 6 or 10 times.

      (1b) “From this group of rats (n = 46), n = 41 (89.1%) emitted long 22-kHz calls, and 32 of them (69.6%) emitted 44-kHz calls” – this time referring only to 10-times conditioned Wistar rats as the biggest group that could be analyzed together (Figs 1F, 1G, 1S2A).

      (1c) “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.”
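
      The proportions quoted in (1a)-(1c) can be checked with a few lines of arithmetic. This is a minimal sketch; the counts are taken from the statements above, while the helper and variable names are illustrative only.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# (1a) conditioned Wistar rats that displayed 44-kHz vocalizations
emitters_all = percent(53, 84)

# (1b) 10-trial Wistar group, n = 46
long_22_khz = percent(41, 46)   # rats emitting long 22-kHz calls
calls_44_khz = percent(32, 46)  # rats emitting 44-kHz calls

# (1c) (44-kHz calls, all calls) within single ITIs for the three example rats
iti_counts = [(140, 142), (222, 231), (263, 265)]
iti_percent = [percent(k, n) for k, n in iti_counts]

print(emitters_all, long_22_khz, calls_44_khz, iti_percent)
```

      The two values for the 10-trial group come out as the 89.1% and 69.6% quoted in (1b), and all three ITI examples in (1c) exceed 95%.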

      (2) Out of the ones emitting "44 kHz" calls, what was the prevalence of "44 kHz" calls (relative to 22- and 50-kHz calls, e.g. shown as percentage)?

      Answer: The prevalence of 44-kHz vocalizations in all investigated experiments and groups is shown in Fig. 1S3CD. Also, more information regarding the percentage of 44-kHz calls was demonstrated in Fig. 1S2AB where we calculated the distribution of 44-kHz calls to 22-kHz calls in Wistar rats, in 10-trial fear conditioning, across the length of the session.

      Additionally, the values are listed in the sentence regarding all Wistar rats which underwent 10 trials of fear conditioning: “these vocalizations were less frequent following the first trial (1.2 ± 0.4% of all calls), and increased in subsequent trials, particularly after the 5th (8.8 ± 2.8%), through the 9th (19.4 ± 5.5%, the highest value), and the 10th (15.5 ± 4.9%) trials, where 44-kHz calls gradually replaced 22-kHz vocalizations in some rats (Fig. 1F, 1S2B, Video 1; comp Fig. 1D vs. 1E).”

      (3) How did this ratio differ between experiments and experimental conditions?

      Answer: The prevalence of 44-kHz vocalizations in all experimental conditions is shown in Fig. 1S3. However, the direct comparison of results obtained in different conditions was not the goal of the present work. Also, we would argue, that such direct comparisons of results of different experiments would not be allowed. These experiments were done with different groups of animals, at different times, with different timetables of experimental manipulations.

      However, we are comfortable stating that:

      • There were more 44-kHz vocalizations during fear conditioning training than testing in all fear-conditioned Wistar rats;

      • We observed more 44-kHz vocalizations in Wistar rats compared to SHR.

      (4) Was there a link to freezing? Freezing was apparently analyzed before (Olszyński et al., 2021, 2023) and it would be important to see whether there is a correlation between "44-kHz" calls and freezing. Moreover, it would be important to know what behavior the rats are displaying while such "44-kHz" calls are emitted? (Note: Even not all 22-kHz calls are synced to freezing.) All this could help to substantiate the currently highly speculative claims made in the discussion section ("frequency increases with an increase in arousal" and "it could be argued that our prolonged fear conditioning increased the arousal of the rats with no change in the valence of the aversive stimuli"). Such more detailed analyses are also important to rule out the possibility that the "new-type" ultrasonic vocalization, the so-called "44 kHz" call, is simply associated with movement/ thorax compression.

      Answer: We analyzed freezing behavior and its association with ultrasonic emissions. The emission of 44-kHz vocalizations was associated with freezing. The results are now described and presented in the manuscript, i.e., Tab. 2, its legend and the description in Results: “Freezing during the bins of 22-kHz calls only (p < 0.0001, for both groups) and during 44-kHz calls only bins (p = 0.0003) was higher than during the first 5 min baseline freezing levels of the session. Also, the freezing associated with emissions of 44-kHz calls only was higher than during bins with no ultrasonic vocalizations (p = 0.0353), and it was also 9.9 percentage points higher than during time bins with only long 22-kHz vocalizations, but the difference was not significant (p = 0.1907; all Wilcoxon)” and “To further investigate this potential difference, we measured freezing during the emission of randomly selected single 44-kHz and 22-kHz vocalizations. The minimal freezing behavior detection window was reduced to compensate for the higher resolution of the measurements (3, 5, 10, or 15 video frames were used). There was no difference in freezing during the emission of 44-kHz vs. 22-kHz vocalizations for ≥150-ms-long calls (3 frames, p = 0.2054) and for ≥500-ms-long calls (5 frames, p = 0.2404; 10 frames, p = 0.4498; 15 frames, p = 0.7776; all Wilcoxon, Tab. 2B).”

      Please note that the general observation that "frequency increases with an increase in arousal" is not our claim but a general rule derived from a large body of observations and proposed by others (Briefer et al., 2012); we changed the wording of this statement to: “frequency usually increases with an increase in arousal (Briefer et al., 2012)”.

      The figures currently included are purely descriptive in most cases - and many of them are just examples of individual rats (e.g. majority of Fig. 1, all of Fig. 2 to my understanding, with the exception of the time course, which in case of D is only a subset of rats ("only rats that emitted 44-kHz calls in at least seven ITI are plotted" - is there any rationale for this criterion?)), or, in fact, just representative spectrograms of calls (all of Fig. 3, with the exception of G, all of Fig. 4).

      Answer: Please note that the former Figures 2, 4, 6, and 8 have now been moved to supplementary figures 1S1, 2S1, 3S1, and 4S1 to better organize the presentation of data. Figures 1, 3, 5, and 7 are now Figures 1, 2, 3, and 4, respectively. Data from individual rats were presented to show the general patterns of ultrasonic-call distributions observed. Showing the full data set as in Fig. 5A (now Fig. 3A) would make the graph hard to read without mathematical clustering techniques such as DBSCAN.

      Concerning Reviewer #2’s question regarding the criterion of “minimum seven ITI”, we selected the highest vocalizers by taking animals above the 75th percentile of the number of ITI with 44-kHz calls. However, in the current version of the manuscript, we decided to omit this part of the analysis and the accompanying part of the figure, since it added no informative value (and relied on a questionable criterion).
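
      For readers who want to see how a percentile-based selection of this kind works, here is a minimal sketch. The per-rat counts, the rat identifiers, and the function name `high_vocalizers` are hypothetical and are not taken from the manuscript. The choice of the "inclusive" quantile method is also an assumption, since the response does not say how the 75th percentile was computed.

```python
import statistics

# Hypothetical data: number of ITIs (out of 10) in which each rat emitted
# at least one 44-kHz call. These values are invented for illustration.
iti_counts = {
    "rat01": 0, "rat02": 1, "rat03": 1, "rat04": 2,
    "rat05": 3, "rat06": 3, "rat07": 4, "rat08": 5,
    "rat09": 6, "rat10": 7, "rat11": 8, "rat12": 9,
}


def high_vocalizers(counts, method="inclusive"):
    """Return (threshold, rats) where rats lie strictly above the 75th percentile.

    The quantile method is an assumption; other conventions can shift the
    threshold slightly for small samples.
    """
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    threshold = statistics.quantiles(counts.values(), n=4, method=method)[2]
    selected = sorted(rat for rat, c in counts.items() if c > threshold)
    return threshold, selected


threshold, selected = high_vocalizers(iti_counts)
print(f"75th percentile: {threshold}")
print("Selected rats:", selected)
```

      With these invented counts the cut-off falls between six and seven ITI, so only rats with calls in at least seven ITI are kept, which is the kind of integer criterion such a threshold produces.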

      Moreover, the differences between Fig. 5 and Fig. 6 are not clear to me. It seems Fig. 5B is included three times - what is the benefit of including the same figure three times?

      Answer: We hope that designating Fig. 6 as supplementary to Fig. 5 (now Figs 3S1 and 3, respectively) will make them easier to interpret. Fig. 6A (now Fig. 3S1A) is a more detailed look at the information presented in Fig. 5B (now Fig. 3B), with spectrogram images of ultrasonic vocalizations from different areas of the plot. Also, Fig. 3B (former Fig. 5B) was removed from Fig. 3S1B (former Fig. 6B).

      A systematic comparison of experimental conditions is limited to Fig. 7 and Fig. 8, the figures depicting the playback results (which led to the conclusion that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in between responses to 22-kHz and 50-kHz playbacks", although it remains unclear to me why differences were seen b e f o r e the experimental manipulation, i.e. the different playback types in Fig. 8B).

      Answer: There were indeed instances of such differences before playback. Similar differences were observed in our previous studies (Olszyński et al., 2020, Tabs S9-12; Olszyński et al., 2021, Tabs S7; Olszyński et al., 2022, Tabs S4, S9, S13, S17, S18) and were most likely due to the multiple comparisons performed. However, we think that the carry-over effect mentioned by Reviewer #2 (see below) also played a role.

      Related to that, I miss a clear presentation of relevant methodological aspects: 1) Why were some rats single-housed but not the others?

      Answer: As stated before, data were collected from our previous experiments and the observation of 44-kHz vocalizations in fear conditioning was an emergent discovery as we decided to analyze ultrasonic recordings from fear conditioning procedures. Single-housed animals were part of our experiment comparing fear conditioning and social situation on the perception of ultrasonic playback as described in Olszyński et al. (2020). Aside from this experiment, all other rats were housed in pairs.

      (2) Is the experimental design of the playback study not confounded? It is said that "one group (n = 13) heard 50-kHz appetitive vocalization playback while the other (n = 16) 22-kHz and 44-kHz aversive calls". How can one compare "44 kHz" calls to 22- and 50-kHz calls when "44 kHz" calls are presented together with 22-kHz calls but not 50-kHz calls? What about carry-over effects? Hearing one type of call most likely affects the response to the other type of call. It appears likely that rats are a bit more anxious after hearing aversive 22-kHz calls, for example. Therefore, it would not be very surprising to see that the response to "44 kHz" calls is more similar to 22-kHz calls than 50-kHz calls.

      Of note, in the case of the other playback experiment it is just said that rats "received appetitive and aversive ultrasonic vocalization playback", but it remains unclear whether "44 kHz" calls are seen as appetitive or aversive. Later it says that "rats were presented with two 10-s-long playback sets of either 22-kHz or 44-kHz calls, followed by one 50-kHz modulated call 10-s set and another two playback sets of either 44-kHz or 22-kHz calls not previously heard" (and I wonder what data set was included in the figures and how - pooled?). Again, I am worried about carry-over effects here. This does not seem to be an experimental design that allows comparison of the responses to the three main call types in an unbiased manner.

      Answer: We apologize for the confusing and overly brief original description of the playback experiments. We wanted to avoid confusion associated with including several additional playback signals (please note that some are not related to the current comparisons and include different 50-kHz ultrasonic subtypes and two different subtypes of short 22-kHz calls). We lengthened the description of these playback experiments in the current version.

      In general, including more than one type of ultrasonic call as playback carries a risk of a carry-over effect as well as a habituation effect (the responses become weaker). However, it greatly reduces the number of animals required. Finally, regarding the first experiment, we chose 3 playbacks to compare the rats’ reactions, as this was the most conservative choice we could think of.

      We would like to highlight that we wanted to compare specifically the rats’ responses to 22-kHz vs. 44-kHz playback (as well as the effects of playback of different subtypes of 50-kHz calls, which is not the subject of the current work). Therefore, we would argue that the design of both experiments is actually unbiased regarding this key comparison (responses to 22-kHz vs. 44-kHz playback). In both experiments, 22-kHz and 44-kHz playbacks were included in the same sequences of stimuli and counterbalanced regarding their order (i.e., taking into account possible carry-over effects), and presented to the same rats. We regarded the group of rats that heard 50-kHz recordings as a baseline/control, since we know from previous playback studies what reactions to expect from rats exposed to these vocalizations (and 22-kHz playback), while in the second experiment, we reduced the 50-kHz playback to one set in order to minimize possible habituation to multiple playbacks.

      We agree that the design of both experiments does not allow for a full comparison of the effects of aversive playbacks with those of 50-kHz playback. We also agree that some carry-over effects could play a role. This was mentioned in the discussion: “Please factor in potential carryover effects (resulting from hearing playbacks of the same valence in a row) in the differences between responses to 50-kHz vs. 22/44-kHz playbacks, especially, those observed before the signal (Fig. 4AB).” However, we would still argue that the observed lack of difference in heart-rate response (Fig. 4A) and the differences regarding the number of 50-kHz calls emitted (e.g., Fig. 4S1F) are free of the constraints raised by Reviewer #2.

      We acknowledge that our studies do not give a complete picture of 44-kHz ultrasonic perception in relation to other ultrasonic bands and, given the possibility, we would like to perform more in-depth and focused experiments to study this aspect of 44-kHz calls in the future.

      Finally, regarding the second experiment, the description of the rats now includes that they “received 22-kHz, 44-kHz, and 50-kHz ultrasonic vocalization playback”, while the description of the experiment itself includes: “Responses to the pairs of playback sets were averaged”.

      Of note, what exactly is meant by "control rats" in the context of fear conditioning is also not clear to me. One can think of many different controls in a fear conditioning experiment.

      More concrete information is needed.

      Answer: This information was included in our previous publications. It is now also provided in the Methods section of the current version of the manuscript. In general, control rats were subjected to the same procedures but did not receive electric shocks.

      Literature included in the answers

      Araya, E. I., Baggio, D. F., Koren, L. O., Andreatini, R., Schwarting, R. K. W., Zamponi, G. W., & Chichorro, J. G. (2020). Acute orofacial pain leads to prolonged changes in behavioral and affective pain components. Pain, 161(12), 2830-2840. https://doi.org/10.1097/j.pain.0000000000001970

      Barker, D. J., Root, D. H., Ma, S., Jha, S., Megehee, L., Pawlak, A. P., & West, M. O. (2010). Dose-dependent differences in short ultrasonic vocalizations emitted by rats during cocaine self-administration. Psychopharmacology (Berl), 211(4), 435-442. https://doi.org/10.1007/s00213-010-1913-9

      Barroso, A. R., Araya, E. I., de Souza, C. P., Andreatini, R., & Chichorro, J. G. (2019). Characterization of rat ultrasonic vocalization in the orofacial formalin test: Influence of the social context. Eur Neuropsychopharmacol, 29(11), 1213-1226. https://doi.org/10.1016/j.euroneuro.2019.08.298

      Biały, M., Podobinska, M., Barski, J., Bogacki-Rychlik, W., & Sajdel-Sulkowska, E. M. (2019). Distinct classes of low frequency ultrasonic vocalizations in rats during sexual interactions relate to different emotional states. Acta Neurobiol Exp (Wars), 79(1), 1-12. https://www.ncbi.nlm.nih.gov/pubmed/31038481

      Briefer, E. F., Padilla de la Torre, M., & McElligott, A. G. (2012). Mother goats do not forget their kids' calls. Proc Biol Sci, 279(1743), 3749-3755. https://doi.org/10.1098/rspb.2012.0986

      Browning, J. R., Browning, D. A., Maxwell, A. O., Dong, Y., Jansen, H. T., Panksepp, J., & Sorg, B. A. (2011). Positive affective vocalizations during cocaine and sucrose self administration: a model for spontaneous drug desire in rats. Neuropharmacology, 61(1-2), 268-275. https://doi.org/10.1016/j.neuropharm.2011.04.012

      Brudzynski, S. M. (2015). Pharmacology of Ultrasonic Vocalizations in adult Rats: Significance, Call Classification and Neural Substrate. Curr Neuropharmacol, 13(2), 180-192. https://doi.org/10.2174/1570159x13999150210141444

      Brudzynski, S. M., & Bihari, F. (1990). Ultrasonic vocalization in rats produced by cholinergic stimulation of the brain. Neurosci Lett, 109(1-2), 222-226. https://doi.org/10.1016/0304-3940(90)90567-s

      Brudzynski, S. M., Bihari, F., Ociepa, D., & Fu, X. W. (1993). Analysis of 22 kHz ultrasonic vocalization in laboratory rats: long and short calls. Physiol Behav, 54(2), 215-221. https://doi.org/10.1016/0031-9384(93)90102-l

      Hinchcliffe, J. K., Jackson, M. G., & Robinson, E. S. (2022). The use of ball pits and playpens in laboratory Lister Hooded male rats induces ultrasonic vocalisations indicating a more positive affective state and can reduce the welfare impacts of aversive procedures. Lab Anim, 56(4), 370-379. https://doi.org/10.1177/00236772211065920

      Matochik, J. A., White, N. R., & Barfield, R. J. (1992). Variations in scent marking and ultrasonic vocalizations by Long-Evans rats across the estrous cycle. Physiol Behav, 51(4), 783-786. https://doi.org/10.1016/0031-9384(92)90116-j

      Olszyński, K. H., Polowy, R., Małż, M., Boguszewski, P. M., & Filipkowski, R. K. (2020). Playback of Alarm and Appetitive Calls Differentially Impacts Vocal, Heart-Rate, and Motor Response in Rats. iScience, 23(10), 101577. https://doi.org/10.1016/j.isci.2020.101577

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., & Filipkowski, R. K. (2021). Increased Vocalization of Rats in Response to Ultrasonic Playback as a Sign of Hypervigilance Following Fear Conditioning. Brain Sci, 11(8). https://doi.org/10.3390/brainsci11080970

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., Zieliński, J., & Filipkowski, R. K. (2022). Spontaneously hypertensive rats manifest deficits in emotional response to 22-kHz and 50-kHz ultrasonic playback. Prog Neuropsychopharmacol Biol Psychiatry, 120, 110615. https://doi.org/10.1016/j.pnpbp.2022.110615

      Saito, Y., Tachibana, R. O., & Okanoya, K. (2019). Acoustical cues for perception of emotional vocalizations in rats. Scientific Reports, 9(1), 10539.

      Sales, G. D. (1979). Strain Differences in the Ultrasonic Behavior of Rats (Rattus norvegicus). Am Zool, 19(2), 513-527. https://www.jstor.org/stable/3882331

      Shimoju, R., Shibata, H., Hori, M., & Kurosawa, M. (2020). Stroking stimulation of the skin elicits 50-kHz ultrasonic vocalizations in young adult rats. J Physiol Sci, 70(1), 41. https://doi.org/10.1186/s12576-020-00770-1

      Silkstone, M., & Brudzynski, S. M. (2019a). The antagonistic relationship between aversive and appetitive emotional states in rats as studied by pharmacologically-induced ultrasonic vocalization from the nucleus accumbens and lateral septum. Pharmacology Biochemistry and Behavior, 181, 77-85. https://doi.org/10.1016/j.pbb.2019.04.009

      Silkstone, M., & Brudzynski, S. M. (2019b). Intracerebral injection of R-(-)-Apomorphine into the nucleus accumbens decreased carbachol-induced 22-kHz ultrasonic vocalizations in rats. Behavioural Brain Research, 364, 264-273. https://doi.org/10.1016/j.bbr.2019.01.044

      Willey, A. R., & Spear, L. P. (2013). The effects of pre-test social deprivation on a natural reward incentive test and concomitant 50 kHz ultrasonic vocalization production in adolescent and adult male Sprague-Dawley rats. Behav Brain Res, 245, 107-112. https://doi.org/10.1016/j.bbr.2013.02.020

      Wöhr, M., Borta, A., & Schwarting, R. K. (2005). Overt behavior and ultrasonic vocalization in a fear conditioning paradigm: a dose-response study in the rat. Neurobiol Learn Mem, 84(3), 228-240. https://doi.org/10.1016/j.nlm.2005.07.004

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Additional considerations:

      The discussion of the "perfect fifth" and the proposition that this observation could be evidence of an evolutionary mechanism underlying it is rather far-fetched, especially for being presented in the Results section (with no supporting non-anecdotal evidence).

      Answer: We agree with Reviewer #1. The text was modified and the word “evolutionary” was deleted. Instead, we expanded on the possible reason for the prevalence of the perfect fifth in the current version of the manuscript; we added that the prevalence of the perfect fifth: “could be explained by the observation that all physical objects capable of producing tonal sounds generate harmonic vibrations, the most prominent being the octave, perfect fifth, and major third (Christensen, 1993, discussed in Bowling and Purves, 2015).”

      It is not clear why Sprague-Dawleys were used as "receivers" in the playback experiment, when presumably the calls were recorded from Wistars and SHRs. While this does not critically impact the conclusions (within the species, rats should be able to respond appropriately to calls made by rats of different genetic backgrounds), it adds an unnecessary source of variance.

      Answer: Sprague-Dawley rats were used to test another normotensive strain of rats. Regarding the Reviewer’s main point, we beg to differ, as we think that it is worth testing playback stimuli in different strains. Varying the stimuli between rat strains would add unnecessary variance, and it seemed logical to use the same recordings to test effects in different strains. Please note that, in spite of this additional variance, the results of both playback experiments are, in general, similar, which may point to a universal effect of 44-kHz playback across rat strains.

      It is pertinent to note that for the trace fear conditioning experiment, the rats had previously been exposed to a vocalization playback experiment. While such a pre-exposure is unlikely to be a very strong stressor, the possibility for it to influence the vocal behaviors of these rats in later experiments cannot be ruled out. It is also not clear what the control rats in this experiment experienced (home cage only?), nor what they were used for in analyses.

      Answer: In the current version of the manuscript, we have described in greater detail all the experiments performed and analyzed. We would like to emphasize that both delay and trace fear conditioning experiments with radiotelemetric transmitters were not performed specifically to elicit any particular response during fear conditioning; rather, our observation of 44-kHz vocalizations emerged as a result of re-examining the audio recordings. As a result, this work summarizes our observations of 44-kHz calls from several different experiments. It is relevant to note that 44-kHz vocalizations were observed “in rats which were exposed to vocalization playback experiment”, in rats before the playback experiments, and in naïve rats without implanted transmitters that were trained in fear conditioning (Tab. 1/Exp. 1-3).

      Our main message is that 44-kHz vocalizations were present in several experiments, with different conditions and subjects; we are not attempting to compare in detail the results across the different experiments. In other words, we agree that pre-exposure to playback (and, even more likely, transmitter implantation) could influence 44-kHz ultrasonic emissions by the rats, but neither is necessary for them. To demonstrate this, we added a prolonged fear conditioning group with naïve Wistar rats (Exp. 3) to verify the emission of 44-kHz calls in the absence of those experimental factors.

      We modified the methods section to clarify the circumstances under which these discoveries were made, such as including the information regarding the control rats in trace fear conditioning. In particular we mention that: “Control rats were subjected to the exact same procedures but did not receive the electric shock at the end of trace periods”.

      For Figure 1A-E, only example call distributions from individual rats are shown. It would perhaps be more informative to see the full data set displayed in this manner, with color/shape codes distinguishing individuals if desired.

      Answer: Please note that Fig. 1S1 shows more examples of ultrasonic call distribution. Showing all the data would make the figure more difficult to read and interpret. The problem is partly addressed in Fig. 3A.

      It is not clear what is presented in Figure 2D vs. E, i.e. panel D is shown only for "selected rats" but the legend does not clarify how and why these rats were selected. It is also not clear why the legend reports p-values for both Friedman and Wilcoxon tests; the latter is appropriate for paired data which seems to be the case when the question is whether the call peak frequency alters across time, but the Friedman assumes non-paired input data.

      Answer: The question refers to the current Fig. 1S2C panel (former Fig. 2E panel) and the former Fig. 2D panel. The latter was not included in the current version of the manuscript, since both reviewers opposed the presentation of “selected rats” only (see above). The full description of the Fig. 1S2C panel is now in the Results section, together with p-values for the Friedman and Wilcoxon tests. We used the Wilcoxon test to investigate the difference between the first and the last ITI (selected paired data), and the Friedman test to investigate the presence of change within the chain of ten ITI, since it is a suitable test for a difference between two or more paired samples.
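
      To make the distinction between the two tests concrete, the sketch below computes both test statistics on a small repeated-measures table. All numbers are hypothetical (four rats and three ITI instead of ten), the function names are ours, and only the statistics are computed, without p-values or a tie correction. In practice a library routine such as `scipy.stats.friedmanchisquare` or `scipy.stats.wilcoxon` would normally be used.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic; rows are subjects, columns are repeated conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:  # ranking is done within each subject, so the data are treated as paired
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def wilcoxon_statistic(first, last):
    """Wilcoxon signed-rank statistic min(W+, W-); zero differences are dropped."""
    diffs = [b - a for a, b in zip(first, last) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical peak frequencies (kHz): one row per rat, one column per ITI.
peak_khz = [
    [24.0, 30.0, 41.0],
    [23.5, 28.0, 44.0],
    [25.0, 27.0, 39.0],
    [22.0, 31.0, 43.0],
]

q = friedman_statistic(peak_khz)  # change anywhere across the series of ITI
w = wilcoxon_statistic([r[0] for r in peak_khz], [r[-1] for r in peak_khz])  # first vs. last ITI
print(f"Friedman Q = {q}, Wilcoxon W = {w}")
```

      Both statistics are built from within-subject comparisons, which is why both apply to repeated measurements from the same rats: the Friedman test asks whether anything changes across the whole series, and the Wilcoxon test compares one selected pair of time points.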

      Reviewer #2 (Recommendations For The Authors):

      The weaknesses listed in the public review need to be addressed.

      Answer: We have done our best to address the weaknesses.

      Notes: 1) Page and line numbers would have been useful.

      Answer: We are including a separate manuscript version with page and line numbers.

      (2) English language needs to be improved.

      Answer: The text has been checked by two native English speakers (one with a scientific background). Both identified only minor changes to improve the text, which we applied.

      (3) I am a bit unsure whether the comment about the Star Wars movie (1997) and the Game of Thrones series (2011) is supposed to be a joke.

      Answer: These are indeed two genuine examples of the perfect fifth in human music that we hope are easily recognizable and familiar to readers. Parts of the same examples of the perfect fifth can also be heard in the rat voice files provided.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you very much for the careful and positive reviews of our manuscript. We have addressed each comment in the attached revised manuscript. We describe the modifications below. To avoid confusion, we've changed supplementary figure and table captions to start with "Supplement Figure" and "Supplementary Table," instead of "Figure" and "Table."

      We have modified/added:

      ● Supplementary Table S1: AUC scores for the top 10 frequent epitope types (pathogens) in the testing set of epitope split.

      ● Supplementary Table S5: AUCs of TCR-epitope binding affinity prediction models with BLOSUM62 to embed epitope sequences.

      ● Supplementary Table S6: AUCs of TCR-epitope binding affinity prediction models trained on catELMo TCR embeddings and random-initialized epitope embeddings.

      ● Supplementary Table S7: AUCs of TCR-epitope binding affinity prediction models trained on catELMo and BLOSUM62 embeddings.

      ● Supplementary Figure 4: TCR clustering performance for the top 34 abundant epitopes representing 70.55% of TCRs in our collected databases.

      ● Section Discussion.

      ● Section 4.1 Data: TCR-epitope pairs for binding affinity prediction.

      ● Section 4.4.2 Epitope-specific TCR clustering.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors described a computational method catELMo for embedding TCR CDR3 sequences into numeric vectors using a deep-learning-based approach, ELMo. The authors applied catELMo to two applications: supervised TCR-epitope binding affinity prediction and unsupervised epitope-specific TCR clustering. In both applications, the authors showed that catELMo generated significantly better binding prediction and clustering performance than other established TCR embedding methods. However, there are a few major concerns that need to be addressed.

      (1) There are other TCR CDR3 embedding methods in addition to TCRBert. The authors may consider incorporating a few more methods in the evaluation, such as TESSA (PMCID: PMC7799492), DeepTCR (PMCID: PMC7952906) and the embedding method in ATM-TCR (reference 10 in the manuscript). TESSA is also the embedding method in pMTnet, which is another TCR-epitope binding prediction method and is the reference 12 mentioned in this manuscript.

      TESSA is designed for characterizing TCR repertoires, so we initially excluded it from the comparison. Our focus was on models developed specifically for amino acid embedding rather than TCR repertoire characterization. However, to address the reviewer's inquiry, we conducted further evaluations. Since both TESSA and DeepTCR use autoencoder-based models to embed TCR sequences, we selected the one used in TESSA for evaluation in our downstream prediction task, conducting ten trials in total. It achieved an average AUC of 75.69 in TCR split and 73.3 in epitope split. Notably, catELMo significantly outperformed it, with an AUC of 96.04 in TCR split and 94.10 in epitope split.
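
      As a reminder of what an average AUC over repeated trials measures, here is a minimal sketch. The labels, scores, and trial structure are invented for illustration, and `roc_auc` is our own helper using the pairwise (Mann-Whitney) definition of AUC, not the evaluation code used for the manuscript.

```python
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a binding pair outscores a non-binding pair.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions from two trials (1 = binding, 0 = non-binding).
trials = [
    ([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]),
    ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.4]),
]

aucs = [roc_auc(labels, scores) for labels, scores in trials]
print("per-trial AUC:", aucs)
print("average AUC (%):", 100 * statistics.mean(aucs))
```

      Reporting the mean over several trials, as in the response above, smooths out variation caused by a particular random split or initialization.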

      Regarding the embedding method in ATM-TCR, it simply uses BLOSUM as an embedding matrix, which we have already compared in Section 2.1. Furthermore, we have provided a comparison of our prediction model trained on catELMo embeddings with state-of-the-art prediction models such as netTCR and ATM-TCR in Table 6 of the Discussion section.

      (2) The TCR training data for catELMo is obtained from ImmunoSEQ platform, including SARS-CoV2, EBV, CMV, and other disease samples. Meanwhile, antigens related to these diseases and their associated TCRs are extensively annotated in databases VDJdb, IEDB and McPAS-TCR. The authors then utilized the curated TCR-epitope pairs from these databases to conduct the evaluations for epitope binding prediction and TCR clustering. Therefore, the training data for TCR embedding may already be implicitly tuned for better representations of the TCRs used in the evaluations. This seems to be true based on Table 4, as BERT-Base-TCR outperformed TCRBert. Could catELMo be trained on PIRD as TCRBert to demonstrate catELMo's embedding for TCRs targeting unseen diseases/epitopes?

      We would like to note that catELMo was trained exclusively on TCR sequences in an unsupervised manner, which means it has never been exposed to antigen information. We also ensured that the TCRs used in catELMo's training did not overlap with our downstream prediction data. Please refer to Section 4.1 Data, where we explicitly state: “We note that it includes no identical TCR sequences with the TCRs used for training the embedding models.” Moreover, the performance gap (~1%) between BERT-Base-TCR and TCRBert, as observed in Table 4, is relatively small, especially when compared to the performance difference (>16%) between catELMo and TCRBert.

      To further address this concern, we conducted experiments using the same number of TCRs, 4,173,895 in total, sourced exclusively from healthy ImmunoSeq repertoires. This alternative catELMo model demonstrated a similar prediction performance (based on 10 trials) to the one reported in our paper, with an average AUC of 96.35% in TCR split and an average AUC of 94.03% in epitope split.

      We opted not to train catELMo on the PIRD dataset for several reasons. First, approximately 7.8% of the sequences in PIRD also appear in our downstream prediction data, which could be a potential source of bias. Furthermore, PIRD encompasses sequences related to diseases such as Tuberculosis, HIV, CMV, among others, which the reviewer is concerned about.
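
      An overlap figure of this kind can be obtained with a simple set comparison, sketched below. The CDR3 strings are made-up placeholders and `overlap_fraction` is a hypothetical helper. We assume exact string identity as the matching criterion, in line with the "no identical TCR sequences" wording quoted above.

```python
def overlap_fraction(candidate_seqs, downstream_seqs):
    """Fraction of unique candidate sequences that also occur in the downstream data."""
    candidate = set(candidate_seqs)
    if not candidate:
        return 0.0
    return len(candidate & set(downstream_seqs)) / len(candidate)


# Placeholder CDR3 sequences, invented for illustration only.
pretraining_pool = ["CASSLGQAYEQYF", "CASSPTGELFF", "CASSIRSSYEQYF", "CASRGDTQYF"]
downstream_tcrs = ["CASSPTGELFF", "CASRGDTQYF", "CASSQDRGNTEAFF"]

frac = overlap_fraction(pretraining_pool, downstream_tcrs)
print(f"{100 * frac:.1f}% of the candidate pool also appears downstream")

# Removing the shared sequences yields a pool with no identical TCRs.
clean_pool = sorted(set(pretraining_pool) - set(downstream_tcrs))
```

      A pool with non-zero overlap can either be filtered as in the last line or avoided altogether, which is the option described in the response.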

      (3) In the application of TCR-epitope binding prediction, the authors mentioned that the model for embedding epitope sequences was catELMo, but how about for other methods, such as TCRBert? Do the other methods also use catELMo-embedded epitope sequences as part of the binding prediction model, or use their own model to embed the epitope sequences? Since the manuscript focuses on TCR embedding, it would be nice for other methods to be evaluated on the same epitope embedding (maybe adjusted to the same embedded vector length).

      Furthermore, the authors found that catELMo requires less training data to achieve better performance. So one would think the other methods could not learn a reasonable epitope embedding with limited epitope data, and catELMo's better performance in binding prediction is mainly due to better epitope representation.

      Reviewers 1 and 3 raised similar concerns regarding the epitope embedding approach employed in our binding affinity prediction models. We address both comments together on page 6, where we discuss the epitope embedding strategies in detail.

      (4) In the epitope binding prediction evaluation, the authors generated the test data using TCR-epitope pairs from VDJdb, IEDB, McPAS, which may be dominated by epitopes from CMV. Could the authors show accuracy categorized by epitope types, i.e. the accuracy for TCR-CMV pairs and the accuracy for TCR-SARS-CoV2 pairs separately?

      The categorized AUC scores have been added in Supplementary Table 7. We observed significant performance boosts from catELMo compared with other embedding models.

      (5) In the unsupervised TCR clustering evaluation, since GIANA and TCRdist directly output the clustering result, they should not be affected by hierarchical clustering. Why did the curves of GIANA and TCRdist change in Figure 4 when relaxing the hierarchical clustering threshold?

      For fair comparisons, we performed GIANA and TCRdist with hierarchical clustering instead of the nearest neighbor search. We have clarified it in the revised manuscript as follows.

      “Both methods are developed on the BLOSUM62 matrix and apply nearest neighbor search to cluster TCR sequences. GIANA used the CDR3 of TCRβ chain and V gene, while TCRdist predominantly experimented with CDR1, CDR2, and CDR3 from both TCRα and TCRβ chains. For fair comparisons, we perform GIANA and TCRdist only on CDR3 β chains and with hierarchical clustering instead of the nearest neighbor search.”

      (6 & 7) In the unsupervised TCR clustering evaluation, the authors examined the TCRs related to the top eight epitopes. However, there are many more epitopes curated in VDJdb, IEDB, and McPAS-TCR. In real applications, the set of potential epitopes is also more complex than just eight epitopes. Could the authors evaluate the clustering result using all the TCR data from the databases? In addition to NMI, it is important to know how specific each TCR cluster is. Could the authors add the fraction of pure clusters to the results? A pure cluster is one in which all the TCRs bind to the same epitope; this metric is used in the method GIANA.

      We would like to note that there is a significant disparity in TCR binding frequencies across different epitopes in current databases. For instance, the most abundant epitope (KLGGALQAK) has approximately 13k TCRs binding to it, while 836 out of 982 epitopes are associated with fewer than 100 TCRs in our dataset. Furthermore, there are 9347 TCRs having the ability to bind multiple epitopes. In order to robustly evaluate the clustering performance, we originally selected the top eight frequent epitopes from McPAS and removed TCRs binding multiple epitopes to create a more balanced dataset.

      We acknowledge that the real-world scenario is more complex than just eight epitopes. Therefore, we conducted clustering experiments using the most abundant epitopes whose combined cognate TCRs make up at least 70% of TCRs across the three databases (34 epitopes). This is illustrated in Supplementary Figure 5. Furthermore, we extended our analysis by clustering all TCRs after filtering out those that bind to multiple epitopes, resulting in 782 unique epitopes. We found that catELMo achieved the 3rd and 2nd best performance in NMI and Purity, respectively (see Table below). These results are consistent with our previous observations on the eight epitopes.
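
      The coverage-based epitope selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `top_epitopes_by_coverage`, the toy pairs, and the choice to count TCR-epitope pairs (rather than unique TCRs) are all assumptions.

```python
from collections import Counter

def top_epitopes_by_coverage(pairs, min_coverage=0.70):
    """Pick the most abundant epitopes until their pairs cover min_coverage of all pairs.

    `pairs` is a list of (tcr, epitope) tuples. Hypothetical helper for illustration.
    """
    counts = Counter(epitope for _tcr, epitope in pairs)
    total = sum(counts.values())
    selected, covered = [], 0
    for epitope, n in counts.most_common():
        if covered >= min_coverage * total:
            break  # coverage target already reached
        selected.append(epitope)
        covered += n
    return selected

# Toy data with made-up identifiers: 5 pairs for EPI_A, 3 for EPI_B, 1 each for EPI_C and EPI_D.
pairs = ([("tcr%d" % i, "EPI_A") for i in range(5)]
         + [("tcr%d" % i, "EPI_B") for i in range(5, 8)]
         + [("tcr8", "EPI_C"), ("tcr9", "EPI_D")])
selected = top_epitopes_by_coverage(pairs)
print(selected)
```

      In the toy data, the two most abundant epitopes already account for 8 of 10 pairs, so the rare epitopes are left out, which mirrors how a few frequent epitopes dominate the databases.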

      Author response table 1.
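
      For readers unfamiliar with the two clustering metrics discussed in this exchange, the sketch below computes the fraction of pure clusters (as defined by the reviewer) and NMI from cluster assignments and epitope labels. It is an illustration under stated assumptions: NMI is normalized by the arithmetic mean of the two entropies, singleton clusters count as pure, and the exact conventions used in the manuscript or in GIANA may differ.

```python
import math
from collections import Counter, defaultdict

def pure_cluster_fraction(cluster_ids, epitopes):
    """Fraction of clusters whose members all bind the same epitope."""
    members = defaultdict(set)
    for c, e in zip(cluster_ids, epitopes):
        members[c].add(e)
    return sum(len(s) == 1 for s in members.values()) / len(members)

def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())

def nmi(cluster_ids, epitopes):
    """Mutual information normalized by the mean of the two entropies (an assumed convention)."""
    n = len(epitopes)
    joint = Counter(zip(cluster_ids, epitopes))
    pc, pe = Counter(cluster_ids), Counter(epitopes)
    mi = sum((k / n) * math.log(k * n / (pc[c] * pe[e]))
             for (c, e), k in joint.items())
    denom = (entropy(cluster_ids) + entropy(epitopes)) / 2
    return mi / denom if denom > 0 else 1.0

# Toy example: clusters 0 and 1 are pure, cluster 2 mixes two epitopes.
clusters = [0, 0, 1, 1, 2, 2]
labels = ["A", "A", "B", "B", "A", "B"]
print(pure_cluster_fraction(clusters, labels), nmi(clusters, labels))
```

      The two metrics answer different questions: NMI summarizes overall agreement between clusters and epitope labels, while the pure-cluster fraction reports how many individual clusters are epitope-specific.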

      Reviewer #2 (Public Review):

      In the manuscript, the authors highlighted the importance of T-cell receptor (TCR) analysis and the lack of amino acid embedding methods specific to this domain. The authors proposed a novel bi-directional context-aware amino acid embedding method, catELMo, adapted from ELMo (Embeddings from Language Models), specifically designed for TCR analysis. The model is trained on TCR sequences from seven projects in the ImmunoSEQ database, instead of the generic protein sequences. They assessed the effectiveness of the proposed method in both TCR-epitope binding affinity prediction, a supervised task, and the unsupervised TCR clustering task. The results demonstrate significant performance improvements compared to existing embedding models. The authors also aimed to provide and discuss their observations on embedding model design for TCR analysis: 1) Models specifically trained on TCR sequences have better performance than models trained on general protein sequences for the TCR-related tasks; and 2) The proposed ELMo-based method outperforms TCR embedding models with BERT-based architecture. The authors also provided a comprehensive introduction and investigation of existing amino acid embedding methods. Overall, the paper is well-written and well-organized.

      The work has originality and has potential prospects for immune response analysis and immunotherapy exploration. TCR-epitope pair binding plays a significant role in T cell regulation. Accurate prediction and analysis of TCR sequences are crucial for comprehending the biological foundations of binding mechanisms and advancing immunotherapy approaches. The proposed embedding method presents an efficient context-aware mathematical representation for TCR sequences, enabling the capture and analysis of their structural and functional characteristics. This method serves as a valuable tool for various downstream analyses and is essential for a wide range of applications.

      Thank you.

      Reviewer #3 (Public Review):

      Here, the authors trained catELMo, a new context-aware embedding model for TCRβ CDR3 amino acid sequences, for TCR-epitope specificity and clustering tasks. The study benchmarked existing work in protein and TCR language models and investigated the role that model architecture plays in prediction performance. The major strength of this paper is its comprehensive evaluation of commonly used model architectures, which is useful for practitioners in the field. However, some key details were missing that are needed to assess whether the benchmarking study is a fair comparison between different architectures. Major comments are as follows:

      • It is not clear why epitope sequences were also embedded using catELMo for the binding prediction task. Because catELMo is trained on TCRβ CDR3 sequences, it's not clear what benefit would come from this embedding. Were the other embedding models under comparison also applied to both the TCR and epitope sequences? It may be a fairer comparison if a single method is used to encode epitope sequences for all models under comparison, so that the performance reflects the quality of the TCR embedding only.

      In our study, we indeed used the same embedding model for both TCRs and epitopes in each prediction model, ensuring a consistent approach throughout.

      Recognizing the importance of evaluating the impact of epitope embeddings, we conducted experiments in which we used BLOSUM62 matrix to embed epitope sequences for all models. The results (Supplementary Table 5) are well aligned with the performance reported in our paper. This suggests that epitope embedding may not play as critical a role as TCR embedding in the prediction tasks. To further validate this point, we conducted two additional experiments.

      Firstly, we used catELMo to embed TCRs while employing randomly initialized embedding matrices with trainable parameters for epitope sequences. This yielded prediction performance similar to that obtained when catELMo was used for both TCR and epitope embedding (Supplementary Table 6). Secondly, we utilized BLOSUM62 to embed TCRs but employed catELMo for epitope sequence embedding, resulting in performance comparable to using BLOSUM62 for both TCRs and epitopes (Supplementary Table 4). These experimental results confirmed the limited impact of epitope embedding on downstream performance.

      We conjecture that these results may be attributed to the significant disparity in data scale between TCRs (~290k) and epitopes (less than 1k). Moreover, TCRs tend to exhibit high similarity, whereas epitopes display greater distinctiveness from one another. These features of TCRs require robust embeddings to facilitate effective separation and improve downstream performance, while epitope embedding primarily serves as a categorical encoding.

      We have included a detailed discussion of these findings in the revised manuscript to provide a comprehensive understanding of the role of epitope embeddings in TCR binding prediction.

      • The tSNE visualization in Figure 3 is helpful. It makes sense that the last hidden layer features separate well by binding labels for the better performing models. However, it would be useful to know if positive and negative TCRs for each epitope group also separate well in the original TCR embedding space. In other words, how much separation between these groups is due to the neural network vs just the embedding?

      It is important to note that we used the same downstream prediction model, a simple three-linear-layer network, for all the discussed embedding methods. We believe that the separation observed in the t-SNE visualization effectively reflects the ability of our embedding model. Also, we would like to mention that it can be hard to see a clear distinction between positive and negative TCRs in the original embedding space because embedding models were not trained on positive/negative labels. Please refer to the t-SNE of the original TCR embeddings below.

      Author response image 1.

      • To generate negative samples, the author randomly paired TCRs from healthy subjects to different epitopes. This could produce issues with false negatives if the epitopes used are common. Is there an estimate for how frequently there might be false negatives for those commonly occurring epitopes that most populations might also have been exposed to? Could there be a potential batch effect for the negative sampled TCR that confounds with the performance evaluation?

      Thank you for bringing this valid and interesting point up. Generating negative samples is non-trivial since only a limited number of non-binding TCR-epitope pairs are publicly available and experimentally validating non-binding pairs is costly [1]. Standard practices for generating negative pairs are (1) pairing epitopes with healthy TCRs [2, 3], and (2) randomly shuffling existing TCR-epitope pairs [4, 5]. We used both approaches (the former included in the main results, and the latter in the discussion). In both scenarios, catELMo embeddings consistently demonstrated superior performance.

      We acknowledge the possibility of false negatives due to the finite-sized TCR database from which we randomly selected TCRs; however, we believe that the likelihood of such occurrences is low. Given the vast diversity of human TCR clonotypes, which can exceed 10^15 [6], the chance of randomly selecting a TCR that specifically recognizes a target epitope is relatively small.

      In order to investigate the batch effect, we generated new negative pairs using different seeds and observed consistent prediction performance across these variations. However, we agree that there could still be a potential batch effect for the negative samples due to potential data bias.

      We have discussed the limitations of generating negative samples in the revised manuscript.
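
      The two negative-sampling strategies mentioned in this response can be sketched as below. This is a simplified illustration with made-up identifiers, not the authors' pipeline: the function names, the seed handling, and the rejection of known positive pairs during shuffling are assumptions.

```python
import random

def negatives_from_healthy(epitopes, healthy_tcrs, n, seed=0):
    """Strategy (1): pair randomly drawn TCRs from healthy repertoires with epitopes."""
    rng = random.Random(seed)
    return [(rng.choice(healthy_tcrs), rng.choice(epitopes)) for _ in range(n)]

def negatives_by_shuffling(positive_pairs, n, seed=0, max_tries=10000):
    """Strategy (2): re-pair TCRs and epitopes from known positives, skipping known binders.

    May return fewer than n pairs if not enough distinct non-positive pairs exist.
    """
    rng = random.Random(seed)
    known = set(positive_pairs)
    tcrs = [t for t, _ in positive_pairs]
    epitopes = [e for _, e in positive_pairs]
    negatives, tries = set(), 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        pair = (rng.choice(tcrs), rng.choice(epitopes))
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)

# Toy identifiers, not real sequences.
positives = [("TCR1", "EPI_A"), ("TCR2", "EPI_A"), ("TCR3", "EPI_B")]
healthy = ["HEALTHY1", "HEALTHY2", "HEALTHY3", "HEALTHY4"]
neg_healthy = negatives_from_healthy(["EPI_A", "EPI_B"], healthy, n=5, seed=1)
neg_shuffled = negatives_by_shuffling(positives, n=2, seed=1)
print(neg_healthy, neg_shuffled)
```

      Changing `seed` yields a different set of negatives, which is the kind of variation the seed-based robustness check above relies on. Neither strategy can rule out false negatives, since an unlabeled pair may still bind.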

      • Most of the models being compared were trained on general proteins rather than TCR sequences. This makes their comparison to catELMo questionable since it's not clear if the improvement is due to the training data or architecture. The authors partially addressed this with BERT-based models in section 2.4. This concern would be more fully addressed if the authors also trained the Doc2vec model (Yang et al, Figure 2) on TCR sequences as a baseline model instead of using the original models trained on general protein sequences. This would make clear the strength of context-aware embeddings if the performance is worse than catELMo and BERT.

      We agree it is important to distinguish between the effects of training data and architecture on model performance.

      In Section 2.4, as the reviewer mentioned, we compared catELMo with BERT-based models trained on the same TCR repertoire data, demonstrating that architecture plays a significant role in improving performance. Furthermore, in Section 2.5, we compared catELMo-shallow with SeqVec, which share the same architecture but were trained on different data, highlighting the importance of data on the model performance.

      To further address the reviewer's concern, we trained a Doc2Vec model on the TCR sequences that have been used for catELMo training. We observed significantly lower prediction performance compared to catELMo, with an average AUC of 50.24% in TCR split and an average AUC of 51.02% in epitope split, making the strength of context-aware embeddings clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is known that, besides TRB CDR3, the CDR1 and CDR2 on the TRBV gene and the TCR alpha chain also contribute to epitope recognition, but these were not modeled in catELMo. It would be nice for the authors to add this as a current limitation of catELMo in the Discussion section.

      We have discussed the limitation in the revised manuscript.

      “Our study focuses on modeling the TCRβ chain CDR3 region, which is known as the primary determinant of epitope binding. Other regions, such as CDR1 and CDR2 on the TRB V gene, along with the TCRα chain, may also contribute to specificity in antigen recognition. However, a limited number of available samples for those additional features can be a challenge for training embedding models. Future work may explore strategies to incorporate these regions while mitigating the challenges of working with limited samples.”

      (2) I tried to follow the instructions to train a binding affinity prediction model for TCR-epitope pairs; however, cachetools=5.3.0 could not be found when running "pip install -r requirements.txt" in the conda environment bap. Is this cachetools version only supported from Python 3.7 onward, so that the Python 3.6.13 suggested on the GitHub repo might not work?

      This has been fixed. We have updated the README.md on our GitHub page.

      Reviewer #2 (Recommendations For The Authors):

      The article is well-constructed and well-written, and the analysis is comprehensive.

      The comments for minor issues that I have are as follows:

      (1) In the Methods section, it would be clearer if the authors explained in more detail how the standard deviation is calculated in all tables. How are the '10 trials' defined? Are they based on different random training and test set splits?

      '10 trials' refers to the process of splitting the dataset into training, validation, and testing sets using different seeds for each trial. Different trials have different training, validation, and testing sets. For each trial, we trained a prediction model on its training set and measured performance on its testing set. The standard deviation was calculated from the 10 measurements, estimating model performance variation across different random splits of the data.
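
      The trial procedure described above can be sketched as follows. This is an illustration only: the split fractions, the use of the sample standard deviation, and the placeholder `train_and_score` callable are assumptions, not details taken from the manuscript.

```python
import random
import statistics

def split_indices(n_samples, seed, frac_train=0.6, frac_val=0.2):
    """Shuffle sample indices with a seed and cut them into train/validation/test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(frac_train * n_samples)
    n_val = round(frac_val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def run_trials(n_samples, train_and_score, n_trials=10):
    """Repeat split -> train -> test with a different seed per trial.

    `train_and_score(train, val, test)` is a user-supplied callable that trains a
    model and returns its test-set score (for example, AUC).
    """
    scores = []
    for seed in range(n_trials):
        train, val, test = split_indices(n_samples, seed)
        scores.append(train_and_score(train, val, test))
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder scorer: a real one would fit a model on `train` and evaluate on `test`.
def constant_scorer(train, val, test):
    return 0.5

mean_score, std_score = run_trials(100, constant_scorer)
print(mean_score, std_score)
```

      Because each seed produces a different partition, the standard deviation reflects sensitivity to the data split rather than, say, random weight initialization alone.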

      (2) The format of AUCs and the improvement of AUCs need to be consistent, i.e., with the percent sign.

      We have updated the format of AUCs.

      Reviewer #3 (Recommendations For The Authors):

      In addition to the recommendations in the public review, we had the following more minor questions and recommendations:

      • Could you provide some more background on the data, such as overlaps between the databases, and how the training and validation split was performed between the three databases? Also summary statistics on the length of TCR and epitope sequence data would be helpful.

      We have provided more details about data in our revision.

      • Could you comment on the runtime to train and embed using the catELMo and BERT models?

      Our training data consist of TCR sequences with relatively short lengths (averaging less than 20 amino acid residues). This characteristic significantly reduces the computational resources required compared to training large-scale language models on extensive text corpora. Using standard machines equipped with two GeForce RTX 2080 GPUs, we were able to complete the training tasks within a matter of days. After training, embedding one sequence can be accomplished in a matter of seconds.

      • Typos and wording:

      • Table 1 first row of "source": "immunoSEQ" instead of "immuneSEQ"

      This has been corrected.

      • L23 of abstract "negates the need of complex deep neural network architecture" is a little confusing because ELMo itself is a deep neural network architecture. Perhaps be more specific and add that the need is for downstream tasks.

      We have made it more specific in our abstract.

      “...negates the need for complex deep neural network architecture in downstream tasks.”

      References

      (1) Montemurro, Alessandro, et al. "NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data." Communications biology 4.1 (2021): 1060.

      (2) Jurtz, Vanessa Isabell, et al. "NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks." BioRxiv (2018): 433706.

      (3) Gielis, Sofie, et al. "Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires." Frontiers in immunology 10 (2019): 2820.

      (4) Cai, Michael, et al. "ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model." Frontiers in Immunology 13 (2022): 893247.

      (5) Weber, Anna, et al. "TITAN: T-cell receptor specificity prediction with bimodal attention networks." Bioinformatics 37 (2021): i237-i244.

      (6) Lythe, Grant, et al. "How many TCR clonotypes does a body maintain?." Journal of theoretical biology 389 (2016): 214-224.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments. We were pleased that they thought our study was "well crafted and written", "important", and that it provides a "valuable resource for researchers studying color vision". They also expressed several constructive criticisms, concerning – among other things – the lack of details regarding experimental procedures and analysis, the challenge in relating retinal data to cortical recordings, and consistency of results across animals. In response to the reviewers’ comments and following their suggestions, we performed additional analyses, and substantially revised the paper:

      We added a section in the Discussion about "Limitations of the stimulus paradigm". In addition, we added a new Suppl. Figure that illustrates the effect of deconvolution of calcium traces on our results and clarified in the text why we use deconvolved signals for all analyses. The new Suppl. Figure also shows an additional analysis with a more conservative threshold of neuron exclusion.

      We now clarify how retinal signaling relates to our cortical results and rewrote the text to be more conservative regarding our conclusions.

      In addition, we added a new Suppl. Figure showing the key analyses from Figures 2 and 4 separately for each animal. We now mention consistency across animals in the Results section and clearly state which analyses were performed on data pooled across animals.

      We are positive that these additions address the issues raised by the reviewers. Please find our point-by-point replies to all comments below.

      eLife assessment

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions and details about some procedures are incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and resolution of some technical issues.

      We thank the reviewers for appreciating our manuscript and their thoughtful comments.

      Referee 1 (Remarks to the Author):

      Summary:

      In this study, Franke et al. explore and characterize the color response properties across the primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake-behaving 2P imaging to define the spectral response properties of visual interneurons in Layer 2/3. They find that opponent responses are more prominent at photopic light levels, and diversity in color opponent responses exists across the visual scene, with green ON/UV OFF responses being more strongly represented in the upper visual field. This is argued to be relevant for detecting certain features that are more salient when the chromatic space is used, possibly due to noise reductions.

      Strengths:

      The work is well crafted and written and provides a thorough characterization that reveals an uncharacterized diversity of visual properties in V1. I find this characterization important because it reveals how strongly chromatic information can modulate the response properties in V1. In the upper visual field, 25% of the cells differentially relay chromatic information, and one may wonder how this information will be integrated and subsequently used to aid vision beyond the detection of color per se. I personally like the last paragraph of the discussion that highlights this fact.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses: One major point highlighted in this paper is the fact that Green ON/UV OFF responses are not generated in the retina. But glancing through the literature, I saw this is not necessarily true. Fig 1. of Joesch and Meister, a paper cited, shows this can be the case. Thus, I would not emphasize that this wasn’t present in the retina. This is a minor point, but even if the retina could not generate these signals, I would be surprised if the diversity of responses would only arise through feed-forward excitation, given the intricacies of cortical connectivity. Thus, I would argue that the argument holds for most of the responses seen in V1; they need to be further processed by cortical circuitries.

      We thank the reviewer for this comment. When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited from the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      This takes me to my second point, defining center and surround. The center spot is 37.5 deg of visual angle, more than 1 mm of the retinal surface. That means that all retinal cells, at least half and most likely all of their surrounds will also be activated. Although 37.5 deg is roughly the receptive field size previously determined for V1 neurons, the one-to-one comparison with retinal recording, particularly with their center/surround properties, is difficult. This should be discussed. I assume that the authors tried a similar approach with sparse or dense checker white noise stimuli. If so, it would be interesting if there were better ways of defining the properties of V1 neurons on their complex/simple receptive field properties to define how much of their responses are due to an activation of the true "center" or a coactivation of the surround. Interestingly, at least some of the cells (Fig. 1d, cells 2 and 5) don’t have a surround. Could it be that in these cases, the "center" and "surround" are being excited together? How different would the overall statistics be if one used a full-field flicker stimulus instead of a center/surround stimulus? How stable are the results if the center/surround flicker stimulus is shifted? These results won’t change the fact that chromatic coding is present in the VC and that there are clear differences depending on their position, but it might change the interpretation. Thus, I would encourage you to test these differences and discuss them.

      Thanks for this comment. We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we used the following steps:

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude that the stimulus was misaligned for a subset of the recorded neurons used for analysis. We agree with the reviewer that such misalignment might have contributed to cells not having surround STAs, due to simultaneous activation of antagonistic center and surround RF components by the surround stimulus. While a full-field stimulus would eliminate the misalignment problem, it would not allow us to study color tuning in center and surround RF components separately. Instead, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is beyond the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we now explicitly mention the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion. We believe these changes will help the reader to interpret our results.

      Referee 2 (Remarks to the Author):

      Summary:

      Franke et al. characterize the representation of color in the primary visual cortex of mice and how it changes across the visual field, with a particular focus on how this may influence the ability to detect aerial predators. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet were presented in random combinations. Using a clustering approach, a set of functional cell-types were identified based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have varying spatial distributions in V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths:

      The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses:

      While the study presents solid evidence, a few weaknesses exist, including the size of the dataset, clarity regarding details of the data included in each step of the analysis, and discussion of caveats of the work. The results presented here are based on recordings from 3 mice. While the number of neurons recorded is reasonably large (n > 3000), an analysis that tests for consistency across animals is missing. Related to this, it is unclear how many neurons at each stage of the analysis come from the 3 different mice (except for Suppl. Fig 4).

      Thank you for this comment. We apologize that the original manuscript did not clearly indicate the consistency of our results across animals. We have revised the manuscript in the following ways:

      We have added an additional Suppl. Figure, which shows the variability of the data within and across animals (Suppl. Fig. 4). Specifically, we show the distribution of color and luminance selectivity for (i) center and surround components of V1 RFs and (ii) for upper and lower visual field. This data is used for all analyses shown in Figures 2-4. The figure legend of this figure also states the number of neurons per animal.

      We now clearly state in the Results section that all analyses in the main figures were performed by pooling data across animals, and refer to the Suppl. Figures for consistency across animals.

      We believe these changes help the reader to interpret our results.

      Finally, the paper would greatly benefit from a more in depth discussion of the caveats related to the conclusion drawn at each stage of the analysis. This is particularly relevant regarding the caveats related to using spike triggered averages to assess the response preferences of ON-OFF neurons, and the conclusions drawn about the contribution of retinal color opponency.

      Thanks. We substantially revised the text to discuss caveats and limitations of the approach. For example, we added a section into the Discussion called "Limitations of the stimulus paradigm". In addition, we clarified how retinal signals relate to cortical ones and phrased our conclusions more conservatively.

      The authors provide solid evidence to support an asymmetric distribution of color opponent cells in V1 and a reduced color contrast representation in lower light levels. Some statements would benefit from more direct evidence such as the integration of upstream visual signals for color opponency in V1.

      Based on the comments from Reviewer 1, we have rephrased the statements regarding the integration of upstream visual signals for color opponency in V1. We think these revisions increase the clarity of the results and help the reader with interpretation.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      Thanks! We thank the reviewer again for the helpful comments.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. Several technical concerns limit how clearly the data support the conclusions. If these issues can be fixed, the paper would make a valuable contribution to how color is coded in mouse V1.

      We thank the reviewer for the helpful comments.

      Analysis: The central tool used to analyze the data is a "spike triggered average" of the responses to randomly varying stimuli. There are several steps in this analysis that are not documented, and hence evaluating how well it works is difficult. Central to this is that the paper does not measure spikes. Instead, measured calcium traces are converted to estimated spike rates, which are then used to estimate STAs. There are no raw calcium traces shown, and the approach to estimate spike rates is not described in any detail. Confirming the accuracy of these steps is essential for a reader to be able to evaluate the paper. Further, it is not clear why the linear filters connecting the recorded calcium traces and the stimulus cannot be estimated directly, without the intermediate step of estimating spike rates.

      Thank you for this comment. We have used the genetically encoded calcium sensor GCaMP6s in our recordings. This sensor is a very sensitive GCaMP6 variant, but also one with slow kinetics. To remove the effect of the slow sensor kinetics from recorded calcium responses, the recorded traces are commonly deconvolved with the impulse function of the sensor to obtain the deconvolved calcium traces. We now include this reasoning in the Results section. To illustrate the effect of the deconvolution, we added a new Suppl. Figure (Suppl. Fig. 2) showing raw calcium and deconvolved traces, and the STAs estimated from both types of traces. This illustrates that the results regarding neuronal color preferences are consistent across raw and deconvolved calcium traces.
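
      As a rough illustration of what deconvolving a slow sensor's kinetics means, the sketch below assumes a first-order autoregressive model in which each event adds to a calcium signal that decays by a factor gamma per frame. The model, the value of gamma, and the function names are assumptions made for illustration; the response does not specify the deconvolution algorithm that was actually used.

```python
import numpy as np


def ar1_forward(events, gamma):
    """Simulate a calcium trace: each event adds to a signal that decays by gamma per frame."""
    events = np.asarray(events, dtype=float)
    calcium = np.zeros_like(events)
    previous = 0.0
    for t, event in enumerate(events):
        previous = gamma * previous + event
        calcium[t] = previous
    return calcium


def ar1_deconvolve(calcium, gamma):
    """Invert the AR(1) model: subtract the decayed previous sample from each sample."""
    calcium = np.asarray(calcium, dtype=float)
    deconvolved = calcium.copy()
    deconvolved[1:] -= gamma * calcium[:-1]
    return deconvolved


# Toy example with a hypothetical decay factor (not a measured GCaMP6s value).
gamma = 0.9
events = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
calcium = ar1_forward(events, gamma)
recovered = ar1_deconvolve(calcium, gamma)
```

      In this noise-free toy case the deconvolved trace matches the simulated events exactly; with real recordings the result is only an approximation, and its unit is not spikes per second.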

      We agree with the reviewer that the term STA might be confusing. We have replaced it with the term "event-triggered average" (ETA). In addition, we have replaced the phrase "estimated spike rate" with "deconvolved calcium trace" throughout the manuscript because, unlike a spike rate (spikes per second), the unit of the deconvolved traces is not directly interpretable. In the revised version, we now clarify in the Methods section that we estimate the ETAs based on deconvolved calcium traces, which are correlated with, and serve as an approximation of, spike rate.

      A further issue about the STAs is that the inclusion criterion (correlation of predicted vs measured responses of 0.25) is pretty forgiving. It would be helpful to see a distribution of those correlation values, and some control analyses to check whether the STA is providing a sufficiently accurate measure to support the results (e.g. do the central results hold for the cells with the highest correlations).

      We thank the reviewer for this comment. To exclude noisy neurons from analysis, we used the following procedure:

      For each of the four stimulus conditions (center and surround for green and UV stimuli), kernel quality was measured by comparing the variance of the STA with the variance of the baseline, defined as the first 500 ms of the STA. Only cells whose UV or green center STA had at least 10 times the variance of the baseline were considered for further analysis.
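
      The inclusion criterion described here can be sketched as a variance ratio. The sampling rate, kernel length, and function names below are assumptions for illustration, and the toy kernels are synthetic rather than recorded data.

```python
import numpy as np


def kernel_quality(kernel, sampling_rate_hz, baseline_ms=500.0):
    """Ratio of the variance of the whole kernel to the variance of its first baseline_ms."""
    kernel = np.asarray(kernel, dtype=float)
    n_baseline = int(round(baseline_ms / 1000.0 * sampling_rate_hz))
    if not 2 <= n_baseline <= kernel.size:
        raise ValueError("baseline window must cover 2 samples up to the kernel length")
    baseline_variance = np.var(kernel[:n_baseline])
    if baseline_variance == 0.0:
        return float("inf")  # perfectly flat baseline: any signal counts as infinitely better
    return float(np.var(kernel) / baseline_variance)


def passes_threshold(uv_center, green_center, sampling_rate_hz, threshold=10.0):
    """Keep a cell if its UV or green center kernel reaches the quality threshold."""
    best = max(
        kernel_quality(uv_center, sampling_rate_hz),
        kernel_quality(green_center, sampling_rate_hz),
    )
    return best >= threshold


# Synthetic kernels at a hypothetical 30 Hz sampling rate (60 samples, i.e. 2 s).
fs = 30.0
flat = 0.01 * (-1.0) ** np.arange(60)            # small alternating fluctuation only
bump = flat.copy()
bump[15:] += np.sin(np.pi * np.arange(45) / 45)  # response after the 500 ms baseline
```

      Under these assumptions, a cell with a flat UV kernel and a clear green response is retained, while a cell with two flat kernels is excluded.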

      We have added the distribution of quality values to a new Suppl. Figure (Suppl. Fig. 2d,e). We now also show the percentage of neurons above threshold, given different quality thresholds. Finally, we have repeated the analysis shown in Figure 2 for a much more conservative threshold, including only the top 25% of neurons (Suppl. Fig. 2e,f). We now mention this new analysis in the Methods and Results section.

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surrounding region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells. As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). The impact of these issues on the conclusions is considered briefly at the start of the results but needs to be evaluated in considerably more detail. This is particularly true for retinal ganglion cells given the size of their receptive fields (see also next point).

      We agree with the reviewer that the centering of the stimulus is critical and apologize if this point was not discussed sufficiently. To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degrees of visual angle in diameter, which is slightly larger than the center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we have used the following experimental and analysis steps and controls (see also second comment of Reviewer 1):

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      We now mention those clearly in the Results section and added the limitations of our approach to the Discussion section.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. This issue may be handled by the analysis presented in the paper, but if so it needs to be described more clearly. The paper from which the retina data is taken argues that rod-cone chromatic opponency originates largely in the outer retina. This mechanism would be expected to be shared across retinal outputs. Thus it is not clear how the Green-On/UV-Off vs Green-Off/UV-On asymmetry could originate. This should be discussed.

      We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited from the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      Residual chromatic cells at low mesopic light levels: The presence of chromatically tuned cells at the lowest light level probed is surprising. The authors describe these conditions as rod-dominated, in which case chromatic tuning should not be possible. This again is discussed only briefly. It either reflects the presence of an unexpected pathway that amplifies weak cone signals under low mesopic conditions such that they can create spectral opponency or something amiss in the calibrations or analysis. Data collected at still lower light levels would help resolve this.

      Thank you for this comment. We call the lowest light level "low mesopic" and "rod-dominated" because the spectral contrast of V1 center responses in posterior recording fields is green-shifted for this light level (Fig. 3a). This is only expected if responses in the UV-cone dominant ventral retina are predominantly driven by rod photoreceptors. We now explain this rationale in the Results section. In addition, we mention in the Discussion that future studies are required to test whether cone signals need to be amplified for low light levels. While we agree with the reviewer that it would be exciting to use even lower light levels during recordings, we believe this is out of the scope of the current study due to the technical challenges involved in achieving scotopic stimulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      As you will see, the main changes in the revised manuscript pertain to the structure and content of the introduction. Specifically, we have tried to more clearly introduce our paradigm, the rationale behind the paradigm, why it is different from learning paradigms, and why we study “relief”.

      In this rebuttal letter, we will go over the reviewers’ comments one-by-one and highlight how we have adapted our manuscript accordingly. However, because one concern was raised by all reviewers, we will start with an in-depth discussion of this concern.

      The shared concern pertained to the validity of the EVA task as a model to study threat omission responses. Specifically, all reviewers questioned the effectiveness of our so-called “inaccurate”, “false” or “ruse” instructions in triggering an equivalent level of shock expectancy, and relatedly, how this effectiveness was affected by dynamic learning over the course of the task.

      We want to thank the reviewers for raising this important issue. Indeed, it is a vital part of our design and it therefore deserves considerable attention. It is now clear to us that in the previous version of the manuscript we may have focused too little on why we moved away from a learning paradigm, how we made sure that the instructions were successful at raising the necessary expectations, and how the instructions were affected by learning. We believe this has resulted in some misunderstandings, which consequently may have cast doubts on our results. In the following sections, we will go into these issues.

      The rationale behind our instructed design

      The main aim of our study was to investigate brain responses to unexpected omissions of threat in greater detail by examining their similarity to the reward prediction error axioms (Caplin & Dean, 2008), and exploring the link with subjective relief. Specifically, we hypothesized that omission-related responses should be dependent on the probability and the intensity of the expected-but-omitted aversive event (i.e., electrical stimulation), meaning that the response should be larger when the expected stimulation was stronger and more expected, and that fully predicted outcomes should not trigger a difference in responding.

      To this end, we required that participants had varying levels of threat probability and intensity predictions, and that these predictions would most of the time be violated. Although we fully agree with the reviewers that fear conditioning and extinction paradigms can provide an excellent way to track the teaching properties of prediction error responses (i.e., how they are used to update expectancies on future trials), we argued that they are less suited to create the varying probability and intensity-related conditions we required (see Willems & Vervliet, 2021). Specifically, in a standard conditioning task participants generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intraindividual variability in the prediction error responses. This precludes an in-depth analysis of the probability-related effects. Furthermore, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, intensity-related effects cannot be tested. Finally, because CS-US contingencies change over the course of a fear conditioning and extinction study (e.g. from acquisition to extinction), there is never complete certainty about when the US will (not) follow. This precludes a direct comparison of fully predicted outcomes.

      Another added value of studying responses to the prediction error at threat omission outside a learning context is that it can offer a way to disentangle responses to the violation of threat expectancy from those related to subsequent expectancy updating.

      Also note that Rutledge and colleagues (2010), who were the first to show that human fMRI responses in the Nucleus Accumbens comply with the reward prediction error axioms, also did not use learning experiences to induce expectancy. In that sense, we argued it was not necessary to adopt a learning paradigm to study threat omission responses.

      Adaptations in the revised manuscript: We included two new paragraphs in the introduction of the revised manuscript to elaborate on why we opted not to use a learning paradigm in the present study (lines 90-112).

      “However, is a correlation with the theoretical PE over time sufficient for neural activations/relief to be classified as a PE-signal? In the context of reward, Caplin and colleagues proposed three necessary and sufficient criteria all PE-signals should comply with, independent of the exact operationalizations of expectancy and reward (the so-called axiomatic approach24,25; which has also been applied to aversive PE26–28). Specifically, the magnitude of a PE signal should: (1) be positively related to the magnitude of the reward (larger rewards trigger larger PEs); (2) be negatively related to likelihood of the reward (more probable rewards trigger smaller PEs); and (3) not differentiate between fully predicted outcomes of different magnitudes (if there is no error in prediction, there should be no difference in the PE signal).”

      “It is evident that fear conditioning and extinction paradigms have been invaluable for studying the role of the threat omission PE within a learning context. However, these paradigms are not tailored to create the varying intensity and probability-related conditions that are required to evaluate the threat omission PE in the light of the PE axioms. First, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, the magnitude-related axiom cannot be tested. Second, in conditioning tasks people generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intra-individual variability in the PE responses. Moreover, because of the relatively low signal to noise ratio in fMRI measures, fear extinction studies often pool across trials to compare omission-related activity between early and late extinction16, which further reduces the necessary variability to properly evaluate the probability axiom. Third, because CS-US contingencies change over the course of the task (e.g. from acquisition to extinction), there is never complete certainty about whether the US will (not) follow. This precludes a direct comparison of fully predicted outcomes. Finally, within a learning context, it remains unclear whether PE-related responses are in fact responses to the violation of expectancy itself, or whether they are the result of subsequent expectancy updating.”
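
      To make the three criteria concrete for threat omission, the sketch below expresses them as ordering checks on a table of mean omission responses. It is a simplified illustration and not the statistical analysis used in the study: the function names, the tolerance, and the toy numbers are all hypothetical, and a real evaluation would rely on inferential tests rather than strict inequalities.

```python
def is_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def check_axioms(omission, fully_predicted, tol=1e-9):
    """Check the three PE criteria on mean responses.

    omission: {(instructed_probability, intensity_rank): mean omission response}
    fully_predicted: mean responses to fully predicted outcomes
    """
    probabilities = sorted({p for p, _ in omission})
    ranks = sorted({r for _, r in omission})
    # (1) Stronger omitted stimulation should give a larger response.
    magnitude = all(
        is_increasing([omission[(p, r)] for r in ranks]) for p in probabilities
    )
    # (2) More expected (hence more surprising to omit) stimulation should give a larger response.
    probability = all(
        is_increasing([omission[(p, r)] for p in probabilities]) for r in ranks
    )
    # (3) Fully predicted outcomes should not differ from one another.
    equivalence = max(fully_predicted) - min(fully_predicted) <= tol
    return {
        "magnitude": magnitude,
        "probability": probability,
        "no_surprise_equivalence": equivalence,
    }


# Hypothetical toy values: response grows with both probability and intensity rank.
toy = {(p, r): (p / 100) * r for p in (25, 50, 75) for r in (1, 2, 3)}
result = check_axioms(toy, fully_predicted=[0.0, 0.0])

# A pattern that ignores the instructed probability fails the second criterion.
flat = {(p, r): float(r) for p in (25, 50, 75) for r in (1, 2, 3)}
flat_result = check_axioms(flat, fully_predicted=[0.0, 0.0])
```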

      Can verbal instructions be used to raise the expectancy of shock?

      The most straightforward way to obtain sufficient variability in both probability and intensity-related predictions is by directly providing participants with instructions on the probability and intensity of the electrical stimulation. In a previous behavioral study, we have shown that omission responses (self-reported relief and omission SCR) indeed varied with these instructions (Willems & Vervliet, 2021). In addition, the manipulation checks that are reported in the supplemental material provided further support that the verbal instructions were effective at raising the associated expectancy of stimulation. Specifically, participants recollected having received more stimulations after higher probability instructions (see Supplemental Figure 2). Furthermore, we found that anticipatory SCR, which we used as a proxy of fearful expectation, increased with increasing probability and intensity (see Supplemental Figure 3). This suggests that it is not necessary to have expectation based on previous experience if we want to evaluate threat omission responses in the light of the prediction error axioms.

      Adaptations in the revised manuscript: We more clearly referred to the manipulation checks that are presented in the supplementary material in the results section of the main paper (lines 135-141).

      “The verbal instructions were effective at raising the expectation of receiving the electrical stimulation in line with the provided probability and intensity levels. Anticipatory SCR, which we used as a proxy of fearful expectation, increased as a function of the probability and intensity instructions (see Supplementary Figure 3). Accordingly, post-experimental questions revealed that by the end of the experiment participants recollected having received more stimulations after higher probability instructions, and were willing to exert more effort to prevent stronger hypothetical stimulations (see Supplementary Figure 2).”

      How did the inconsistency between the instructed and experienced probability impact our results?

      All reviewers questioned how the inconsistency between the instructed and experienced probability might have impacted the probability-related results. However, judging from the way the comments were framed, it seems that part of the concern was based on a misunderstanding of the design we employed. Specifically, reviewer 1 mentions that “To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; i.e., 25% of shocks are omitted regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, 0%.”, and reviewer 3 states that “... the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.” We want to emphasize that this was not what we did. If it were, we fully agree with the reviewers that it would have caused serious trust- and learning-related issues, given that it would be immediately evident to participants that the probability instructions were false. It is clear that under such circumstances, dynamic learning would be a big issue.

      However, in our task 0% and 100% instructions were always accurate. This means that participants never received a stimulation following 0% instructions and always received the stimulation of the given intensity following 100% instructions (see Supplemental Figure 1 for an overview of the trial types). Only for the 25%, 50% and 75% trials was an equal reinforcement rate (25%) maintained, meaning that the stimulation followed in 25% of the trials, irrespective of whether a 25%, 50% or 75% instruction was given. The reason for this was that we wanted to maximize and balance the number of omission trials across the different probability levels, while also keeping the total number of presentations per probability instruction constant. We reasoned that equating the reinforcement rate across the 25%, 50% and 75% instructions should not be detrimental, because (1) in these trials there was always the possibility that a stimulation would follow; and (2) we instructed the participants that each trial is independent of the previous ones, which should have discouraged them from actively counting the number of shocks in order to predict future shocks.

      Adaptations in the revised manuscript: We have tried to further clarify the design in several sections of the manuscript, including the introduction (lines 121-125), results (line 220) and methods (lines 478-484) sections:

      Adaptation in the Introduction section: “Specifically, participants received trial-by-trial instructions about the probability (0%, 25%, 50%, 75% and 100%) and intensity (weak, moderate, strong) of a potentially painful upcoming electrical stimulation, time-locked by a countdown clock (see Fig.1A). While stimulations were always delivered on 100% trials and never on 0% trials, most of the other trials (25%-75%) did not contain the expected stimulation and hence provoked an omission PE.”

      Adaptation in the Results section: “Indeed, the provided instructions did not map exactly onto the actually experienced probabilities, but were all followed by stimulation in 25% of the trials (except for the 0% trials and the 100% trials).”

      Adaptation in the Methods section: “Since we were mainly interested in how omissions of threat are processed, we wanted to maximize and balance the number of omission trials across the different probability and intensity levels, while also keeping the total number of presentations per probability and intensity instruction constant. Therefore, we crossed all non-0% probability levels (25, 50, 75, 100) with all intensity levels (weak, moderate, strong) (12 trials). The three 100% trials were always followed by the stimulation of the instructed intensity, while stimulations were omitted in the remaining nine trials. Six additional trials were intermixed in each run: Three 0% omission trials with the information that no electrical stimulation would follow (akin to 0% Probability information, but without any Intensity information as it does not apply); and three trials from the Probability x Intensity matrix that were followed by electrical stimulation (across the four runs, each Probability x Intensity combination was paired at least once, and at most twice with the electrical stimulation).”
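
      The trial composition of a single run, as described in the Methods adaptation above, can be sketched as follows. Which three Probability x Intensity cells are reinforced in a given run is not stated here, so the example choice is an assumption, as are the function names; trial order randomization is left out.

```python
from itertools import product

PROBABILITIES = (25, 50, 75, 100)
INTENSITIES = ("weak", "moderate", "strong")


def build_run(reinforced_cells):
    """Build the 18 trials of one run (unordered).

    reinforced_cells: three (probability, intensity) pairs from the 25-75% cells
    that are followed by stimulation in this run.
    """
    trials = []
    # 12 crossed trials: stimulation is delivered only after 100% instructions.
    for probability, intensity in product(PROBABILITIES, INTENSITIES):
        trials.append(
            {"probability": probability, "intensity": intensity,
             "stimulation": probability == 100}
        )
    # 3 trials with 0% instructions, without intensity information.
    for _ in range(3):
        trials.append({"probability": 0, "intensity": None, "stimulation": False})
    # 3 additional reinforced trials drawn from the 25-75% cells.
    for probability, intensity in reinforced_cells:
        if probability not in (25, 50, 75) or intensity not in INTENSITIES:
            raise ValueError("reinforced cells must come from the 25-75% matrix")
        trials.append(
            {"probability": probability, "intensity": intensity, "stimulation": True}
        )
    return trials


def reinforcement_rate(trials, probability):
    """Fraction of trials with the given instruction that were followed by stimulation."""
    subset = [t for t in trials if t["probability"] == probability]
    return sum(t["stimulation"] for t in subset) / len(subset)


# Hypothetical choice of reinforced cells for one run.
run = build_run([(25, "weak"), (50, "moderate"), (75, "strong")])
```

      With one reinforced cell per uncertain probability level, as assumed in this example, each of the 25%, 50% and 75% instructions is followed by stimulation on one of four trials in the run, which corresponds to the 25% reinforcement rate.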

      Could the incongruence between the instructed and experienced reinforcement rate have detrimental effects on the probability effect? We agree with reviewer 2 that it is possible that the inconsistency between instructed and experienced reinforcement rates could have rendered the exact probability information less informative to participants, which might have resulted in them paying less attention to the probability information whenever the probability was not 0% or 100%. This might to some extent explain the relatively larger difference in responding between 0% and 25% to 75% trials, but the relatively smaller differences between the 25% to 75% trials.

      However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.

      We added a description of these reasons to the supplementary materials in a supplementary note (supplementary note 4; lines 97-129 in supplementary materials), and added a reference to this note in the methods section (lines 488-490).

      “Supplementary Note 4: “Accurate” probability instructions do not alter the Probability-effect

      A question that was raised by the reviewers was whether the inconsistency between the probability instruction and the experienced reinforcement rate could have detrimental effects on the Probability-related results; especially because the effect of Probability was smaller when only including non-0% trials.

      However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.

      First, in a previously unpublished pilot study, we provided participants with “accurate” probability instructions, meaning that the instruction corresponded to the actual reinforcement rate (e.g., 75% instructions were followed by a stimulation in 75% of the trials etc.). In line with the present results and our previous behavioral study (Willems & Vervliet, 2021), the results of this pilot (N = 20) showed that the difference in the reported relief between the different probability levels was largest when comparing 0% and the rest (25%, 50% and 75%). Furthermore, the overall effect size of Probability (excluding 0%) matched the one of our previous behavioral study (Willems & Vervliet, 2021): ηp² ≈ 0.50.”

      Author response image 1.

      Main effect of Probability including 0%: F(1.74, 31.23) = 53.94, p < .001, ηp² = 0.75

      Main effect of Probability excluding 0%: F(1.50, 28.43) = 21.03, p < .001, ηp² = 0.53
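
      As a consistency check, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity: F times the effect df, divided by the sum of that product and the error df. The short sketch below applies this identity to the two reported tests; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values as reported for the pilot study.
including_zero = partial_eta_squared(53.94, 1.74, 31.23)
excluding_zero = partial_eta_squared(21.03, 1.50, 28.43)
```

      Rounded to two decimals, both values agree with the reported effect sizes (0.75 and 0.53).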

      Second, other published studies that used CSs with varying reinforcement rates (with or without explicit written instructions of the reinforcement rates) also showed that the difference in expectations, anticipatory SCR or omission SCR was largest when comparing the CS0% to the other CSs of varying reinforcement rates (Grings & Sukoneck, 1971; Öhman et al., 1973; Ojala et al., 2022).

      Together, this suggests that when there is a possibility of stimulation, any additional difference in probability will have a smaller effect on the omission responses, irrespective of whether the underlying reinforcement rate is accurate or not.

      Adaptation to methods section: “Note that, based on previous research, we did not expect the inconsistency between the instructed and perceived reinforcement rate to have a negative effect on the Probability manipulation (see Supplementary Note 4).”

      Did dynamic learning impact the believability of the instructions?

      Although we tried to minimize learning in our paradigm by providing instructions that trials are independent from one another, we agree with the reviewers that this cannot preclude all learning. Any remaining learning effects should present themselves as a down-weighting of the effect of the probability instructions over time. We controlled for this time-effect by including a “run” regressor in our analyses. Results of the Run regressor for subjective relief and omission-related SCR are presented in Supplemental Figure 5. These figures show that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This indicates that even though some learning might have taken place, the main manipulations of probability and intensity were still present until the end of the task.

      Adaptations in the revised manuscript: In the results section of the main paper (lines 159-162), we now refer more clearly to the results of the Block regressor, which are presented in the supplementary material.

      Note that while there was a general drop in reported relief pleasantness and omission SCR over time, the effects of Probability and Intensity remained present until the last run (see Supplementary Figure 5). This further confirms that probability and intensity manipulations were effective until the end of the task.

      In the following sections of the rebuttal letter, we will go over the rest of the comments and our responses one by one.

      Reviewer #1 (Public Review):

      Summary:

      Willems and colleagues test whether unexpected shock omissions are associated with reward-related prediction errors by using an axiomatic approach to investigate brain activation in response to unexpected shock omission. Using an elegant design that parametrically varies shock expectancy through verbal instructions, they see a variety of responses in reward-related networks, only some of which adhere to the axioms necessary for prediction error. In addition, there were associations between omission-related responses and subjective relief. They also use machine learning to predict relief-related pleasantness, and find that none of the a priori "reward" regions were predictive of relief, which is an interesting finding that can be validated and pursued in future work.

      Strengths:

      The authors pre-registered their approach and the analyses are sound. In particular, the axiomatic approach tests whether a given region can truly be called a reward prediction error. Although several a priori regions of interest satisfied a subset of axioms, no ROI satisfied all three axioms, and the authors were candid about this. A second strength was their use of machine learning to identify a relief-related classifier. Interestingly, none of the ROIs that have been traditionally implicated in reward prediction error reliably predicted relief, which opens important questions for future research.

      Weaknesses:

      To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; i.e. 25% of shocks are omitted, regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, or 0%. Given previous findings on interactions between verbal instruction and experiential learning (Doll et al., 2009; Li et al., 2011; Atlas et al., 2016), it seems problematic a) to treat the instructions as veridical and b) average responses over time. Based on this prior work, it seems reasonable to assume that participants would learn to downweight the instructions over time through learning (particularly in the 100% and 0% cases); this would be the purpose of prediction errors as a teaching signal. The authors do recognize this and perform a subset analysis in the 21 participants who showed parametric increases in anticipatory SCR as a function of instructed shock probability, which strengthened findings in the VTA/SN; however given that one-third of participants (n=10) did not show parametric SCR in response to instructions, it seems like some learning did occur. As prediction error is so important to such learning, a weakness of the paper is that conclusions about prediction error might differ if dynamic learning were taken into account.

      We thank the reviewer for raising this important concern. We believe we replied to all the issues raised in the general reply above.

      Lastly, I think that findings in threat-sensitive regions such as the anterior insula and amygdala may not be adequately captured in the title or abstract which strictly refers to the "human reward system"; more nuance would also be warranted.

      We fully agree with this comment and have changed the title and abstract accordingly.

      Adaptations in the revised manuscript: We adapted the title of the manuscript.

      “Omissions of Threat Trigger Subjective Relief and Prediction Error-Like Signaling in the Human Reward and Salience Systems”

      Adaptations in the revised manuscript: We adapted the abstract (lines 27-29).

      “In line with recent animal data, we showed that the unexpected omission of (painful) electrical stimulation triggers activations within key regions of the reward and salience pathways and that these activations correlate with the pleasantness of the reported relief.”

      Reviewer #2 (Public Review):

      The question of whether the neural mechanisms for reward and punishment learning are similar has been a constant debate over the last two decades. Numerous studies have shown that the midbrain dopamine neurons respond to both negative and salient stimuli, some of which can't be well accounted for by the classic RL theory (Delgado et al., 2007). Other research even proposed that aversive learning can be viewed as reward learning, by treating the omission of aversive stimuli as a negative PE (Seymour et al., 2004).

      Although the current study took an axiomatic approach to search for the PE encoding brain regions, which I like, I have major concerns regarding their experimental design and hence the results they obtained. My biggest concern comes from the false description of their task to the participants. To increase the number of "valid" trials for data analysis, the instructed and actual probabilities were different. Under such a circumstance, testing axiom 2 seems completely artificial. How does the experimenter know that the participants truly believe that the 75% is more probable than, say, the 25% stimulation? The potential confusion of the subjects may explain why the SCR and relief report were rather flat across the instructed probability range, and some of the canonical PE encoding regions showed a rather mixed activity pattern across different probabilities. Also for the post-hoc selection criteria, why pick the larger SCR in the 75% compared to the 25% instructions? How would the results change if other criteria were used?

      We thank the reviewer for raising this important concern. We believe the general reply above covers most of the issues raised in this comment. Concerning the post-hoc selection criteria, we took 25% < 75% as the criterion because it was quite “lenient”, in the sense that it looked only at the effect of interest (i.e., did anticipatory SCR increase with increasing instructed probability?). However, even when the criterion was stricter (e.g., selecting participants only if their anticipatory SCR monotonically increased with each increase in instructed probability 0% < 25% < 50% < 75% < 100%, N = 11 participants), the probability effect (ωp² = 0.08), but not the intensity effect, for the VTA/SN remained.
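
      To make the difference between the two selection rules concrete, the following is a minimal sketch and not the analysis code used in the study. The participant labels, the SCR values, and the function names are all hypothetical; only the two rules themselves (25% < 75%, and a strictly monotonic increase across all five instructed probabilities) come from the text above.

```python
# Illustrative only: participant IDs and SCR values are made up.
PROBS = (0, 25, 50, 75, 100)  # instructed probabilities in percent

def passes_lenient(scr):
    """Lenient rule: anticipatory SCR is larger for 75% than for 25%."""
    return scr[75] > scr[25]

def passes_strict(scr):
    """Strict rule: SCR rises with every step in instructed probability."""
    values = [scr[p] for p in PROBS]
    return all(lo < hi for lo, hi in zip(values, values[1:]))

# Hypothetical mean anticipatory SCR per instructed probability.
participants = {
    "sub-01": {0: 0.10, 25: 0.15, 50: 0.22, 75: 0.30, 100: 0.41},  # monotonic
    "sub-02": {0: 0.20, 25: 0.18, 50: 0.25, 75: 0.27, 100: 0.26},  # lenient only
    "sub-03": {0: 0.30, 25: 0.28, 50: 0.21, 75: 0.19, 100: 0.25},  # neither
}

lenient = [s for s, scr in participants.items() if passes_lenient(scr)]
strict = [s for s, scr in participants.items() if passes_strict(scr)]
```

      Every participant who passes the strict rule also passes the lenient one, which is why the stricter rule can only shrink the selected sample.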

      To test axiom 3, which was to compare the 100% stimulation to the 0% stimulation conditions, how did the actual shock delivery affect the fMRI contrast result? It would be more reasonable if this analysis could control for the shock delivery, which itself could contaminate the fMRI signal, with extra confound that subjects may engage certain behavioral strategies to "prepare for" the aversive outcome in the 100% stimulation condition. Therefore, I agree with the authors that this contrast may not be a good way to test axiom 3, not only because of the arguments made in the discussion but also the technical complexities involved in the contrast.

      We thank the reviewer for addressing this additional confound. It was indeed impossible to control for the delivery of the shock, since the shock was always delivered on the 100% trials (and thus completely overlapped with the contrast of interest). We added this limitation to the discussion in the manuscript. In addition, we have added a suggestion for a contrast that can test the “no surprise equivalence” criterion.

      Adaptations in the revised manuscript: We adapted lines 358-364.

      “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”

      Reviewer #3 (Public Review):

      We thank the reviewer for their comments. Overall, based on the reviewer’s comments, we noticed that there was an imbalance between a focus on “relief” in the introduction and the rest of the manuscript and preregistration. We believe this focus raised the expectation that all outcome measures were interpreted in terms of the relief emotion. However, this was not what we did nor what we preregistered. We therefore restructured the introduction to reduce the focus on relief.

      Adaptations in the revised manuscript: We restructured the introduction of the manuscript. Specifically, after our opening sentence: “We experience a pleasurable relief when an expected threat stays away1” we only introduce the role of relief for our research in lines 79-89.

      “Interestingly, unexpected omissions of threat not only trigger neural activations that resemble a reward PE, they are also accompanied by a pleasurable emotional experience: relief. Because these feelings of relief coincide with the PE at threat omission, relief has been proposed to be an emotional correlate of the threat omission PE. Indeed, emerging evidence has shown that subjective experiences of relief follow the same time-course as theoretical PE during fear extinction. Participants in fear extinction experiments report high levels of relief pleasantness during early US omissions (when the omission was unexpected and the theoretical PE was high) and decreasing relief pleasantness over later omissions (when the omission was expected and the theoretical PE was low)22,23. Accordingly, preliminary fMRI evidence has shown that the pleasantness of this relief is correlated to activations in the NAC at the time of threat omission. In that sense, studying relief may offer important insights in the mechanism driving safety learning.”

      Summary:

      The authors conducted a human fMRI study investigating the omission of expected electrical shocks with varying probabilities. Participants were informed of the probability of shock and shock intensity trial-by-trial. The time point corresponding to the absence of the expected shock (with varying probability) was framed as a prediction error producing the cognitive state of relief/pleasure for the participant. fMRI activity in the VTA/SN and ventral putamen corresponded to the surprising omission of a high probability shock. Participants' subjective relief at having not been shocked correlated with activity in brain regions typically associated with reward-prediction errors. The overall conclusion of the manuscript was that the absence of an expected aversive outcome in human fMRI looks like a reward-prediction error seen in other studies that use positive outcomes.

      Strengths:

      Overall, I found this to be a well-written human neuroimaging study investigating an often overlooked question on the role of aversive prediction errors, and how they may differ from reward-related prediction errors. The paper is well-written and the fMRI methods seem mostly rigorous and solid.

      Weaknesses:

      I did have some confusion over the use of the term "prediction-error" however as it is being used in this task. There is certainly an expectancy violation when participants are told there is a high probability of shock, and it doesn't occur. Yet, there is no relevant learning or updating, and participants are explicitly told that each trial is independent and the outcome (or lack thereof) does not affect the chances of getting the shock on another trial with the same instructed outcome probability. Prediction errors are primarily used in the context of a learning model (reinforcement learning, etc.), but without a need to learn, the utility of that signal is unclear.

      We operationalized “prediction error” as the response to the error in prediction or the violation of expectancy at the time of threat omission. In that sense, prediction error and expectancy violation (which is more commonly used in clinical research and psychotherapy; Craske et al., 2014) are synonymous. While prediction errors (or expectancy violations) are predominantly studied in learning situations, the definition in itself does not specify how the “expectancy” or “prediction” arises: whether it was through learning based on previous experience or through mere instruction. The rationale why we moved away from a conditioning study in the present manuscript is discussed in our general reply above.

      We agree with the reviewer that studying prediction errors outside a learning context limits the ecological validity of the task. However, we do believe there is also a strength to this approach. Specifically, the omission-related responses we measure are less confounded by subsequent learning (or updating of the wrongful expectation). Any difference between our results and prediction error responses in learning situations can therefore point to this exact difference in paradigm, and can thus identify responses that are specific to learning situations.

      An overarching question posed by the researchers is whether relief from not receiving a shock is a reward. They take as neural evidence activity in regions usually associated with reward prediction errors, like the VTA/SN. This seems to be a strong case of reverse inference. The evidence may have been stronger had the authors compared activity to a reward prediction error, for example using a similar task but with reward outcomes. As it stands, the neural evidence that the absence of shock is actually "pleasurable" is limited, albeit there is a subjective report asking subjects if they felt relief.

      We thank the reviewer for cautioning us and prompting us to reflect critically on our interpretation. We agree that it is important not to be overly enthusiastic when interpreting fMRI results and not to carelessly attribute psychological functions to mere activations. Therefore, we will elaborate on the precautions we took to minimize detrimental reverse inference.

      First, prior to analyzing our results, we preregistered clear hypotheses that were based on previous research, in addition to clear predictions, regions of interest and a testing approach on OSF. With our study, we wanted to investigate whether unexpected omissions of threat: (1) triggered activations in the VTA/SN, putamen, NAc and vmPFC (as has previously been shown in animal and human studies); (2) represented PE signals; and (3) were related to self-reported relief, which has also been shown to follow a PE time-curve in fear extinction (Vervliet et al., 2017). Based on previous research, we selected three criteria that all PE signals should comply with. This means that if omission-related activations were to represent true PE signals, they should comply with these criteria. However, we agree that it would go too far to conclude based on our research that relief is a reward, or even that the omission-related activations represent only PE signals. While we found support for most of our hypotheses, this does not preclude alternative explanations. In fact, in the discussion, we acknowledge this and also discuss alternative explanations, such as responding to the salience (lines 395-397; “One potential explanation is therefore that the deactivation resulted from a switch from default mode to salience network, triggered by the salience of the unexpected threat omission or by the salience of the experienced stimulation.”), or anticipation (lines 425-426; “... we cannot conclusively dismiss the alternative interpretation that we assessed (part of) expectancy instead”).

      Second, we have deliberately opted to only use descriptive labels such as omission-related activations when we are discussing fMRI results. Only when we are talking about how the activations were related to self-reported relief, we talk about relief-related activations.

      I have some other comments, and I elaborate on those above comments, below:

      (1) A major assumption in the paper is that the unexpected absence of danger constitutes a pleasurable event, as stated in the opening sentence of the abstract. This may sometimes be the case, but it is not universal across contexts or people. For instance, for pathological fears, any relief derived from exposure may be short-lived (the dog didn't bite me this time, but that doesn't mean it won't next time or that all dogs are safe). And even if the subjective feeling one gets is temporary relief at that moment when the expected aversive event is not delivered, I believe there is an overall conflation between the concepts of relief and pleasure throughout the manuscript. Overall, the manuscript seems to be framed on the assumption that "aversive expectations can transform neutral outcomes into pleasurable events," but this is situationally dependent and is not a common psychological construct as far as I am aware.

      We thank the reviewer for their comment. We have restructured the introduction because we agree with the reviewer that the introduction might have set false expectations concerning our interpretation of the results. The statements related to relief have been toned down in the revised manuscript.

      Still, we want to note that the initial opening statement “unexpected absence of danger constitutes the pleasurable emotion relief” was based on a commonly used definition of relief, which states that relief refers to “the emotion that is triggered by the absence of expected or previously experienced negative stimulation” (Deutsch, 2015). Both aspects, that it is elicited by the absence of an otherwise expected aversive event and that it is pleasurable in nature, have received considerable empirical support in emotion and fear conditioning research (Deutsch et al., 2015; Leknes et al., 2011; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021).

      That said, the notion that the feeling of relief is linked to the (reward) prediction error underlying the learning of safety is included in several theoretical papers in order to explain the commonly observed dopaminergic response at the time of threat omission (both in animals and humans; Bouton et al., 2020; Kalisch et al., 2019; Pittig et al., 2020).

      Together, these studies indicate that the definition of relief, and its potential role in threat omission-driven learning is – at least in our research field – established. Still, we felt that more direct research linking feelings of relief to omission-related brain responses was warranted.

      One of the main reasons why we specifically focus on the “pleasantness” of the relief is to assess the hedonic impact of the threat omission, as has been done in previous studies by our lab and others (Leknes et al., 2011; Leng et al., 2022; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021). Nevertheless, we agree with the reviewer that the relief we measure is a short-lived emotional state that is subject to individual differences (as are all emotions).

      (2) The authors allude to this limitation, but I think it is critical. Specifically, the study takes a rather simplistic approach to prediction errors. It treats the instructed probability as the subjects' expectancy level and treats the prediction error as omission related activity to this instructed probability. There is no modeling, and any dynamic parameters affected by learning are unaccounted for in this design. That is, subjects are informed that each trial is independently determined and so there is no learning: "the presence/absence of stimulations on previous trials could not predict the presence/absence of stimulation on future trials." Prediction errors are central to learning. It is unclear if the "relief" subjects feel on not getting a shock on a high-probability trial is in any way analogous to a prediction error, because there is no reason to update your representation on future trials if they are all truly independent. The construct validity of the design is in question.

      (3) Related to the above point, even if subjects veered away from learning by the instruction that each trial is independent, the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.

      We thank the reviewer for raising these concerns. We believe that the general reply above covers the issues raised in points 2 and 3.

      (4) Bouton has described very well how the absence of expected threat during extinction can create a feeling of ambiguity and uncertainty regarding the signal value of the CS. This in large part explains the contextual dependence of extinction and the "return of fear" that is so prominent even in psychologically healthy participants. The relief people feel when not receiving an expected shock would seem to have little bearing on changing the long-term value of the CS. In any event, the authors do talk about conditioning (CS-US) in the paper, but this is not a typical conditioning study, as there is no learning.

      We fully agree with the reviewer that our study is not a typical conditioning study. Nevertheless, because our research mostly builds on recent advances in the fear extinction domain, we felt it was necessary to introduce the fear extinction procedure and related findings. In the context of fear extinction learning, we have previously shown that relief is an emotional correlate of the prediction error driving acquisition of the novel safety memory (CSnoUS; Papalini et al., 2021; Vervliet et al., 2017). The ambiguity Bouton describes is the result of the extinguished CS holding multiple meanings once the safety memory is acquired. Does it signal danger or safety? We agree with Bouton that the meaning of the CS for any new encounter will depend on the context and the passage of time, but also on the initial strength of the safety acquisition (which is dependent on the size of the prediction error, and hence the amount of relief; Craske et al., 2014). However, it was not our objective to directly study the relation of relief to subsequent CS value, and our design is not tailored to do so post hoc.

      (5) In Figure 2 A-D, the omission responses are plotted on trials with varying levels of probability. However, it seems to be missing omission responses in 0% trials in these brain regions. As depicted, it is an incomplete view of activity across the different trial types of increasing threat probability.

      We thank the reviewer for pointing out this lack of clarity. The betas that are presented in the figures represent the ROI averages from each non-0% vs 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.

      Adaptations in the revised manuscript: We have adapted the figure captions of figures 2 and 3.

      “The extracted beta-estimates in figures A-D represent the ROI averages from each non0% > 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.”
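
      As a reading aid for the sign convention described above, here is a small sketch under simplifying assumptions. It treats each plotted value as the ROI average of a voxelwise difference between an omission condition and the 0% baseline. The voxel values and the function name `roi_contrast_beta` are hypothetical, and the actual first-level contrasts in the study were estimated within a GLM rather than by direct subtraction of stored values.

```python
from statistics import mean

def roi_contrast_beta(condition_betas, baseline_betas):
    """Average the voxelwise (condition - baseline) difference over one ROI."""
    return mean(c - b for c, b in zip(condition_betas, baseline_betas))

# Hypothetical voxelwise omission betas inside a single ROI.
baseline_0 = [0.10, 0.20, 0.00, 0.10]  # fully predicted omission (0% instruction)
omission_betas = {
    25: [0.05, 0.15, 0.05, 0.05],
    50: [0.20, 0.30, 0.10, 0.20],
    75: [0.40, 0.50, 0.30, 0.40],
}

# One value per non-0% > 0% contrast; the sign is relative to the 0% baseline.
contrasts = {p: roi_contrast_beta(b, baseline_0) for p, b in omission_betas.items()}
```

      In this toy example the 25% value is negative (weaker than a fully predicted omission), while the 50% and 75% values are positive and increasing.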

      (6) If I understand Figure 2 panels E-H, these are plotting responses to the shock versus no-shock (when no-shock was expected). It is unclear why this would be especially informative, as it would just be showing activity associated with shocks versus no-shocks. If the goal was to use this as a way to compare positive and negative prediction errors, the shock would induce widespread activity that is not necessarily reflective of a prediction error. It is simply a response to a shock. Comparing activity to shocks delivered after varying levels of probability (e.g., a shock delivered at 25% expectancy, versus 75%, versus 100%) would seem to be a much better test of a prediction error signal than shock versus no-shock.

      We thank the reviewer for this comment. The purpose of this preregistered contrast was to test whether fully predicted outcomes elicited equivalent activations in our ROIs (corresponding to the third prediction error axiom). Specifically, if a region represents a pure prediction error signal, the 100% (fully predicted shocks) > 0% (fully predicted shock omissions) contrast should be nonsignificant, and follow-up Bayes Factors would further provide evidence in favor of this null-hypothesis.

      We agree with the reviewer that the delivery of the stimulation triggers widespread activations in our regions of interest that confounded this contrast. However, given that it was a preregistered test for the prediction error axioms, we cannot remove it from the manuscript. Instead, we have argued in the discussion that future studies that want to take an axiomatic stance should consider alternative tests to examine this axiom.

      Adaptations in the revised manuscript: We adapted lines 358-364.

      “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”

      Also note that our task did not lend itself to an in-depth analysis of aversive (worse-than-expected) prediction error signals, given that there was only one stimulation trial for each probability x intensity level (see Supplemental Figure 1). The most informative contrast that can inform us about aversive prediction error signals contrasts all non-100% stimulation trials with all 100% stimulation trials. The results of this contrast are presented in Supplemental Figure 16 and Supplemental Table 11 for completeness.

      (7) I was unclear what the results in Figure 3 E-H were showing that was unique from panels A-D, or where it was described. The images looked redundant from the images in A-D. I see that they come from different contrasts (non0% > 0%; 100% > 0%), but I was unclear why that was included.

      We thank the reviewer for this comment. Our answer is related to that of the previous comment. Figure 3 presents the results of the axiomatic tests within the secondary ROIs we extracted from a wider secondary mask based on the non0%>0% contrast.

      (8) As mentioned earlier, there is a tendency to imply that subjects felt relief because there was activity in "the reward pathway."

      We thank the reviewer for their comment, but we respectfully disagree. Subjective relief was explicitly probed when the instructed stimulations stayed away. In the manuscript we only talk about “relief” when discussing these subjective reports. We found that participants reported higher levels of relief-pleasantness following omissions of stronger and more probable threat. This was an observation that matches our predictions and replicates our previous behavioral study (Willems & Vervliet, 2021).

      The fMRI evidence is treated separately from the “pleasantness” of the relief. Specifically, we refrain from calling the threat omission-related neural responses “relief-activity” as this would indeed imply that the activation would only be attributed to this psychological function. Instead, we talked about omission-related activity, and we assessed whether it complied to the prediction error criteria as specified by the axiomatic approach.

      Only afterwards, because we hypothesized that omission-related fMRI activation and self-reported relief-pleasantness were related, and because we found a similar response pattern for both measures, we examined how relief and omission-related fMRI activations within our ROIs were related on a trial-by-trial basis. To this end, we entered relief-pleasantness ratings as a parametric modulator to the omission regressor.
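
      For readers unfamiliar with parametric modulation, the sketch below shows the general idea under stated assumptions; it is not the pipeline used in the study. The onsets and ratings are hypothetical, and mean-centering the ratings before using them as modulator weights is a common convention assumed here, not a detail reported in the text.

```python
def mean_center(values):
    """Subtract the mean so the modulator captures deviations from the average."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical omission trials for one participant.
onsets = [12.0, 40.5, 71.0, 98.5]   # omission onsets in seconds
relief = [80.0, 35.0, 60.0, 25.0]   # relief-pleasantness ratings

# Main omission regressor: every omission event has the same weight.
omission_events = [(t, 1.0) for t in onsets]

# Parametric modulator: same onsets, weighted by the mean-centered rating.
relief_modulator = list(zip(onsets, mean_center(relief)))
```

      After convolution with a hemodynamic response function, the beta of the modulator regressor expresses how strongly the omission response scales with trial-by-trial relief, over and above the average omission response.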

      By no means do we want to reduce an emotional experience (relief) to fMRI activations in isolated regions in the brain. We agree with the reviewer that this would be far too reductionist. We therefore also ran a preregistered LASSO-PCR analysis in order to identify whether a whole-brain pattern of activations can predict subjective relief (independent from the exact instructions we gave, and independent of our a priori ROIs). This analysis used trial-by-trial patterns of activation across all voxels in the brain as the predictor and self-reported relief as the outcome variable. It is therefore completely data-driven and can be seen as a preregistered exploratory analysis that is intended to inform future studies.

      (9) From the methods, it wasn't entirely clear where there is jitter in the course of a trial. This centers on the question of possible collinearity in the task design between the cue and the outcome. The authors note there is "no multicollinearity between anticipation and omission regressors in the first-level GLMs," but how was this quantified? The issue is of course that the activity coded as omission may be from the anticipation of the expected outcome.

      We thank the reviewer for pointing out this lack of clarity. Jitter was introduced in all parts of the trial: i.e., the duration of the inter-trial interval (4-7s), countdown clock (3-7s), and omission window (4-8s) were all jittered (see fig. 1A and methods section, lines 499-507). We added an additional line to the methods section.

      Adaptations in the revised manuscript: We added an additional line to the methods section to further clarify the jittering (lines 498-500).

      “The scale remained on the screen for 8 seconds or until the participant responded, followed by an intertrial interval between 4 and 7 seconds during which only a fixation cross was shown. Note that all phases in the trial were jittered (i.e., duration countdown clock, duration outcome window, duration intertrial interval).”
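
      The jitter ranges reported above can be illustrated with a short sketch. Only the three ranges are taken from the response; drawing the durations from a uniform distribution, the fixed random seed, and all names in the code are assumptions made for illustration.

```python
import random

# Jitter ranges in seconds, as reported in the response above.
JITTER_RANGES = {
    "intertrial_interval": (4.0, 7.0),
    "countdown_clock": (3.0, 7.0),
    "omission_window": (4.0, 8.0),
}

def sample_trial_durations(rng):
    """Draw one duration per trial phase (uniform sampling is an assumption)."""
    return {phase: rng.uniform(lo, hi) for phase, (lo, hi) in JITTER_RANGES.items()}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
trials = [sample_trial_durations(rng) for _ in range(5)]
```

      Because each phase is jittered independently, the interval between cue onset and outcome onset varies from trial to trial, which is what helps decorrelate the anticipation and omission regressors.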

      Multicollinearity between the omission and anticipation regressors was assessed by calculating the variance inflation factor (VIF) of omission and anticipation regressors in the first level GLM models that were used for the parametric modulation analyses.

      Adaptations in the revised manuscript: We replaced the VIF abbreviation with “variance inflation factor” (line 423-424).

      “Nevertheless, there was no multicollinearity between anticipation and omission regressors in the first-level GLMs (variance inflation factor, VIF < 4), making it unlikely that the omission responses purely represented anticipation.”
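
      To show what the variance inflation factor measures, here is a simplified sketch restricted to two regressors, where VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The regressor values are hypothetical. In a full first-level GLM with more regressors, the R² would instead come from regressing each regressor on all the others, which this sketch does not do.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_regressors(x, y):
    """VIF for a two-regressor model: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical (already convolved) regressor values at ten time points.
anticipation = [0.0, 1.0, 0.8, 0.1, 0.0, 0.0, 0.9, 0.7, 0.0, 0.0]
omission = [0.0, 0.0, 0.2, 0.9, 0.6, 0.0, 0.0, 0.3, 1.0, 0.4]

vif = vif_two_regressors(anticipation, omission)
```

      A VIF of 1 means the two regressors are uncorrelated; values grow without bound as the correlation approaches ±1.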

      (10) I did not fully understand what the LASSO-PCR model using relief ratings added. This result was not discussed in much depth, and seems to show a host of clusters throughout the brain contributing positively or negatively to the model. Altogether, I would recommend highlighting what this analysis is uniquely contributing to the interpretation of the findings.

      The main added value of this analysis is that it uses a different approach altogether. The (mass univariate) parametric modulation analysis estimated for each voxel (and each ROI) whether its activity covaried with the reported relief, so a significant activation only indicated that this voxel was related to relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network across the brain, and which regions contributed most to the prediction of relief. The multivariate LASSO-PCR analysis approach we took attempts to overcome this limitation by examining whether a whole-brain pattern can predict relief. Because we use the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data-driven and is intended to inform future studies. In addition, the LASSO-PCR model was cross-validated using five-fold cross-validation, which is also a difference (and a strength) compared to the mass univariate GLM approach.

      One interesting finding that only became evident when we combined univariate and multivariate approaches is that, although the parametric modulation analysis showed that omission-related fMRI responses in the ROIs were modulated by the reported relief, none of these ROIs contributed significantly to the prediction of relief based on the identified signature. Instead, some of the contributing clusters fell within other valuation and error-processing regions (e.g. lateral OFC, mid cingulate, caudate nucleus). This suggests that regions other than our a priori ROIs may have been especially important for the subjective experience of relief, at least in this task. However, all these clusters were small and require further validation in out-of-sample participants. More research is necessary to test the generalizability and validity of the relief signature to new individuals and tasks, and to compare the signature with other existing signature models (e.g., signature of pain, fear, reward, pleasure). However, this was beyond the scope of the present study.

      Adaptations in the revised manuscript: We altered the explanation of the LASSO-PCR approach in the results section (lines 286-295) and the discussion (lines 399-402).

      Adaptations in the Results section: “The (mass univariate) parametric modulation analysis showed that omission-related fMRI activity in our primary and secondary ROIs correlated with the pleasantness of the relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network of activation across the brain, and which regions contributed most to the prediction of relief. To overcome these limitations, we trained a (multivariate) LASSO-PCR model (Least Absolute Shrinkage and Selection Operator-Regularized Principal Component Regression) in order to identify whether a spatially distributed pattern of brain responses can predict the perceived pleasantness of the relief (or “neural signature” of relief)31. Because we used the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data driven and can thus identify which clusters contribute most to the relief prediction.”

      Adaptations in the Discussion section: “In addition to examining the PE-properties of neural omission responses in our a priori ROIs, we trained a LASSO-PCR model to establish a signature pattern of relief. One interesting finding that only became evident when we compared the univariate and multivariate approach was that none of our a priori ROIs appeared to be an important contributor to the multivariate neural signature, even though all of them (except NAc) were significantly modulated by relief in the univariate analysis.”

      In addition to the public peer review, the reviewers provided some recommendation on how to further improve our manuscript. We will reply to the recommendations below.

      Reviewer #1 (Recommendations For The Authors):

      Given that you do have trial-level estimates from the classifier analysis, it would be very informative to use learning models and examine responses trial-by-trial to test whether there are prediction errors that vary over time as a function of learning.

      We thank the reviewer for the suggestion. However, based on the results of the run-regressor, we do not anticipate large learning effects in our paradigm. As we mentioned in our responses above, we controlled for time-related drops in omission-responding by including a “run” regressor in our analyses. Results of this regressor for subjective relief and omission-related SCR showed that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This suggests that even though some learning might have taken place, its effect was likely small and did not abolish our manipulations of probability and intensity. In any case, we cannot use the LASSO-PCR signature model to investigate learning, as this model uses the trial-level brain pattern at the time of US omission to estimate the associated level of relief. These estimates can therefore not be used to examine learning effects.

      Reviewer #2 (Recommendations For The Authors):

      The LASSO-PCR model feels rather disconnected from the rest of the paper and does not add much to the main theme. I would suggest to remove this part from the paper.

      We thank the reviewer for this suggestion. However, the LASSO-PCR analysis was preregistered. We therefore cannot remove it from the manuscript. We hope to have clarified its added value in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have revised the manuscript mainly in the following aspects: (1) data on the electrophysiological and behavioral responses of larvae and adults to trehalose have been added, and the related figures and text have been modified accordingly; (2) photos of the taste organs of larvae and adults indicating the position of the recorded sensilla have been added; (3) the potential off-target effects of GR knock-out on the expression of other GRs have been carefully explained and the relevant text revised; (4) the abstract has been revised to present the findings more technically in a limited number of words; (5) some experimental details in Materials and Methods and some new references have been added; (6) a new figure (Figure 8) summarizing the main findings of the study has been added.

      In the following, we respond to the reviewers’ comments and suggestions one by one. We hope that our answers will satisfy you and the three reviewers. We are also very happy to get further valuable advices from you.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The process of taste perception is significantly more intricate and complex in Lepidopteran insects. This investigation provides valuable insights into the role of Gustatory receptors and their dynamics in the sensation of sucrose, which serves as a crucial feeding cue for insects. The article highlights the differential sensitivity of Grs to sucrose and their involvement in feeding and insect behavior.

      Strengths:

      To support the notion of the differential specificity of Gr to sucrose, this study employed electrophysiology, ectopic expression of Grs in Xenopus, genome editing, and behavioral studies on insects. This investigation offers a fundamental understanding of the gustation process in lepidopteran insects and its regulation of feeding and other gustation-related physiological responses. This study holds significant importance in advancing our comprehension of lepidopteran insect biology, gustation, and feeding behavior.

      Thank you for your recognition of our research.

      Weaknesses:

      While this manuscript demonstrates technical proficiency, there exists an opportunity for additional refinement to optimize comprehensibility for the intended audience. Several crucial sugars have been overlooked in the context of electrophysiology studies and should be incorporated. Furthermore, it is imperative to consider the potential off-target effects of Gr knock-out on other Gr expressions. This investigation focuses exclusively on Gr6 and Gr10, while neglecting a comprehensive narrative regarding other Grs involved in sucrose sensation.

      We accept the reviewer's suggestion. Because trehalose is a main sugar in insect blood, formed by insects from the plant sugars they ingest, we have added new data on the electrophysiological and behavioral responses of larvae and adults of Helicoverpa armigera to trehalose (see Figure 1-2, Figure 1-figure supplement 1, Figure 2-figure supplement 1). The eight sugars now tested comprise 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose), which were chosen because they are mainly present in host-plants of H. armigera and/or are representative in the structure and source of sugars.

      We fully agree with the reviewer and have already taken into consideration the potential off-target effects of CRISPR/Cas9 knockout of a Gr on the expression of other GRs. To predict the potential off-target sites of the Gr6 and Gr10 sgRNAs used to establish homozygous mutants with CRISPR/Cas9 technology, we first used the online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to search the genome of the wild type cotton bollworm, with the number of mismatches set to less than or equal to 3. We found that the Gr10 sgRNA had no potential off-target site, and the Gr6 sgRNA had only one. Therefore, we designed primers according to the sequence of the potential off-target site of the Gr6 sgRNA, conducted PCR using genomic DNA of the homozygous mutant as a template, performed Sanger sequencing on the PCR products obtained, and found that the potential off-target site of the Gr6 sgRNA was no different from that of the wild type. In particular, because the sgRNAs of Gr6 and Gr10 might produce off-target effects on other sugar receptor genes of H. armigera, we conducted the same off-target site analysis with the designed sgRNAs on each of the other eight sugar receptor genes, and found that there were no off-target sites on these receptor genes (see Line 254-256).
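
      The mismatch criterion described above can be illustrated with a minimal sketch. It is not Cas-OFFinder and does not reproduce its interface; it only shows the idea of scanning a sequence for windows that differ from a guide at no more than a set number of positions. The function names and the toy sequences are hypothetical, and PAM requirements and the reverse strand are deliberately ignored.

```python
from typing import List, Tuple


def count_mismatches(a: str, b: str) -> int:
    """Return the number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(genome: str, guide: str,
                    max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Slide the guide along one strand of `genome` and report every window
    with at most `max_mismatches` mismatches.

    Illustrative assumption: a real off-target search also checks the PAM
    and the reverse complement; this sketch does neither.
    """
    hits = []
    n = len(guide)
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        mismatches = count_mismatches(window, guide)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy_guide = "ACGTACGTAC"
    toy_genome = "TTTT" + "ACGTACGTAC" + "GGGG" + "ACGTTCGTAC" + "CC"
    for start, window, mismatches in candidate_sites(toy_genome, toy_guide, 1):
        print(start, window, mismatches)
```

      A window with zero mismatches is the intended target; any other reported window is only a candidate, which is why the response above follows prediction with PCR and Sanger sequencing of the predicted site.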

      Reviewer #2 (Public Review):

      Summary:

      To identify sugar receptors and assess the capacity of these genes the authors first set out to identify behavioral responses in larvae and adults as well as physiological response. They used phylogenetics and gene expression (RNAseq) to identify candidates for sugar reception. Using first an in vitro oocyte system they assess the responses to distinct sugars. A subsequent genetic analysis shows that the Gr10 and Gr6 genes provide stage specific functions in sugar perception.

      Strengths:

      A clear strength of the manuscript is the breadth of techniques employed allowing a comprehensive study in a non-canonical model species.

      Thank you for your recognition of our research.

      Weaknesses:

      There are no major weaknesses in the study for the current state of knowledge in this species. Since it is much basic work to establish a broader knowledge, context with other modalities remains unknown. It might have been possible to probe certain contexts known from the fruit fly, which would have strengthened the manuscript.

      Thank you so much for your suggestion. Following it, we added some sentences on sugar sensing and behaviors of fruit fly larvae to the Introduction and Discussion sections (Line 68-71 in the Introduction section, Line 395-399 in the Discussion section).

      Reviewer #3 (Public Review):

      In this study, the authors combine electrophysiology, behavioural analyses, and genetic editing techniques on the cotton bollworm to identify the molecular basis of sugar sensing in this species.

      The larval and adult forms of this species feed on different plant parts. Larvae primarily consume leaves, which have relatively lower sugar concentrations, while adults feed on nectar, rich in sugar. Through a series of experiments (spanning electrophysiological recordings from both larval and adult sensillae, qPCR expression analysis of identified GRs from these sensillae, response profiles of these GRs to various sugars via heterologous expression in Xenopus oocytes, and evaluations of CRISPR mutants based on these parameters), the authors discovered that larvae and adults employ distinct GRs for sugar sensing. While the larva uses the highly sensitive GR10, the adult uses the less sensitive and broadly tuned GR6. This differential use of GRs is in keeping with their behavioral ecology.

      The data are cohesive and consistently align across the methodologies employed. They are also well presented and the manuscript is clearly written.

      Recommendations for the authors:

      While appreciating the quality of the work and its presentation, we have a few comments for the authors, should they wish to consider them, that would significantly improve the presentation of the work.

      Title: Could the authors please revisit their title to better reflect the main finding of their work?

      The title has been changed to “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      Text: There are a few comments related to the text, and these are listed below:

      (1) Could the authors place their work in the context of what's known about sugar sensing in Drosophila larva and adult?

      In the Introduction section, we added the status of research on sugar perception in Drosophila larvae, pointing out "No external sugar-sensing mechanism in Drosophila larvae has yet been characterized." (Line 70-71); in the Discussion section, the research progress of sugar sensing in Drosophila adults and larvae was also summarized (Line 397-399).

      (2) For each results section, could the authors please include a sentence or two that interprets the data in the context of previously presented data?

      We accept the reviewer's suggestion. To make the manuscript easier for readers to follow, we have included a sentence that interprets the preceding data at the beginning of each part of the Results, while avoiding duplication.

      (3) Could the authors please provide details of the generation and screening of the CRISPR mutants?

      We have added more details on mutant establishment and screening in the Materials and Methods section (Line 722-726, 729-732).

      Figures: Could the authors please include images and schematics wherever possible? For example, a schematic depicting the position of the sense organs and one summarising the main findings of the studies.

      In Figure 1 we added the photo of each taste organ, on which the recorded sensilla were indicated. We also added a new figure, Figure 8, summarizing the main findings of the study.

      Choice of Sugars: Could the authors please justify their choice of sugars they have used in the analyses?

      In the first paragraph of the Results section of the article, we further explain the reasons for using the sugars in the study. “We first investigated the electrophysiological responses of the lateral and medial sensilla styloconica in the larval maxillary galea to eight sugars. These sugars were chosen because they are mostly found in host-plants of H. armigera or are representative in the structure and source of sugars.”

      In addition to this, there are several specific comments in the detailed reviewers' comments below, which the authors could consider responding to.

      Reviewer #1 (Recommendations For The Authors):

      The article titled "Sucrose taste receptors exhibit dissimilarities between larval and adult stages of a moth" by Shuai-Shuai Zhang and colleagues provides an intriguing analysis. The authors have conducted a meticulously planned and executed study. However, I do have some inquiries.

      (1) What precisely does the term "differ" signify in the title? It can be expounded upon in terms of differing in expression or sensitivity. The title could benefit from being more informative. The authors should appropriately specify the insect species in the title of the paper. This would make it more comprehensible to readers. Merely mentioning the term "moth" does not provide any information about the model organism. Hence, it would be preferable to mention Helicoverpa armigera instead of using the generic term "moth" in the title.

      Thank you for your suggestions. We considered it better to emphasize that the receptors for sucrose are different, and we have accepted the suggestion of adding the name of the animal. The title has been changed to “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      (2) The abstract is written in a simple and easily understandable manner, but it overlooks important findings from a technical standpoint.

      We have added some key experimental techniques to the Abstract to illustrate important findings.

      (3) Almost all herbivorous insects are said to consume plants and utilize sucrose as a stimulus for feeding, as stated by the authors. Sucrose, glucose, and fructose are among the commonly observed stimulants for feeding in numerous insects. It would be appropriate to incorporate not only sucrose but also glucose and fructose as feeding stimulants for almost all herbivorous insects.

      Thank you for your suggestion. Sucrose is the major sugar in plants, and its concentration varies greatly from tissue to tissue, while the concentration of the hexose sugars is much lower and does not change much. In Line 48, we state that sucrose, glucose, and fructose are feeding stimuli for herbivorous insects. From previous studies, it seems that sucrose is the strongest, followed by fructose, and finally glucose. The cotton bollworm larvae showed no electrophysiological or behavioral response to glucose.

      (4) The reason why trehalose is not considered in the electrophysiology analysis is unclear. Given that trehalose is a major sugar in insects and plants, it would be intriguing to include it in the analysis.

      We have accepted the reviewer's suggestion, and supplemented the electrophysiological responses of taste organs in larvae and adults of Helicoverpa armigera to trehalose (Figure 1, Figure 1-Figure Supplement 1), and also tested the behavioral responses of the larvae and adults to trehalose (Figure 2, Figure 2-Figure Supplement 1). Therefore, all the related figures have been changed.

      (5) The author's intention regarding the co-receptor relationship between Gr5 and Gr6 (line 211) is unclear. If this is indeed the case, then the reason for considering Gr5 in further studies remains uncertain.

      We have changed the sentence as follows: “Since Gr5 was highly expressed with Gr6 in the proboscis and tarsi (Figure 3D-3E, Figure 3—figure supplement 1), we suspected that Gr5 and Gr6 might be expressed in the same cells, and then tested the response profile of their co-expression in oocytes.”

      (6) The homologous nature of Grs is emphasized by the authors. It is not specified how the author ensured that the guide RNA targeting Gr6 or Gr10 did not result in off-target effects on other Grs.

      Thank you so much for your suggestion. We have rewritten the relevant paragraph (Line 238-251), detailing our tests and the results on the potential off-target effects of knocking out GRs by CRISPR/Cas9: “In order to predict the potential off-target sites of sgRNA of Gr6 and Gr10, we used online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to blast the genome of H. armigera, and the mismatch number was set to less than or equal to 3. According to the predicted results, the Gr10 sgRNA had no potential off-target region but Gr6 sgRNA had one. Therefore, we amplified and sequenced the potential off-target region of Gr6-/- and found there was no frameshift or premature stop codon in the region compared to WT (Figure 5—figure supplement 2). It is worth mentioning that there was no potential off-target region of Gr6 and Gr10 sgRNA in other sugar receptor genes of H. armigera, Gr4, Gr5, Gr7, Gr8, Gr9, Gr11 and Gr12. We further found there was no difference in the response to xylose of the medial sensilla styloconica among WT, Gr10-/- and Gr6-/- (Figure 5—figure supplement 2). Furthermore, WT, Gr10-/- and Gr6-/- did not show differences in the larval body weight, adult lifespan, and number of eggs laid per female (Figure 5—figure supplement 2). All these results suggest that no off-target effects occurred in the study.”

      (7) Is it possible that knocking out Gr10 is not compensated for by the overexpression of Gr6 or other sucrose sensing Grs? Similarly, would the vice versa scenario hold true?

      In the Discussion section, we have added some sentences to discuss this issue: “From our results, knocking out Gr10 or Gr6 is unlikely to be compensated by overexpression of other sugar GRs. One of our recent studies showed that Orco knockout had no significant effect on the expression of most OR, IR and GR genes in adult antennae of H. armigera, but some genes were up- or down-regulated (Fan et al., 2022).”

      (8) What was the rationale for selecting nine candidate GR genes for expression analysis?

      Based on the reviewer's suggestion, we expanded the relevant paragraphs to illustrate the rationale for selecting nine candidate GR genes for expression analysis: “To reveal the molecular basis of sugar reception in the taste sensilla of H. armigera, we first analyzed the putative sugar gustatory receptor genes based on the reported gene sequences of GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al., 2015; Pearce et al., 2017; Xu et al., 2017). Nine putative sugar GR genes, Gr4–12 were identified, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161)

      (9) What is the potential reason for the difference between the major larval sugar receptors of Drosophila and Lepidopterans?

      The difference between the major larval sugar receptors of Drosophila and Lepidopterans is probably due to differences in the food their larvae feed on. Fruit fly larvae feed on rotten fruit, the main sugar of which is fructose. The larvae of Lepidoptera mainly feed on plants, and the main sugar is sucrose. In the Discussion section, we have added a sentence “This is most likely due to fruit fly larvae feeding on rotten fruits, which contain fructose as the main sugar.” (Line 399-401)

      (10) There is a disparity in GRs, specifically GR5 and GR6, between the female antenna, proboscis, and tarsi. What could be the possible justification and significance of this?

      Thank you so much for this question. We have added a sentence in the Discussion section, “In this study, the expression patterns of 9 sugar GRs in three taste organs of adult H. armigera show that there is a disparity in GRs, specifically GR5 and GR6, between the female antenna, tarsi and proboscis, which may be an evolutionary adaptation reflecting subtle differentiation in the function of these taste organs in adult foraging. Antennae and tarsi play a role in the exploration of potential sugar sources, while the proboscis plays a more precise role in the final decision to feed.” (Line 433-438)

      (11) I suggest that a visual representation illustrating the positioning of GSNs, particularly the lateral and medial sensilla, in both larva and adult stages would enhance the correlation with the results.

      In Figure 1 we added the photo of each taste organ and the position of the recorded sensilla, and also added a new figure, Figure 8 summarizing the main findings of the studies.

      (12) Further experiments can be conducted to elucidate the precise molecular mechanisms, particularly the downstream effects of GRs, in order to establish the specificity of GRs more convincingly.

      Thank you so much for your suggestion. We have discussed the further experiments in the Discussion section, “To elucidate the precise molecular mechanisms of sugar reception in H. armigera, it is necessary to compare a series of single, double and even multiple Gr knock-out lines and investigate the downstream effects of the GRs.” (Line 363-369)

      (13) Figure 6 caption: In Figure 6 (D to I), the percentage of PER is depicted. There is redundancy in the Y-axis title (Percentage of PER) and the legend. This appears to be repetitive. I suggest that it would be better to include the Y-axis title only in Figure D or in Figures D and G.

      We accept the suggestion. Figure 7 (not Figure 6) has been revised accordingly.

      (14) In Figures 6A and 6C, there is inconsistency in the colors used for WT, Gr6, and Gr10. This could potentially confuse the reader. I recommend using the same colors in both figures instead of using a blue color. Please specify how the authors calculated the feeding area in Figure 6.

      We accept the reviewer's suggestion and have changed the color of Figure 7A, B. We have also added a detailed method for calculating feeding area (Line 541-545).

      (15) In Two-choice tests, why did the authors use 0.01% Tween 80? Please provide comments on this.

      We used 0.01% Tween 80 to reduce the surface tension and increase the malleability of the solution. We have given a detailed explanation in the Methods section and cited the reference (Line 538-540).

      (16) It would be valuable if the authors could comment on the prospects of this study, considering that GRs play a vital role in controlling behavior and developmental pathways. What are the potential consequences of blocking or disrupting these receptors in terms of behavioral and developmental phenotypic deformities? Could this potentially lead to increased insect mortality?

      Thank you so much for your suggestions. In the last paragraph of the Discussion section, we have added the following perspectives, “Knockout of Gr10 or Gr6 led to a significant decrease in sugar sensitivity and food preference of the larvae and adults of H. armigera, respectively, which is bound to bring adverse consequences to survival and reproduction of the insects. Therefore, studying the molecular mechanisms underlying sugar perception in phytophagous insects may provide new insights into the behavioral ecology of this important and highly diverse group of insects, and measures blocking or disrupting sugar receptors could also have applications to control agricultural pests and improve crop yields worldwide” (Line 449-456).

      Reviewer #2 (Recommendations for The Authors):

      There are a few comments, that I feel would be beneficial to be addressed.

      • The authors used 7 different sugars for their experimental approach. While I agree that this is a sufficiently large collection for a study, I was wondering why they specifically chose these sugars; an explanatory section might be helpful for a reader to follow the reasoning.

      Following reviewer 1's suggestion, we added trehalose, bringing the number of sugars tested to 8. Trehalose is a main sugar in insect blood. It is converted by insects after feeding on plant sugars. The 8 sugars were chosen because they are present in host-plants of H. armigera or are representative in the structure and source of sugars. They comprise 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose).

      • It might be beneficial to provide some broader overview on the gustatory system in the cotton bollworm, particularly at the larval stage since this may not be common knowledge. Along these lines eg. the complexity of sensilla types, organs and overall number (or estimation) of neurons might be good to know, a graphical representation of the sense organs might be informative.

      In the Introduction section, we give a more specific description on sugar sensitive GSNs in the taste system of the larva and adult of H. armigera, and cite the corresponding references.

      • Concerning phylogeny of GRs, it might be relevant to know how complete the genome information is and some more general background on GR diversity in the cotton bollworm.

      We agree. Accordingly, we obtained the putative sugar GRs from the previously published genome (Pearce et al. 2017) and the related annotation of GRs (Jiang et al. 2015, Xu et al. 2012). We have given a more detailed explanation of this in the new version of the manuscript, “We first analyzed the putative sugar gustatory receptor genes based on the genome data of H. armigera (Pearce et al. 2017), the reported gene sequences of sugar GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al. 2015, Xu et al. 2012). All nine putative sugar GR genes in H. armigera, Gr4–12 were validated, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161).

      • Generation of mutants based on CRISPR is intriguing and a powerful step. While the techniques are well described in the method section, there is no information concerning efficiency or broader feasibility of the approach. I feel it would be quite interesting to learn about how feasible or laborious the approach is to generate mutants (e.g. number of initial injected eggs, the resulting F0 offspring, number of back-crosses, number of screened F1s ....).

      In the Materials and Methods section, we have added specific success rates for each step in the process of building the two mutants (Line 722-726, 729-732).

      Reviewer #3 (Recommendations For The Authors):

      I want to congratulate the authors on this very nice study and have only minor comments for them.

      (1) It would be very nice to include pictures of the larva and adult of H. armigera. It would also help to have schematics of where the sensilla they are recording from are.

      We have added photos of the four taste organs on which the recorded sensilla are indicated (Figure 1), and pictures of the larva and adult on which the stimulating site is indicated (Figure 2).

      (2) A schematic summarising their findings, including the relevance to the animal's behavioural ecology, will greatly improve interpretations for the broader audience.

      A schematic summarizing the findings has been added.

      (3) The manner in which PIs are represented in figure 2A, B (among others) is confusing. Can the authors please plot the PI and not the feeding area? From the PI values listed beside the plot, it actually suggests that the larvae don't really show a preference. Could the authors please comment on this?

      Yes, sucrose has a significant stimulating effect on larval feeding, but the effect is not as large as predicted from the sensitivity of the sensillum. The main reasons are as follows: (1) many factors affect larval feeding, and sucrose is only one of them; (2) because the substrate leaf discs also contain sugar, the effect of newly added sucrose may be reduced. After careful consideration, we think it is better to display the feeding area and PI together so that readers have a complete understanding of the data.

      (4) The heterologous expression experiments suggest that co-expression of GR6 with either GR10 or GR5 somehow suppress the response of the GR6 alone to fucose. Am I reading the data correctly? Why would this be? Perhaps the authors could discuss this. In this context, it would help to reproduce all the GR6 data together.

      Your interpretation is reasonable to a certain extent. The result of co-injection might be that Gr10 or Gr5 inhibited the response of Gr6. However, there is another possibility that the amount of Gr6 sRNA was diluted by co-injection of two GRs, resulting in a reduced response of Gr6 to fucose.

      (5) In general, for each results section, it would help to have a sentence or two that interprets the data in the context of previously presented data. This would help the reader digest the data and interpret it as they read along. Currently, the authors summarise the observations and leave all the interpretation to the discussion section.

      We accept the suggestion. In each part of the results, we have added a sentence to explain the above data, which will help readers to clarify the context of the research more easily.

      (6) Is the GR6 data in 4C not lined up correctly?

      Yes, it is right.

      (7) Line 228 suggests that the mutants were validating with qPCRs - I don't see that data.

      The mutants were not validated with qPCR. We used conventional PCR at the mRNA level to verify whether the relevant sequences were really deleted in the mutants.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a detailed study of a nearly complete Entomophthora muscae genome assembly and annotation, along with comparative analyses among related and non-related entomopathogenic fungi. The genome is one of the largest fungal genomes sequenced, and the authors document the proliferation and evolution of transposons and the presence/absence of related genetic machinery to explore how this may have occurred. There has also been an expansion in gene number, which appears to contain many "novel" genes unique to E. muscae. Functionally, the authors were interested in CAZymes, proteases, circadian clock related genes (due to entomopathogenicity/ host manipulation), other insect pathogen-specific genes, and secondary metabolites. There are many interesting findings including expansions in trehalases, unique insulinase, and another peptidase, and some evidence for RIP in Entomophthoralean fungi. The authors performed a separate study examining E. muscae species complex and related strains. Specifically, morphological traits were measured for strains and then compared to the 28S+ITS-based phylogeny, showing little informativeness of these morpho characters with high levels of overlap.

      This work represents a big leap forward in the genomics of non-Dikarya fungi and large fungal genomes. Most of the gene homologs have been studied in species that diverged hundreds of millions of years ago, and therefore using standard comparative genomic approaches is not trivial and still relatively little is known. This paper provides many new hypotheses and potential avenues of research about fungal genome size expansion, entomopathogenesis in zygomycetes, and cellular functions like RIP and circadian mechanisms.

      Strengths:

      There are many strengths to this study. It represents a massive amount of work and a very thorough functional analysis of the gene content in these fungi (which are largely unsequenced and definitely understudied). Too often comparative genomic work will focus on one aspect and leave the reader wondering about all the other ways genome(s) are unique or different from others. This study really dove in and explored the relevant aspects of the E. muscae genome.

      The authors used both a priori and emergent properties to shape their analyses (by searching for specific genes of interest and by analyzing genes underrepresented, expanded, or unique to their chosen taxa), enabling a detailed review of the genomic architecture and content. Specifically, I'm impressed by the analysis of missing genes (pFAMs) in E. muscae, none of which are enriched in relatives, suggesting this fungus is really different not by gene loss, but by its gene expansions.

      Analyzing species-level boundaries and the data underlying those (genetic or morphological) is not something frequently presented in comparative genomic studies, however, here it is a welcome addition as the target species of the study is part of a species complex where morphology can be misleading and genetic data is infrequently collected in conjunction with the morphological data.

      Thank you for your careful reading of our work. We’re glad that you identified these areas as strengths.

      Weaknesses:

      The conclusions of this paper are mostly well supported by data, but a few points should be clarified.

      In the analysis of Orthogroups (OGs), the claim in the text is that E. muscae "has genes in multi-species OGs no more frequently than Entomophaga maimaiga. (Fig. 3F)" I don't see that in 3F. But maybe I'm really missing something.

      Thank you for catching this. You were, in fact, not missing anything at all. There was a mismatch between the data plotted in F and G and how the caption described these data. We very much apologize for the confusion that this must have caused. We have corrected these plots and also made changes to improve interpretability (see below).

      Also related, based on what is written in the text of the OG section, I think portions of Figure 3G are incorrect/ duplicated. First, a general question, related to the first two portions of the graph. How do "Genes assigned to an OG" and "Genes not assigned to an OG" not equal 100% for each species? The graph as currently visualized does not show that. Then I think the bars in portion 3 "Genes in species-specific OG" are wrong (because in the text it says "N. thromboides had just 16.3%" species-specific OGs, but the graph clearly shows that bar at around 50%). I think portion 3 is just a duplicate of the bars in portion 4 - they look exactly the same - and in addition, as stated in the text portion 4 "Potentially species-specific genes" should be the simple addition of the bars in portion 2 and portion 3 for each species.

      As mentioned above, we sincerely regret the error made in the plot and for the confusion that this caused. F now reflects the percentage of orthogroups (OGs) that possess at least one representative from the indicated species (left) and the percentage of OGs that are species-specific (only possess genes from one species; right). The latter is a subset of the former. G now reflects the percentage of annotated genes that were assigned an OG, per species, as well as the inverse of this - genes that were not assigned to any OG. These should, and now do, sum to 100%. The “Within species-specific OG” data summed with the “Not assigned OG” data yields the “Potentially species-specific data” in the rightmost column.
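
      The bookkeeping behind panel G, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name, the input layout (a gene list per species and a gene-to-OG mapping) and the toy data are all hypothetical. It shows why the assigned and unassigned percentages sum to 100%, and how the "potentially species-specific" value is obtained as the sum of two categories.

```python
from collections import defaultdict
from typing import Dict, List


def orthogroup_summary(genes_by_species: Dict[str, List[str]],
                       og_of_gene: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Per-species percentages of genes by orthogroup (OG) status.

    genes_by_species: species name -> list of gene IDs (hypothetical layout).
    og_of_gene: gene ID -> OG ID; genes absent from this mapping are unassigned.
    """
    # Record which species contribute genes to each OG.
    species_in_og = defaultdict(set)
    for species, genes in genes_by_species.items():
        for gene in genes:
            if gene in og_of_gene:
                species_in_og[og_of_gene[gene]].add(species)

    summary = {}
    for species, genes in genes_by_species.items():
        total = len(genes)
        assigned = [g for g in genes if g in og_of_gene]
        # A species-specific OG contains genes from this species only.
        in_specific_og = [g for g in assigned
                          if species_in_og[og_of_gene[g]] == {species}]
        n_unassigned = total - len(assigned)
        summary[species] = {
            "assigned": 100.0 * len(assigned) / total,
            "not_assigned": 100.0 * n_unassigned / total,
            "within_species_specific_og": 100.0 * len(in_specific_og) / total,
            # Sum of the two categories above, computed from counts.
            "potentially_species_specific":
                100.0 * (n_unassigned + len(in_specific_og)) / total,
        }
    return summary


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_genes = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2"]}
    toy_ogs = {"a1": "OG1", "a2": "OG1", "b1": "OG1", "a3": "OG2"}
    for species, row in orthogroup_summary(toy_genes, toy_ogs).items():
        print(species, row)
```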

      In the introduction, there is a name for the phenomenon of "clinging to or biting the tops of plants," it's called summit disease. And just for some context for the readers, summit disease is well-documented in many of these taxa in the older literature, but it is often ignored in modern studies - even though it is a fascinating effect seen in many insect hosts, caused by many, many fungi, nematodes (!), etc. This phenomenon has evolved many times. Nice discussions of this in Evans 1989 and Roy et al. 2006 (both of whom cite much of the older literature).

      You’re right. We have now clarified that this behavior is called “summit disease” and referenced the suggested articles, along with a more recent review.

      Reviewer #2 (Public Review):

      In their study, Stajich and co-authors present a new 1.03 Gb genome assembly for an isolate of the fungal insect parasite Entomophthora muscae (Entomophthoromycota phylum, isolated from Drosophila hydei). Many species of the Entomophthoromycota phylum are specialised insect pathogens with relatively large genomes for fungi, with interesting yet largely unexplored biology. The authors compare their new E. muscae assembly to those of other species in the Entomophthorales order and also more generally to other fungi. For that, they first focus on repetitive DNA (transposons) and show that Ty3 LTRs are highly abundant in the E. muscae genome and contribute to ~40% of the species' genome, a feature that is shared by closely related species in the Entomophthorales. Next, the authors describe the major differences in protein content between species in the genus, focusing on functional domains, namely protein families (pfam), carbohydrate-active enzymes, and peptidases. They highlight several protein families that are overrepresented/underrepresented in the E. muscae genome and other Entomophthorales genomes. The authors also highlight differences in components of the circadian rhythm, which might be relevant to the biology of these insect-infecting fungi. To gain further insights into E. muscae specificities, the authors identify orthologous proteins among four Entomophthorales species. Consistently with a larger genome and protein set in E. muscae, they find that 21% of the 17,111 orthogroups are specific to the species. To finish, the authors examine the consistency between methods for species delineation in the genus using molecular (ITS + 28S) or morphological data (# of nuclei per conidia + conidia size) and highlight major incongruences between the two.

      Although most of the methods applied in the frame of this study are appropriate with the scripts made available, I believe there are some major discrepancies in the datasets that are compared which could undermine most of the results/conclusions. More precisely, most of the results are based on the comparison of protein family content between four Entomophthorales species. As the authors mention on page 5, genome (transcriptome) assembly and further annotation procedures can strongly influence gene discovery. Here, the authors re-annotated two assemblies using their own methods and recovered between 30 and 60% more genes than in the original dataset, but if I understand it correctly, they perform all downstream comparative analyses using the original annotations. Given the focus on E. muscae and the small sample size (four genomes compared), I believe performing the comparisons on the newly annotated assemblies would be more rigorous for making any claim on gene family variation.

      Thank you for this comment. While we did compare gene model predictions for two of these assemblies to assess if this difference could account for discrepancies in gene counts, completely reannotating all non-E. muscae datasets was outside of the scope of this study. In our opinion, the total number of predicted genes in a genome is not the best representation of differences, since splitting or fusing gene models can inflate apparent differences; the orthology and domain counts are a more accurate assessment of the content. It’s possible that annotation differences may have inflated some gene family counts; however, we note that similar domain trends were observed between E. muscae and the species closest to it, Entomophaga maimaiga, suggesting that these differences were not sufficient to prevent us from detecting real biological signals. We look forward to continued improvement of our genome through additional sequencing and more clarity on the total gene content of E. muscae.

      The authors also investigate the putative impact of repeat-induced point mutation on the architecture of the large Entomophthorales genomes (for three of the eight species in Figure 1) and report low RIP-like dinucleotide signatures despite the presence of RID1 (a gene involved in the RIP process in Neurospora crassa) and RNAi machinery. They base their analysis on the presence of specific PFAM domains across the proteome of the three Entomophthorales species. In the case of RID1, the authors searched for a DNA methyltransferase domain (PF00145), however other proteins than RID1 bear such functional domain (DNMT family) so that in the current analysis it is impossible to say if the authors are actually looking at RID1 homologs (probably not, RID1 is monophyletic to the Ascomycota I believe). Similar comments apply to the analysis of components of the RNAi machinery. A more reliable alternative to the PFAM analysis would be to work with full protein sequences in addition to the functional domains.

      While we understand this concern regarding domain vs. full-length protein, the advantage of the domain search is that HMM-based searches are sensitive enough to detect more distantly related homologs. Entomophthoralean fungi are distantly related to the ascomycetes in which these mechanisms have been characterized, so we chose a broader search approach that may identify proteins that have similar domain structure but are not necessarily homologs. These searches are presented in the manuscript as preliminary, but worth further investigation. However, our RID-based analysis did not identify convincing homologs for RID1 in the entomophthoralean fungi included in our investigation, and we reported low homology (i.e., 12-14%) between our orthogroup of interest and RID1. We have further edited this section to clarify our understanding that these candidates are not RID1 homologs. We had hoped to avoid this implication, but we felt this investigation and null result were worth reporting.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific points:

      Results:

      "1.03 Gb genome consisting of 7,810 contigs (N50 = 301.1 kb). Additional... resulted in a final contig count of 7,810 (N50 = 329.6 kb)" So you started and ended with the same contig count but a different N50? Is this a typo?

      Yes, this was a typo. Thank you for bringing this to our attention.
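
      For context on the two statistics quoted in this exchange, the following minimal sketch shows how N50 is conventionally computed from contig lengths. The function name and the toy lengths are hypothetical and unrelated to the E. muscae assembly; the point is only that N50 depends on the distribution of contig lengths rather than on the contig count alone.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Two hypothetical assemblies with the same contig count and total length.
before = [2, 3, 4, 5, 6]
after = [2, 2, 4, 6, 6]
n50_before = n50(before)
n50_after = n50(after)
```

      In this toy case both lists contain five contigs totalling 20 units, yet their N50 values differ (5 versus 6).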

      Figure 1D.

      The colors of Complete1x and Complete2x are too similar to tell them apart.

      The colors have been made more distinct.

      Figure 4B.

      I know C. rosea has been found from insects before, but it's mostly a mycoparasite and occasionally an endophyte, and has bioactivity against a lot of things. I just saw that it's listed as an entomopathogen, and I was surprised. Anyway, leave it as is if you want to, but it's definitely better studied and better known (Google Scholar) as a mycoparasite.

      Thanks for this comment. For the sake of including a more diverse representation of entomopathogenic fungi, we have opted to leave this as is.

      Full references (from the public comment)

      Evans, H.C., 1989. Mycopathogens of insects of epigeal and aerial habitats. Insect-fungus interactions, pp.205-238.

      Roy, H.E., Steinkraus, D.C., Eilenberg, J., Hajek, A.E. and Pell, J.K., 2006. Bizarre interactions and endgames: entomopathogenic fungi and their arthropod hosts. Annu. Rev. Entomol., 51, pp.331-357.

      Reviewer #2 (Recommendations For The Authors):

      I believe the manuscript could largely benefit from restructuring the results section to enhance clarity. The results section reads like a lot of descriptive back and forth, so that the reader lacks a clear rationale. The absence of a consistent dataset used for the different comparisons made all along the manuscript makes it hard to follow.

      Minor comments:

      (No line numbers were available so I refer to page numbers).

      p1

      • not sure about the use of "allied" to describe other fungal species in the title and after (sister species?).

      We didn’t want to use the word sister because not all of these species could be considered sister.

      • Genomic defence against transposable elements rather than "anti"?

      We have rephrased to genomic defense.

      p3

      • Extra parenthesis at Bronski et al.

      This is now corrected.

      • What does newly-available mean here?

      We mean recent. A lot of the datasets we used were very new, and we wanted to emphasize that point.

      • The back and forth between genomes and transcriptomes makes it hard to follow, would clarify from the beginning (in addition to the sequencing method - short vs long-read assemblies as in Figure 1B) or perhaps use a consistent dataset for all subsequent comparative analysis in the Entomophthorales.

      We have denoted our transcriptomic datasets in Fig 1C using parentheses.

      p5

      • Perhaps clarify that class II DNA transposons can also "copy" (single-strand excisions can be repaired by the host machinery).

      We have now included mention of “copy” as well as “jump” mechanisms of Class II transposons per your suggestion.

      p6

      • "beginning roughly concurrently", not clear what "began".

      This is now corrected.

      • "control" rather than "protect against"?

      We’ve changed “protect against” to “counter”.

      • I believe RIP has only been observed (experimentally) in a handful of fungal species, all from the Ascomycota phylum.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      • "RID1 contains two DNA_methylase domains", RID1 has one methyltransferase domain according to the reference Freitag et al, 2002.

      Thank you for drawing this to our attention. It is true RID1 has one methyltransferase region; however, the sequence deposited by Freitag et al, 2002 (AAM27408) is predicted by HMMer to have two adjacent Pfam DNA_methylase domains (i.e., PF00145). In this exploratory analysis, we tried to leverage this characteristic to identify candidate proteins of interest. We have reworded this section to clarify this.

      p8

      • Here and after I would use more informative titles for each paragraph.

      With the exception of the headings for Pfam, CAZy and MEROPs analyses, we believe the other headings are informative. We appreciate this comment, but opt to leave the heading titles as is.

      • I believe presenting the orthology analysis before the more in-depth protein family domain search.

      We leveraged the OG analysis mostly as a way to identify potentially unique genes in E. muscae, so we think the current order makes the most sense.

      p10

      • Figures 3F and G are confusing. The legend for Figure 3F mentions "OGs with >= 2 species" while the figure shows "multi-species OGs", and reads as redundant with the "species-specific" OGs. For the "OGs within species" do I understand it correctly that it represents the number of genes assigned to OGs for each species? If yes, the numbers are in contradiction with Figure 3G. And in Figure 3G shouldn't the sum of "genes assigned in OGs" and "genes nor assigned in OGs" add up to 100? I'm probably missing something here, but I would clarify what the different sets of orthogroups are in the figure and in the text (perhaps adopting a pangenome-like nomenclature).

      Thanks for this comment. This legend, unfortunately, reflected an earlier version of the figure and was overlooked prior to submission. We have since amended this and sincerely apologize for the error on our part.

      p12

      • The whole first paragraph reads more like it should be part of an introduction/discussion.

      We’ve moved some of this paragraph to the discussion but left the background information necessary for the reader to understand why we were looking for homologs of wc and frq.

      p13

      • The last paragraph reads like discussion.

      We have revised this paragraph so it now reads: “Because E. muscae is an obligate insect-pathogen only living inside live flies, we investigate the presence of canonical entomopathogenic enzymes in the genome. We find that E. muscae appears to have an expanded group of acid-trehalases compared to other entomopathogenic and non-entomopathogenic Entomophthorales (Fig. 4A), which correlates with the primary sugar in insect blood (hemolymph) being trehalose (Thompson, 2003). The obligate insect-pathogenic lifestyle is also evident when comparing the repertoire of lipases, subtilisin-like serine proteases, trypsins, and chitinases in our focal species versus Zoopagomycota and Ascomycota fungi that are not obligate insect pathogens (Fig. 4B). Sordariomycetes within Ascomycota contains the other major transition to insect-pathogenicity within the kingdom Fungi (Araújo and Hughes, 2016). Based on our comparison of gene numbers, Entomophthorales possess more enzymes suitable for cuticle penetration than Sordariomycetes (Fig. 4B). In contrast, insect-pathogenic fungi within Hypocreales possess a more diverse secondary metabolite biosynthesis machinery as evidenced by the absence of polyketide synthase (PKS) and indole pathways in Entomophthorales (Fig. 4C).”

      p15 and 16

      • This all reads as redundant with the previous protein family domain analysis. I would try to merge them.

      Thank you for this comment; however, we have opted to maintain the current structure.

      p18

      • In the first sentence, I'm not sure about what was performed here.

      This has been reworded to clarify.

      p20

      • Regarding the assembly, do I understand it correctly that a nuclear genome can be partially haploid / diploid?

      Thanks for your comment. The genome itself is, of course, some integer multiple of n, but based on BUSCO scores our assembly doesn’t appear to have completely collapsed into a haploid genome. We think it makes more sense here to say “partially haploid” than “partially diploid” so have altered this.

      p21

      • RIP has only been observed in a couple of Ascomycetes. RIP-like genomic signatures (GC bias) have been observed elsewhere.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      p23

      • Interesting that the peptidase A2B domain is found uniquely in E. muscae genome and is associated with Ty3 activity. Does the domain often overlap with annotated Ty3 in E. muscae genome? Or how come the domain is not present in other sister species with large genomes full of Ty3 transposons? Could it relate to a new active transposon in E. muscae specifically?

      Thanks for this comment. The domain-based analysis was only performed on the predicted transcriptome of the genome assembly, which does not include the repeat elements (e.g., Ty3). It could be that this peptidase reflects a new active transposon that’s specific to E. muscae, which would certainly be very interesting. We’ve now included this idea in the discussion.

      p26

      • In the case of fungal genomes, I would not advise masking the assembly for repeated sequences prior to gene annotation (in particular given the current focus on protein family variation).

      Thank you for this comment; however, we disagree with this assertion, as a typical approach for genome annotation in fungal and other eukaryotic genomes is to soft-mask transposable elements before performing gene prediction to avoid over-prediction. While there could be alternative approaches that compare masked and unmasked assemblies, soft masking is a recommended protocol for underlying tools like Augustus (10.1002/cpbi.57) and in general descriptions of genome annotation (10.1002/0471250953.bi0401s52). In our experience, the false positive rate of genes predicted through TE regions is likely to be more of a problem than false negatives from missed genes. Further, it seems appropriate to use a consistent approach to annotation throughout when including genomes from other sources (e.g., Joint Genome Institute annotated genomes), which also use repeat masking before annotation. It seems most appropriate to use consistent methods when generating datasets to be used for comparative analyses. It is outside the scope of this project to reannotate all genomes with and without repeat masking.

      p27

      • Interrupted sentence at "Classification of DNA and LTR .. by similarity The".

      This was an unnecessary partial phrase, as the information on classification of elements via RepBase was given a few sentences above this.

      p28

      • Enriched/depleted rather than "significantly different"?

      Thank you for this comment; however, we have opted to maintain the current phrasing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for a careful review of the manuscript and for their comments, which we address below.

      Reviewer #1:

      (1) …the authors could examine division in a population of cells with only one centrosome. Seeing some restoration of mitotic progression in the absence of SAC-dependent delays would suggest that even one centrosome with uninhibited Eg5 is sufficient to negate SAC-dependent delays, and would limit models for what exactly centrosomes contribute.

      We agree that the one-centrosome question (i.e. whether cells with a single centriole, and therefore a single centrosome, have the same SAC dependence) would be interesting to address. It is known that cells with a single centriole generated through centrinone treatment also have elongated mitoses, like cells lacking centrioles (see Chinen et al. 2021, compare Fig 2C to Fig 2D). We have tried this experiment in RPE-1 cells, with preliminary results confirming that there is a mitotic delay. It is not known whether this delay requires SAC activity, and we hope to address that in future work. In addition, we note that we show in Fig. 4b-c that cells with the normal centrosome number but with a single focus of microtubules due to Eg5 inhibition were also sensitive to MPS1 inhibition. This suggests that centrosome presence alone cannot overcome the requirement for SAC activity; rather, the centrosomes need to be able to separate in a timely fashion.

      Reviewer #2:

      (1) An example is how to interpret the effect of Aurora B inhibition, which does not block acentrosomal cell division. If Aurora B is required for SAC activity, it suggests this effect of MPS1 may be a function other than SAC. Given the complexity of the SAC, it would be informative to test other SAC components. Instead, the authors conclude that the mitotic delay caused by MPS is required for acentrosomal cell division. I don't think they have ruled out, or even addressed other functions of MPS1.

      We agree that it is possible that functions of the MPS1 kinase other than those involved in the SAC could be important. Although we have not directly tested other SAC components, we did “mimic” SAC activity by delaying anaphase onset using APC/C inhibition while also inhibiting MPS1 (Fig. 2b-b’’). The fact that this restored division suggests that it is the SAC function of MPS1 kinase activity that is relevant to this delay. 

      (2) The authors find that when both the APC and MPS1 are inhibited, the cells eventually divide. These results are intriguing, but hard to interpret. The authors suggest that the failure to divide in MPS1-inhibited cells is because they enter anaphase, and then must back out. This is hard to understand and there is not data supporting some kind of aborted anaphase. Is the division observed with double inhibition some sort of bypass of the block caused by MPS1 inhibition alone? It is not clear why inhibition of APC causes increased cell division when MPS1 is inhibited.

      As described in the response to 1), we believe that reinstating the delay to anaphase onset by APC/C inhibition provided the time needed to establish a functional bipolar spindle even in the absence of the SAC, and that cells eventually overcome the proTAME block and proceed through mitosis, as observed in control cells in our experiments. We note that we chose concentrations of proTAME specifically for each cell line (RPE-1 and U2OS) that would result only in a temporary block, following on the work of Lara-Gonzalez and Taylor (2012), who reported similar findings for HeLa cells.

      (3) The authors characterize MTOC formation in these cells, which is also interesting. MTOCs are established after NEB in acentrosomal cells. Indeed, forming these MTOCs is probably a key mechanism for how these cells complete a division, like mouse oocytes.

      We agree that the observed intermediates of MTOCs are interesting and likely crucial to the mechanism of cell division in acentrosomal somatic cells. We are investigating further the differences and similarities between somatic cell MTOC formation in the absence of centrosomes and the naturally-occurring form of that process in oocytes.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Seleit and colleagues set out to explore the genetics of developmental timing and tissue size by mapping natural genetic variation associated with segmentation clock period and presomitic mesoderm (PSM) size in different species of Medaka fish. They first establish the extent of variation between five different Medaka species in terms of organismal size, segmentation rate, segment size and presomitic mesoderm size, among other traits. They find that these traits are species-specific but strongly correlated. In a massive undertaking, they then perform developmental QTL mapping for segmentation clock period and PSM size in a set of ~600 F2 fish resulting from the cross of Oryzias sakaizumii (Kaga) and Oryzias latipes (Cab). Correlation between segmentation period and segment size was lost among the F2s, indicating that distinct genetic modules control these traits. Although the researchers fail to identify causal variants driving these traits, they perform proof of concept perturbations by analyzing F0 Crispants in which candidate genes were knocked out. Overall, the study introduces a completely new methodology (QTL mapping) to the field of segmentation and developmental tempo, and therefore provides multiple valuable insights into the forces driving evolution of these traits.

      Major comments:

      • The first sentence in the abstract reads "How the timing of development is linked to organismal size is a longstanding question". It is therefore disappointing that organismal size is not reported for the F2 hybrids. Was larval length measured in the F2s? If so, it should be reported. It is critical to understand whether the correlation between larval size and segmentation clock period is preserved in F2s or not, therefore determining if they represent a single or separate developmental modules. If larval length data were not collected, the authors need to be more careful with their wording.

      The question the reviewer raises here is indeed a very relevant one, and one that we were also curious about ourselves. While it was not possible (logistically) to grow the 600 F2 fish to adulthood, we did measure larval length in a subset of F2 hatchlings (n=72) to ask precisely the question the reviewer raises here. Our results (new Supplementary Figure 5) show that the correlation between larval length and segmentation timing (which we report across the Oryzias species) is absent in the F2s. This indeed argues that the traits represent separate developmental modules.

      In the current version of the paper, organismal size is often incorrectly equated to tissue size (e.g. PSM size, segment size). For example, in page 3 lines 33-34, the authors state that faster segmentation occurred in embryos of smaller size (Fig. 1D). However, Fig. 1D shows correlation between segmentation rate and unsegmented PSM area. The appropriate data to show would be segmentation rate vs. larval or adult length.

      The reviewer is correct. We have now linked the data more clearly to data we show in Supplementary Figure 1, which shows that adult length and adult mass are strongly correlated (S1A) and that adult mass is in turn strongly correlated with segmentation rate in the different Oryzias species (S1B). Additionally, main Figure 1B shows that larval length is correlated with PSM length. We have corrected the main text to reflect these relationships more clearly.

      • Is my understanding correct in that the her7-venus reporter is carried by the Cab F0 but not the Kaga F0? Presumably only F2s which carried the reporter were selected for phenotyping. I would expect the location of the reporter in the genome to be obvious in Figure 3J as a region that is only Cab or het but never Kaga. Can the authors please point to the location of the reporter?

      The reviewer is correct. Indeed, the location of our her7-venus KI is on chromosome 16 and the recombination patterns on this chromosome overwhelmingly show either Hom Cab (green) or Het Cab/Kaga (black). This is expected as we selected fish carrying the her7-venus KI for phenotyping.

      • devQTL mapping in this study seems like a wasted opportunity. The authors perform mapping only to then hand pick their targets based on GO annotations. This biases the study towards genes known to be involved in PSM development, when part of the appeal of QTL mapping is precisely its unbiased nature and the potential to discover new functionally relevant genes. The authors need to better justify their rationale for candidate prioritization from devQTL peaks. The GO analysis should be shown as supplemental data. What criteria were used to select genes based on GO annotations?

      We have now commented on these valid points and outlined our rationale in more detail in the text (page 4, lines 20-30). Our rationale now also includes selection of differentially expressed genes (n=5 genes) that fall within segmentation timing devQTL hits (for more details see below). Essentially, while we indeed finally focused on the proof of principle using known genes, these genes were previously not known to play a role in either setting the timing of segmentation or controlling the size of the PSM. Hence, we do think our strategy demonstrates "the potential to discover new functionally relevant genes", even though the genes themselves had been implicated in somitogenesis overall. We added the GO analysis as supplemental data as requested (new Supplementary Figure 7E).

      • Analysis of the predicted functional consequence of divergent SNPs (Fig. S6B, F) is superficial. Among missense variants, which genes harbor the most deleterious mutations? Which missense variants are located in highly conserved residues? Which genes carry variants in splice donors/acceptors? Carefully assessing the predicted effect of SNPs in coding regions would provide an alternative, less biased approach to prioritize candidate genes.

      We have now included our analysis of SNPs based on the Variant Effect Predictor (VEP) tool from Ensembl. This analysis ranks the predicted impact of each SNP on protein structure and function (Impact: low, moderate, high) and annotates which variants can affect splice donors/acceptors. The VEP analysis for both phenotypes is now added to the manuscript as supplemental data (new Supplementary Data S2, S5).

      • Another potential way to prioritize candidate genes within devQTL peaks would be to use the RNA seq data. The authors should perform differential expression analysis between Kaga and Cab RNA-seq datasets. Do any of the differentially expressed genes fall within the devQTL peaks?

      As suggested, we have performed this additional experiment and report the RNA-seq differential expression analysis in new Supplementary Figure 7C-D. The analysis revealed 2606 differentially expressed genes in the PSM between Kaga and Cab, five of which were candidate genes from the devQTL analysis. We have now tested all of these (5 in total, 4 new and 1 previously targeted, adgrg1) for segmentation timing by CRISPR/Cas9 KO in the her7-venus background, none of which showed a timing phenotype (new Supplementary Figure 7F-F'). We provide the complete set of results in new Supplementary Figure 7 and Supplementary Data file 3 (DE-genes), and all data were deposited in the publicly available repository BioStudies under accession number E-MTAB-13927.
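
      The prioritization logic described here, keeping only differentially expressed genes that fall within devQTL intervals, can be sketched as a simple interval-overlap filter. This is a hedged illustration rather than the authors' code: the function name, gene names, coordinates, and peak interval below are all hypothetical.

```python
def genes_in_peaks(genes, peaks):
    """Return the names of genes that overlap at least one QTL interval.

    genes: dict of gene name -> (chromosome, start, end).
    peaks: list of (chromosome, start, end) intervals.
    """
    hits = set()
    for name, (chrom, start, end) in genes.items():
        for p_chrom, p_start, p_end in peaks:
            # Two closed intervals overlap when each starts before the other ends.
            if chrom == p_chrom and start <= p_end and end >= p_start:
                hits.add(name)
                break
    return hits


# Hypothetical gene annotation, QTL peak, and differentially expressed (DE) set.
genes = {
    "geneA": ("chr3", 1200, 1800),
    "geneB": ("chr3", 4000, 4500),
    "geneC": ("chr10", 100, 900),
    "geneD": ("chr3", 6000, 7000),
}
peaks = [("chr3", 1000, 5000)]
de_genes = {"geneA", "geneC"}

# Candidates are genes that are both inside a peak and differentially expressed.
candidates = sorted(genes_in_peaks(genes, peaks) & de_genes)
```

      In this toy input only geneA satisfies both criteria; geneB lies in the peak but is not differentially expressed, and geneC is differentially expressed but lies outside the peak.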

      • The use of crispants to functionally test candidate genes is inappropriate. Crispants do not mimic the effect of divergent SNPs and therefore completely fail to prove causality. While it is completely understandable that Medaka fish are not amenable to the creation of multiple knock-in lines where divergent SNPs are interconverted between species, better justification is needed. For instance, is there enough data to suggest that the divergent alleles for the candidate genes tested are loss of function? Why was a knockout approach chosen as opposed to overexpression?

      We agree with the reviewer that we do not address the causality of SNPs with the CRISPR/Cas9 KO approach we followed. And medaka does offer the genome editing capabilities to create tailored sequence modifications. So in principle, this can be done. In practice, however, we reasoned that any given SNP will contribute only partially to the observed phenotypes and combinatorial sequence edits are simply very laborious given the current state of the art in genome editing technologies. We therefore opted for an alternative proof of principle approach that aims "to discover new functionally relevant genes", not SNPs.

      • Along the same lines, now that two candidate genes have been shown to modulate the clock period in crispants (mespb and pcdh10b), the authors should at least attempt to knock in the respective divergent SNPs for one of the genes. This is of course optional because it would imply several months of work, but it would significantly increase the impact of the study.

      As above, this is in principle the correct rationale to follow, though very time-, cost- and labour-intensive. It is for the latter practical consideration that we decided not to follow this option.

      Minor comments:

      • It would be highly beneficial to describe the ecological differences between the two Medaka species. For example, do the northern O. sakaizumii inhabit a colder climate than the southern O. latipes? Is food more abundant or easily accessible for one species compared to the other? What, if anything, has been described about each species' ecology?

      There are indeed differences in the ecology of both species, with the northern O. sakaizumii inhabiting a colder climate than the southern O. latipes. In addition, it is known that the breeding season is shorter in the north than in the south, and northern species have been shown to have a faster juvenile growth rate than southern species. While it would be premature to link those ecological factors to the timing differences we observe, we can certainly speculate. A line to this effect has been added to the main text (page 5, lines 28-30).

      • The authors describe two different methods for quantifying segmentation clock period (mean vs. intercept). It is still unclear what is the difference between Figs. 3A (clock period), S4A (mean period) and S4B (intercept period). Is clock period just mean period? Are the data then shown twice? How do Fig. 3A and S4A differ?

      The clock period shown in all the main figures is the intercept period, which was also used for the devQTL analysis. Both measurements (mean and intercept) are indeed highly correlated and we include both in supplement for completeness.

      • devQTL as shorthand for developmental QTL should be defined in page 4 line 1 (where the term first appears), not later in line 12 of the same page.

      Noted and corrected, we thank the reviewer for spotting this error.

      • Python code for period quantification should be uploaded to Github and shared with reviewers.

      All period quantification code used in this study was obtained from the publicly available tool pyBOAT (https://www.biorxiv.org/content/10.1101/2020.04.29.067744v3). All code used in pyBOAT is available from the Github page of the creator of the tool (https://github.com/tensionhead/pyBOAT). Both are linked in the references and materials and methods sections.

      • RNA-seq data should be uploaded to a publicly accessible repository and the reviewer token shared with reviewers.

      We have uploaded all RNA-sequencing data to the public repository BioStudies under accession numbers E-MTAB-13927 and E-MTAB-13928. This information is now also added to the materials and methods section of the manuscript.

      Why are the maintenance (27-28C) vs. imaging (30C) temperatures different?

      Medaka fish can physiologically tolerate a wide range of temperatures, i.e. 17-33C. The temperature of 30C was chosen for practical reasons, i.e. a slightly faster developmental rate enables higher sample throughput in overnight real-time imaging experiments.

      • For Crispants, control injections should have included a non-targeting sgRNA control instead of simply omitting the sgRNA.

      We agree a non-targeting sgRNA control can be included, though we chose a different approach. For clarity, we now also include a control targeting Oca2, a gene involved in the pigmentation of the eye, to probe for any injection-related effect on timing and PSM size. As expected, 3 sgRNAs + Cas9 against Oca2 had no impact on timing or PSM size. These data are now shown in new Supplementary Figure 9 F-G'.

      It is difficult to keep track of the species and strains. It would be most helpful if Fig. S1 appeared instead in main figure 1.

      We agree and included an overview of the phylogenetic relationship of all species and their geographical locales in new Figure 1 A-B.

      Significance

      • The study introduces a new way of thinking about segmentation timing and size scaling by considering natural variation in the context of selection. This new framing will have an important impact on the field.
      • Perhaps the most significant finding is that the correlation between segment timing and size in wild populations is driven not by developmental constraints but rather selection pressure, whereas segment size scaling does form a single developmental module. This finding should be of interest to a broad audience and will influence how researchers in the field approach future studies.
      • It would be helpful to add to the conclusion the author's opinion on whether segmentation timing is a quantitative trait based on the number of QTL peaks identified.
      • The authors should be careful not to assign any causality to the candidate genes that they test in crispants.
      • The data and results are generally well-presented, and the research is highly rigorous.
      • Please note I do have the expertise to evaluate the statistical/bioinformatic methods used for devQTL mapping.

      Reviewer #2

      Evidence, reproducibility and clarity

      Seleit et al. investigate the correlation between segment size, presomitic mesoderm and the rhythm of periodic oscillations in the segmentation clock of developing medaka fish. Specifically, they aim to identify the genetic determinants for said traits. To do so, they employ a common garden approach and measure such traits in separate strains (F0) and in interbreedings across two generations (F1 and F2). They find that whereas presomitic mesoderm and segment size are genetically coupled, the tempo of her7 oscillations is not. Genetic mapping of the F0 and F2 progeny allows them to identify regions associated to said traits. They go on and perturb 7 loci associated to the segmentation clock and X related to segment size. They show that 2/7 have a tempo defect, and 2/ affect size.

      Major comments: The conclusions are convincing and well supported by the data. I think the work could be published as is in its current state, and no additional experiments that I can think of are needed to support the claims in the paper.

      Minor comments: - The authors could provide a more detailed characterization of the identified SNPs associated to the clock and to PSM size. For the segmentation clock, the authors identify 46872 SNPs, most of which correspond to non-coding regions and are associated to 57 genes. They narrow down their approach to those expressed in the PSM of Cab Kaga. Was the RNA selected from F1 hybrids? I wonder if this would impact the analysis for tempo and or size in any way, as F2 are derived from these, and they show broader variability in the clock period than the F0 and F1 fishes.

      The RNA was obtained from the pure F0 strains, and we have now extended this analysis by deep bulk RNA sequencing and differential gene expression analysis. As also indicated to reviewer 1, this revealed 2606 differentially expressed genes in the unsegmented tails of Kaga and Cab embryos, some of which occurred in devQTL peaks. Based on this information we expanded our list of CRISPR/Cas9 KOs by targeting all differentially expressed genes (5 in total, 4 new and 1 previously targeted) for segmentation timing, none of which showed a timing phenotype (new Supplementary Figure 7C-D). We provide the complete set of results in new Supplementary Figure 7 and Supplementary Data file 3 (DE-genes). All data were deposited in the publicly available repository BioStudies under accession number E-MTAB-13927.
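
      The filtering step described above, keeping differentially expressed genes that fall inside devQTL peaks, can be sketched as a simple interval-overlap check. This is a minimal illustration only: the gene names, chromosome labels and coordinates below are hypothetical placeholders, not values from the study, and the actual analysis pipeline may have worked differently.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share a position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_peaks(gene_coords, peaks, de_genes):
    """Return the sorted DE genes whose coordinates overlap any QTL peak."""
    return sorted(
        g for g in de_genes
        if g in gene_coords and any(overlaps(gene_coords[g], p) for p in peaks)
    )


# Hypothetical placeholder data (NOT from the study).
gene_coords = {
    "gene_a": Interval("chrA", 100, 200),
    "gene_b": Interval("chrA", 600, 700),
    "gene_c": Interval("chrB", 10, 50),
}
peaks = [Interval("chrA", 150, 500), Interval("chrB", 1, 100)]
de_genes = {"gene_a", "gene_b"}  # gene_c lies in a peak but is not DE

hits = genes_in_peaks(gene_coords, peaks, de_genes)
print(hits)
```

      Only genes that are both differentially expressed and located in a peak survive the filter, which mirrors how the candidate list for knockouts was narrowed.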

      It would be good if the authors could discuss if there were any associated categories or overall functional relationships between the SNPs/genes associated to size. And what about in the case of timing?

      In the case of PSM size there were no clear GO terms or functional relationships between the genes that passed the significance threshold on chromosome 3.

      For the 35 genes related to segmentation timing, there were a number of GO enrichment terms directly related to somitogenesis. We have included the GO analysis in the new Supplementary Figure 7E.

      • Have any of the candidate genes or regulatory loci been associated to clock defects (57) or segment size (204) previously in the literature?

      To the best of our knowledge none of the genes have been associated with clock or PSM size defects so far. It might be worthwhile using our results to probe their function in other systems enabling higher throughput functional analysis, such as newly developed organoid models.

      • When the authors narrow down the candidate list, it is not clear if the genes selected as expressed in the PSM are tissue specific. If they are, I wonder if genes with ubiquitous expression would be more informative to investigate tempo of development more broadly. It would be good if the authors could specifically discuss this point in the manuscript.

      We have not addressed the spatial expression pattern of the 35 identified PSM genes in this study, so we cannot speculate further. But the reviewer raises an important point: how the timing of individual processes (such as body axis segmentation) is linked at the organismal scale is a fundamental additional question that will be addressed in future studies. The in-vivo context we follow here would be ideal for such investigations.

      Can the authors speculate mechanistically why mespb or pcdh10b accelerates the period of her7 oscillations?

      While we do not have a mechanistic explanation yet, an additional experiment we performed, bulk RNA sequencing on WT and mespb mutant tails, provided further insight, and we have now added these data to the manuscript. This analysis revealed 808 differentially expressed genes between WT and mespb mutants. Interestingly, many of the affected genes are known to be expressed outside of the mespb domain, i.e. in the most posterior PSM (tbxt, foxb1, msgn1, axin2 and fgf8, amongst others). This indicates that the effect of mespb downregulation is widespread and possibly occurs at an earlier developmental stage, which requires more follow-up studies. These data are now shown in new Supplementary Figure 9A and Supplementary Data file S4. We now comment on this point in the revised manuscript.

      • Are there any size difference associated to the functionally validated clock mutants?

      We addressed this point directly and added this analysis as supplementary Figure 9H-H'. While pcdh10b mutants do not show any detectable difference in PSM size, we find a small, statistically significant reduction in PSM size (area but not length) in mespb mutants. All this data is now included in the revised manuscript.

      -Ref 27 shows a lack of correlation between body size and the segmentation period in various species of mammals. The work supports their findings, and it would be good to see this discussed in the text.

      We are not certain how best to compare our in-vivo results in externally developing fish embryos to in-vitro mammalian 2-D cell cultures. In our view, the correlation of embryo size, larval and adult size that we find in Oryzias might not necessarily hold in mammalian species, which would make a comparison more difficult. We do cite the work mentioned so the reader is pointed towards this interesting, complementary literature.

      Significance

      The work is quite remarkable in terms of the multigenerational genetic analysis performed. The authors have analysed >600 embryos from three separate generations to obtain quantitative data to answer their question (herculean task!). Moreover, they have associated this characterization to specific SNPs. Then, to go beyond the association, they have generated mutant lines and identified specific genes associated to the traits they set out to decipher.

      To my knowledge, this is the first project that aims to identify the genetic determinants for developmental timing. Recent work on developmental timing in mammals has focused on interspecies comparisons and does not provide genetic evidence or insight into how tempo is regulated in the genome. As for vertebrates, recent work from zebrafish has profiled temperature effects on cell proportions and developmental timing. However, the genetic approach of this work is quite elegant and neat.

      Conceptually, it is quite important and unexpected that overall size and tempo are not related. Body size, lifespan, basal metabolic rates and gestational period correlate positively and we tend to think that mechanistically they would all be connected to one another. This paper and Lazaro et al. 2023 (ref 27) are one of the first in which this preconception is challenged in a very methodical and conclusive manner. I believe the work is a breakthrough for the field and this work would be interesting for the field of biological timing, for the segmentation clock community and more broadly for all developmental biologists.

      My field is quantitative stem cell biology and I work on developmental timing myself, so I acknowledge that I am biased in the enthusiasm for the work. It should be noted that as an expert in the field, I have identified instances where other work hasn't been as insightful or well developed in comparison to this piece. It is also worth noting that I am not an expert in fish development, phylogenetic studies or GWAS analyses, so I am not able to assess any pitfalls in that respect.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript explores the temporal and spatial regulation of vertebrate body axis development and patterning. In the early stages of vertebrate embryo development, the axial mesoderm (presomitic mesoderm - PSM) undergoes segmentation, forming structures known as somites. The exact genetic regulation governing somite and PSM size, and their relationship to the periodicity of somite formation remains unclear.

      To address this, the authors used two evolutionarily closely related Medaka species, Oryzias sakaizumii and Oryzias latipes, which, although having distinct characteristics, can produce viable offspring. Through analysis spanning parental (generation F0) and offspring (generations F1 and F2) generations, the authors observed a correlation between PSM and somite size. However, they found that size scaling does not correlate with the timing of somitogenesis.

      Furthermore, employing developmental quantitative trait loci (devQTL) mapping, the authors identified several new candidate loci that may play a role during somitogenesis, influencing timing of segment formation or segment size. The significance of these loci was confirmed through an innovative CRISPR-Cas9 gene editing approach.

      This study highlights that the spatial and temporal aspects of vertebrate segmentation are independently controlled by distinct genetic modular mechanisms.

      Major comments:

      1) In the main text page 3, lines 11 and 12, the authors state that the periodicity of the embryo clock of the F1 generation is the intermediate between the parental F0 lineages. However, the authors look only at the periodicity of the Cab strain (Oryzias latipes) segmentation clock. The authors should have a reporter fish line for the Kaga strain (Oryzias sakaizumii) to compare the segmentation clock of both parental strains and their offspring. Since it could be time consuming and laborious, I advise to alternatively rephrase the text of the manuscript.

      We agree a careful distinction between segment forming rate (measured based on morphology) and clock period (measured using the novel reporter we generated) is essential. We show that both measures correlate very well in Cab, in both F0 and F1 and F2 carrying the Cab allele. For Kaga F0, we indeed can only provide the rate of somite formation, which nevertheless allows comparison due to the strong correlation to the clock period we have found. We have rephrased the text accordingly.

      2) It is evident that only a few F0 and F1 animals were analyzed in comparison with the F2 generation. Could the authors kindly explain whether and how this could bias or skew the observed results?

      We provide statistical evidence through the F-test of equality of variances that the variances of the F0, F1 and F2 samples are equal. Additionally, if we sub-sample the F2 data into groups of 100 embryos (instead of using all 638), we obtain the same distribution as for the full F2 set. We therefore believe that this is sufficient evidence against a bias or skew in the results.
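
      The two checks mentioned above, a variance-ratio (F) statistic and sub-sampling the F2 set into groups of 100, can be sketched as follows. This is a minimal illustration under stated assumptions: the 638 values are synthetic numbers drawn from an arbitrary normal distribution, not the study's measurements, and the helper names are hypothetical. Only the F statistic is computed here; turning it into a p-value would additionally require the F distribution (for example from a statistics library), which is left out to keep the sketch dependency-free.

```python
import random
import statistics


def variance_ratio(a, b):
    """F statistic for equality of variances: larger sample variance over smaller."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def subsample_groups(values, group_size, seed=0):
    """Shuffle the values and split them into groups of `group_size`.

    A trailing group smaller than `group_size` is dropped.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    n_groups = len(shuffled) // group_size
    return [shuffled[i * group_size:(i + 1) * group_size] for i in range(n_groups)]


# Synthetic stand-in for 638 F2 measurements (arbitrary mean and spread;
# NOT the study's data).
rng = random.Random(42)
f2_periods = [rng.gauss(60.0, 3.0) for _ in range(638)]

groups = subsample_groups(f2_periods, group_size=100)
for i, g in enumerate(groups):
    # Compare each sub-sample's mean, spread and variance ratio to the full set.
    print(i, round(statistics.mean(g), 2), round(statistics.stdev(g), 2),
          round(variance_ratio(g, f2_periods), 3))
```

      If the sub-samples consistently show means, standard deviations and variance ratios close to those of the full set, the larger size of the F2 group is unlikely to be what shapes its distribution.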

      3) It would be interesting to create fish lines with the validated CRISPR-Cas9 gene manipulations in different genetic contexts (Cab or Kaga) to analyze the true impact on the segmentation clock and/or PSM & somite sizes.

      We agree with the reviewer that this would in principle be of interest. Please see our earlier response to reviewer 1.

      4) Please add the results of the Go Analysis as supplementary material.

      We have added the GO analysis in new Supplementary Figure 7E.

      Minor comments:

      1) In the main text, page 2, line 29, Supplementary Figure 1D should be referenced.

      We have added a clearer phylogeny and the geographical location of the different species in new Figure 1 A-B, and we reference it at the requested location.

      2) In the main text, page 2, line 32, the authors refer to Figure 1B, but it should be 1C.

      We have corrected the information.

      3) Regarding the topic "Correlation of segmentation timing and size in the Oryzias genus" the authors should also give information on the total time of development of the different Oryzias species, as well as the total number of formed somites.

      We follow this recommendation and have added this information in new Supplementary Figure 5. We also now include segment number measured in F2 embryos. We view segmentation rate as a proxy for developmental rate, which, however, needs to be distinguished from total developmental time. The latter can be measured, for instance, by quantifying hatching time, which we did. These measurements show that Kaga, Cab and O. hubbsi embryos kept at a constant 28 degrees started hatching on the same day, while O. minutillus and O. mekongensis embryos started hatching one day earlier. We have not included these data in the manuscript because we think a distinction should be made between rate of development and total development time.

      4) In Figures 3A and B, please add info on the F1 lines for comparison.

      The information on F1 lines is provided in Supplementary Figure 3.

      5) Supplementary Figures 2F shows that the generation F1 PSM is similar to Cab F0, and not an intermediate between Kaga F0 and Cab F0. This is interesting and should be discussed.

      We show that the F1 PSM is indeed closer to the PSM of Cab than it is to the Kaga PSM. This is indeed intriguing and we have now commented on this point directly in the text.

      6) Supplementary Figures 6C to H are not mentioned either in the main text or in the extended information. Please add/mention accordingly.

      We have added references to both in the text.

      7) The order of Supplementary Figure 8 E to H and A to D appears to be not correct and not following the flow of the text. Please update/correct accordingly.

      We have updated the text accordingly.

      8) The authors should choose between "Fig.", "Fig", "fig.", "fig" or "Figure". All 'variants' can be found in the text.

      Noted, and updated. Fig. is used for main figures and fig. is used for supplementary figures.

      9) The color scheme of several figures (graphs with colored dots) should be revised. Several appear to be difficult to discern and analyze.

      We have enhanced the colours and increased the font on the figure panels. The colour panel was chosen to be colour-blind friendly.

      10) Please address/discuss following questions: What are the known somitogenesis regulating genes in Medaka? How do they correlate with the new candidates?

      The candidates we found and tested had not been implicated in regulating the tempo of segmentation or PSM size, while for some a role in somite formation had been previously established, hence the enrichment in GO analysis Somitogenesis.

      Reviewer #3 (Significance (Required)):

      General assessment:

      This interesting manuscript describes a novel approach to study and find new players relevant to the regulation of vertebrate segmentation. By employing this innovative methodology, the authors could elegantly demonstrate that the segmentation clock periodicity is independent from the sizes of the PSM and forming somites. The authors were further able to find new genes that may be involved in the regulation of the segmentation clock periodicity and/or the size of the PSM & somites. A limitation of this study is the fact that the results mainly rely on differences between the two species. The integration of additional Medaka species would be beneficial and may help uncover relevant genes and genetic contexts.

      Advance:

      To my best knowledge this is the first time that such a methodology was employed to study the segmentation clock and axial development. Although the topic has been extensively studied in several model organisms, such as mice, chicken, and zebrafish, none of them correlated the size of the embryonic tissues and the periodicity of the embryo clock. This study brings novel technological and functional advances to the study of vertebrate axial development.

      Audience:

      This work is particularly interesting to basic researchers, especially in the field of developmental biology and represents a fresh new approach to study a core developmental process. This study further opens the exciting possibility of using a similar methodology to investigate other aspects of vertebrate development. It is a timely and important manuscript which could be of interest to a wider scientific audience and readership.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper the authors explore the function of Syndecan in Drosophila stem cells focussing primarily on the intestinal stem cells. They use RNAi knockdown to conclude that Syndecan is required for long term stem cell maintenance as its knockdown results in apoptosis. They suggest that this effect is independent of LINC complex proteins but is associated with changes to nuclear morphology and DNA damage. They go on to show that a similar impact on nuclear shape can be seen in larval neuroblasts but not in stem cells of the female germline.

      Major Comments: The key conclusion that underpins the paper is that reduced Syndecan causes loss of stem cells. This is based entirely on evidence from cell-type specific RNAi using 3 independent RNAi lines. Overexpression has no phenotype and there is no analysis of loss of function mutants. SdcRNAi3 gives strong phenotypes that are statistically significant and is used throughout the paper. SdcRNAi2 gives comparatively moderate phenotypes which trend in the same direction but it is not clear if these are statistically significant (Fig S1). SdcRNAi line 1 appears to have very little effect (and if anything trends in the opposite direction in S1A). In addition, the knockdown efficiency of the three lines has not been assessed. Another possible concern given the dependence on RNAi3 is that the RNAi control line used is not an ideal match for the VDRC GD RNAi lines as it is in a different genetic background. In order to robustly draw conclusions: the phenotypes with RNAi lines 1 and 2 should be tested for significance; the extent of knockdown in each should be quantified either by qPCR in whole tissue knockdown, or by staining for protein levels if possible, to assess whether the variation in phenotypes is due to different knockdown levels. The use of a loss of function mutant in clones or tissue specific CRISPR-Cas9 KO or KD would also significantly increase confidence in the findings.

      • Our qPCR data indicate that SdcRNAi3 produces the most efficient knockdown, whilst SdcRNAi1 generates the weakest knockdown. The new manuscript version will incorporate these data in figure S1. Knockdown efficacy of SdcRNAi3 has also been previously reported (Eveland et al., 2016).

      • We apologise for omitting to add the statistical tests on phenotypic categories in figure S1A, this will be revised. We confirm that all Sdc RNAi phenotypic distributions are significantly different to that seen for age-matched controls (p- It should also be noted that despite weaker knockdowns with SdcRNAi1 and 2, we still observed statistically significant ISC depletion after 28 days of RNAi expression - we will add this data in figure S1. Overall, we are confident about Sdc’s role in maintaining intestinal stem cells.

      Similarly, the evidence for a lack of LINC protein role in the phenotype relies on single RNAi lines without validation of knockdowns. The authors should ideally validate these lines in this system or reference other studies that have validated the lines in this or other contexts.

      • The klarsicht RNAi line (BDSC 36721) and klaroid RNAi line (BDSC 40924) used in this study have been validated and used in other studies. (Falo-Sanjuan & Bray, 2022; Collins et al., 2017)

      • For Msp300 RNAi knockdown we have used two independent RNAi lines which gave similar results. We will amend the text to clarify these points. In addition, the line reported in the manuscript was previously validated (Dondi et al., 2021; Frost et al., 2016).

      Minor Comments: The figures are generally very clear but some of the IF image panels are very small and require significant on-screen enlargement to be legible. In particular in Figure 1B the cross section views make it difficult to assess expression in the different cell types (and don't show very many cells), could this be shown in wholemount or as separated channels in a supplementary figure? In addition, it would strengthen the argument to include counterstains for markers of the different cell types (particularly to distinguish ISC/EB from EE). This could include esg-lacZ to mark ISC/EBs or prospero for EEs. However, if a broader view of these panels makes it clearer that all epithelial cells are expressing Syndecan this may not be essential.

      • We are happy to incorporate larger fields of view, and co-immunostaining with different cell type markers.

      Syndecan is referred to throughout as a stem cell regulator. This implies that in certain contexts or in response to certain stimuli its expression may be altered to elicit a stem cell response but no examples of this are shown. Moreover, only knockdown and not overexpression gives phenotypes suggesting its role may be as a required protein rather than a regulator. Either examples of its expression being modulated in homeostasis or in response to a challenge could be included or the wording could be amended.

      • We agree with the reviewer and will amend the wording.

      Expression of Syndecan in neuroblasts is described as data not shown, it would be better to include this for completeness.

      • We will add this data in figure 4.

      In addition to the intestinal validation of the Syndecan RNAi lines, validation of knockdown in the germline would be valuable to support the conclusions of Fig S4 given differences of knockdown in the germline with some RNAi lines (although inclusion of Dicer in the driver line should have overcome this).

      • Sdc expression is very low in the germline, compared to the surrounding somatic cells, therefore we are not confident that we can detect differences in expression level after knockdown. We suggest adding a panel in figure S4 to show the low expression and adding a comment in the text.

      Reviewer #1 (Significance (Required)):

      The study describes a potentially very interesting, novel link between Syndecan, nuclear shape and apoptosis in cycling cells that could have broad relevance. If fully validated this could have implications for other stem cell populations, including those in mammals and disease relevance in the context of cancer. The paper is fundamentally descriptive in nature and so the level of significance hinges on the strength of evidence and how interesting the phenotype itself is. At this stage the audience will be primarily in the areas of fundamental research in biology of the nucleus and cytoskeleton. Defining the mechanistic link between Syndecan and nuclear morphology will be a critical next step and while not essential for this study would significantly increase the likely interest in the paper.

      • We thank the reviewer for these constructive comments. We agree that discovering the mechanistic links between Syndecan and nuclear morphology in future studies, in this and other model systems, will be relevant to many areas of biological research.

      In terms of significance in stem cell biology the distinction between a regulator and a requirement to prevent stem cell apoptosis is important and the lack of evidence for a context in which Syndecan plays a regulatory role somewhat detracts from the breadth of impact. My field of expertise is in epithelial stem cell biology.

      • We agree and will amend our wording.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: Stem cell (SC) maintenance and proliferation are necessary for tissue morphogenesis and homeostasis. The basement membrane (BM) has been shown to play a key role in regulating stem cell behavior. In this work, the authors unravel a new connection between the receptor for BM components Syndecan (Sdc) and SC behavior, using Drosophila as model system. They show that Sdc is required for intestine stem cell (ISC) maintenance, as Sdc depletion results in their progressive loss. At a cellular level, they also find that Sdc depletion in ISCs affects cell survival, cell and nuclear shape, nuclear lamina and DNA damage. In addition, they show that the defects in shape are not related to cell death. They also find that Sdc depletion in neural stem cells also results in nuclear envelope remodeling during cell division. This is in contrast to what happens in female germline stem cells where Sdc does not seem to be required for their survival or maintenance. In general, I believe that this work unravels a connection between Sdc and stem cell behavior. However, I think the study is still at a preliminary stage, as how Sdc regulates different facets of stem cell behavior remains unclear.

      Major comments: 1. To clearly show that the cellular changes produced by loss of Sdc are not due to cell death, one should quantify the ISC area and shape of Sdc-depleted ISCs expressing DIAP1 and compare it to that of Sdc-depleted ISCs. As DIAP1 overexpression only partially rescues ISC loss due to Sdc depletion, one should show that the Sdc-depleted ISCs expressing DIAP1 that still show cellular changes are not dying, as overexpression of Diap1 might not be sufficient to completely rescue cell death in all Sdc-depleted ISCs. In fact, apoptosis in Sdc depleted guts and the ability of Diap1 overexpression to rescue cell death should be analyzed using markers of caspase activity, this will provide a better idea of the contribution of apoptosis to the phenotypes associated to Sdc depletion.

      • We can, as suggested by the reviewer, quantify the area and shape of Sdc-depleted ISCs expressing DIAP1 and compare it to that of Sdc-depleted ISCs. However, our immunostainings with anti-Caspase 3 or Drice do not pick up apoptotic cells in the fly gut. This is not entirely unexpected, as apoptosis is unfortunately not easily detected in this tissue. In the absence of a positive readout of apoptosis, we will not be able to discriminate between apoptotic and non-apoptotic stem cells when quantifying area and shape and will only have global quantifications.

      • The authors show that ISC loss is associated with reduced cell density, suggesting that this is most likely due to failure in new cell production. What do they mean by cell production? Is this related to a problem in regulating cell division, or to the fact that as some ISCs are lost by apoptosis there are progressively fewer ISCs, or to a combination of both? I think that cell division should be monitored throughout time as well as cell death in ISCs.

      • Based on esgF/O experiments (fig. 1D-F and S1C) where we can trace the production of new cells with GFP, we know that Sdc RNAi expression (i) impairs the appearance of newly differentiated cells in the tissue and (ii) results in the disappearance of progenitor cells (fig. S1C). Supporting these points, (i) we have observed PH3+ mitotic stem cells upon Sdc RNAi, so we are confident the cells are able to initiate cell division (see also fig. 2G), and (ii) we have occasionally noted in fixed samples stem cells looking like they were in the process of delaminating. Overall, the failure of cell production is likely related to problems with both completion of cell division and progressive stem cell loss. High resolution live imaging will in future give us a better insight into stem cell division dynamics/behaviour, however, the technical improvements required are beyond the scope of this project. In the meantime, we propose to clarify our statement in the text.

      • The authors report that in contrast to what happens when Sdc is eliminated from ISCs, its elimination from EEs results in an increase in the number of these cells. An explanation for this result is missing.*

      • Based on known roles of Syndecan in other Drosophila tissues (Johnson et al., 2004; Steigemann et al., 2004; Chanana et al., 2009; Schulz et al., 2011), we speculate that Syndecan may contribute to robo/slit signalling, which is an important regulator of EE activity in the Drosophila gut (Biteau & Jasper 2014; Zeng et al., 2015). We propose to amend the text to express this hypothesis.

      • The authors suggest that "Sdc function is unlikely to be fully accounted for by individual LINC complex proteins, although these proteins might act redundantly". Checking redundancy seems a straightforward experiment, which only requires the simultaneous expression of RNAis against several of these proteins. This would help to settle the implication of LINC complex proteins on Sdc function.*

      • To check redundancy, we propose to combine Klaroid RNAi with Msp300 or Klarsicht RNAis, and express two RNAis at a time in ISCs. We will then measure stem cell proportions and the proportion of ISCs with DNA damage.

      • Although quantification of DNA damage, by immunolabelling with gH2Av, reveals that knockdown of individual LINC complex components did not recapitulate the damage observed upon Sdc depletion (Fig.3G), the image shown in Fig.3F reflects much higher levels of gH2Av in Msp300 RNAi cells compared to Sdc RNAi cells. Authors should clarify this. *

      • Like the reviewer, we are intrigued by the higher levels of H2Av staining in the tissue, despite Msp300 knockdown in stem cells only (fig. 3F). It is worth noting that we observed this with two independent RNAi lines (we showed only one RNAi in the manuscript, but we will amend the text to indicate this). In fig. 3F, we will indicate with an arrow the only ISC that is H2Av positive, and mention in the text that the majority of DNA damage signal observed in the Msp300 RNAi condition is in enterocytes, not ISCs. We currently do not have an explanation for why loss of Msp300 in ISCs should cause DNA damage in neighboring cells.

      *In addition, the consequences of the simultaneous elimination of more than one component of the LINC complex on DNA damage should be analyzed. *

      • We agree, and as we check for redundancy (as in point 4), we will also immunostain the tissues for H2Av.

      • The authors claim that the fact that "DNA damage was found more frequently in Sdc-depleted ISCs with lamina invaginations compared to those without (Figure 3H), supports a model whereby the development of nuclear lamina invaginations precedes the acquisition of DNA damage". However, to me, these results show that there is a relation between these two phenotypes, but not that one precedes the other. In order to show which one is the possible cause and which the consequence, the authors should perform a time course of the appearance of each of these phenotypes.*

      • We agree with the reviewer that we should rephrase our statement to indicate a relationship between lamina invaginations and DNA damage, rather than a causality (as stated in fig. 3H).

      (In terms of performing a time course analysis, the difficulty is that after 3 days of Sdc RNAi expression, the apparent DNA damage (fig. 3G) corresponds to a very small proportion of stem cells, meaning that an exceptionally large sample size would be required to achieve robust statistical analysis.)

      • When studying the role of Sdc in neural stem cells, the authors show that elimination of Sdc in neuroblasts also affect nuclear envelope and shape. Furthermore, in this case, they also show that Sdc elimination affects cell division. To look for a more conserved role of Sdc in stem cell behavior, I believe the authors should also analyze whether Sdc elimination in neural stem cells results in an increase in DNA damage, as it is the case in ISCs.*

      • We will stain larval brains for H2Av to see if DNA damage is also observed following Sdc knockdown in neuroblasts.

      • When analyzing a possible role of Sdc in fGSCs, quantification of germline stem cells and gH2Av levels in control nosGal4 and nos>Sdc RNAi germaria should be done. In addition, it is not clear to me whether Sdc is in fact expressed in fGSCs.*

      • As mentioned in comments to reviewer 1, we will add a panel in figure S4 to show the low Sdc expression in fGSCs. We will also clarify in the text that we do not see any H2Av staining in the fGSCs (thus, there is nothing to quantify in this case).

      * The authors should show presence of Sdc in neuroblasts.*

      • Yes, we agree, as also mentioned in comments to reviewer 1.

      Reviewer #2 (Significance (Required)): *In general, although this work reveals that elimination of Sdc affects different aspects of intestinal and neural stem cell behavior, including cell survival, cell production, nuclear shape, nuclear lamina or DNA damage, their contribution to stem cell loss and interactions between them have not been analyzed in detail. The role of the basement membrane in stem cell behavior has been extensively studied. In particular, the role of syndecan in stem cell regulation has been primarily confined to cancer, muscle, neural and hematopoietic stem cells. Thus, the study here presented could extend the role of Sdc to intestinal stem cells and could potentially reveal a conserved role for Sdc in neural stem cell behavior. However, the problems with the data mentioned above hinder the assessment of the significance of this work. *

      • We thank the reviewer for their assessment and are glad that they also find that our study provides novel connections between Syndecan and the regulation of intestinal and neural stem cell behaviors. To strengthen our conclusions, we will include additional experiments or amend the text, as indicated above.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): Peer-review: The transmembrane protein Syndecan regulates stem cell nuclear properties and cell maintenance.

      In this work, the authors investigate the role of the transmembrane protein Syndecan (Sdc) in nuclear organisation and stem cell maintenance. They show that Sdc knockdown in intestinal stem cells (ISCs) results in a reduction of the ISC pool as well as of their progeny. They hypothesise that these ISCs might get eliminated via cell death; however, expression of the apoptotic inhibitor DIAP1 only rescued ISC loss by 50%. Hence, they suggest that apoptosis cannot account for the total decrease in ISCs observed upon Sdc loss. ISCs depleted of Sdc exhibited abnormal cytoplasmic and nuclear morphologies. As Sdc has previously been implicated in the abscission machinery in mammalian cultured cells, they tested if Sdc could be playing a similar role in the abscission of ISCs. However, ISCs were capable of undergoing cytokinesis. Next, they tested if Sdc depletion could be altering the linkage between the plasma membrane and the nucleus mediated by the Linker of Nucleoskeleton and Cytoskeleton (LINC) complex. However, individual knockdowns of the different components of the complex did not disrupt the nuclear morphology to the same extent as Sdc knockdown, suggesting that Sdc function may be independent of the LINC complex. Finally, they observed that Sdc-depleted ISCs exhibited DNA damage, suggesting that Sdc may play a role in DNA protection. The authors next tested if Sdc played similar roles in other stem cell types such as the female germline stem cells (fGSCs) and larval neural stem cells (NSCs). While Sdc depletion appeared dispensable for fGSC maintenance, it prolonged NSC divisions and altered the nuclear morphology of NSCs. Upon further investigations, they observed that the NSC's nuclear envelope was disrupted upon division, hence causing defects in the nuclear size ratio of NSC and their progeny. This study provides interesting findings in the field and proves a new role for Sdc in the regulation of intestinal and neural stem cell maintenance.
      I would recommend this manuscript to be accepted if the authors address the following comments.

      Major comments: 1. In Figure 2 A-B, Sdc RNAi should ideally have a UAS control transgene to match the number of UAS being expressed to that of Sdc RNAi, DIAP1. Otherwise, it is plausible that reduced RNAi expression of Sdc RNAi, DIAP1 animals is the cause of the partial rescue. Staining against cell death markers such as Dcp-1 or TUNEL might also quantify the number of cells undergoing cell death in each of the genotypes.

      • As mentioned in comments to reviewer 2 (point 1), it is difficult to label apoptotic cells in the fly gut. However, we could set up an additional control to test that the partial rescue observed upon DIAP1 expression is not a result of Gal4 dilution.

      • "These phenotypes were observed both with and without DIAP1 expression (Figure 2C), indicating that these cell shapes are not caused by apoptosis." Misleading, as DIAP1 overexpression in the Sdc knockdown background only rescued apoptosis by 50%. Hence, it is possible that those cells undergoing morphological defects, protrusions and blebbing might still undergo death, also considering that those morphological changes are typically observed in apoptotic cells. Therefore, to rule apoptosis out, these cells should be shown to be negative for cell death markers. *

      • We agree, however, it is difficult to label apoptotic cells. We think that the quantification of shape and area (as suggested by reviewer 2, point 1) will clearly show that the cell shapes resulting from Sdc depletion are not caused by apoptosis.

      • Show if Sdc is expressed in fGSCs - the lack of phenotype caused by Sdc knockdown might be due to lack of expression of Sdc.*

      • As mentioned in comments to reviewers 1&2, we will add a panel in figure S4 to show the low Sdc expression in fGSCs.

      • "After confirming the presence of Sdc in neuroblasts (data not shown)." Data should be shown. It would be of great interest for researchers if you showed a staining of different brain cell types (NBs, glia, neurons) and the Sdc expression patterns.*

      • As mentioned in comments to reviewers 1&2, we will add a panel in figure 4 to show Sdc expression in NBs and the overall expression pattern.

      • You show how Sdc-depleted NBs have disrupted nuclear morphologies. However, does Sdc KD in NB lineages affect their ability to self-renew and generate differentiated progeny? Is the number of NBs and of their progeny cells altered as it is for ISCs?*

      • We propose to knock down Sdc in NBs and quantify brain size in 3rd instar larvae to test if the ability to generate progeny is affected.

      • Does protection against DNA damage in an Sdc knockdown background prevent the defects observed with the single knockdown and ISC elimination?*

      • This is a good question, and we should emphasize this point in the discussion. However, because of the multiple routes of DNA damage response, and the multiple lines needed to explore this connection, we feel that investigating this question is beyond this project.

      • Any idea of the similarities between ISCs and NBs that can account for why Sdc knockdown has effects in those systems, while no effect was observed in the germ cells?*

      • Besides the differences in expression level, we speculate that GSCs may have a different nuclear / lamina architecture which might reflect differences in how GSCs control the physical integrity of their nuclei. It is also possible that the differences observed between tissues reflect the way stem cells connect to their microenvironment. Notably, fGSCs rely extensively on E-Cadherin mediated adhesion with neighbouring cells, and it is possible that contact with the extracellular matrix is dispensable. We will consider these possibilities in the discussion.

      Minor comments: 8. Lamina invaginations, for example in Figure 3 A, could be indicated with an arrow for easier detection.

      • Thanks for this suggestion, we will amend the figure.

      Specify the type and location of NB imaged during live cell experiments.

      • The NBs were imaged in the brain lobes, and we did not distinguish between type I and II NBs. We will add a sentence in the method section to clarify.

      *Reviewer #3 (Significance (Required)): Expertise: Drosophila stem cells *

      • Many thanks for the constructive comments.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The data strongly suggest that iron depletion in urine leads to conditional essentiality of some genes. It would be informative to test the single gene deletions (Figure 3G) for growth in urine supplemented with iron, to determine how many of those genes support growth in urine due to iron limitation.

      We appreciate this suggestion. We have now included this suggested experiment as a new panel (Figure 5G).

      (2) Line 641. The authors raise the intriguing possibility that some mutants can "cheat" by benefitting from the surrounding cells that are phenotypically wild-type. Growing a fepA deletion strain in urine, either alone or mixed with wild-type cells, would address this question. Given that other mutants may be similarly "masked", it is important to know whether this phenomenon occurs.

      We thank the reviewer for this suggestion but believe that this would be very difficult to ascertain in K. pneumoniae as several redundant iron uptake systems exist. This would require significantly more time to construct sequential/combinatorial iron-uptake mutants to exactly determine this “cheating” and “masking” phenomenon and such work is beyond the scope of the current study.

      (3) In cases where there are disparities between studies, e.g., for genes inferred to be essential for serum resistance, it would be informative to test individual deletions for genes described as essential in only one study.

      We thank the reviewer for this suggestion, and we agree that deleting conditionally essential genes (i.e. serum resistance) could help identify discrepancies in methodology with other studies but this is beyond the scope of this study. Furthermore, we do not have these other strains readily available to us and importing these strains into Australia is challenging due to the strict import/quarantine laws.

      Reviewer #1 (Recommendations For The Authors)

      (4) Line 529. Why was 50 chosen as the read count threshold?

      This was chosen as the minimum threshold needed to exclude essential genes from the comparative analysis, as these can contribute false positive results where a change from, for example, 2 to 5 reads between conditions is considered a >2-fold change. We have updated the manuscript text to highlight this: “were removed from downstream analysis to exclude confounding essential genes and minimize the effect of stochastic mutant loss” (line 539).
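
      To illustrate the reasoning, the following minimal sketch shows how a gene with very few reads can produce a spuriously large fold change, and how a read-count threshold removes it. The gene names, the counts and the exact filtering rule (dropping genes whose control count is below the threshold) are assumptions made for illustration, not the pipeline used in the study.

```python
import math

MIN_READS = 50  # read-count threshold quoted in the response


def log2_fold_change(control_reads: int, condition_reads: int) -> float:
    """log2 ratio of condition to control read counts (no pseudocount, counts must be > 0)."""
    return math.log2(condition_reads / control_reads)


def filter_low_count_genes(counts: dict, min_reads: int = MIN_READS) -> dict:
    """Drop genes whose control read count is below min_reads (assumed rule)."""
    return {gene: pair for gene, pair in counts.items() if pair[0] >= min_reads}


# Hypothetical (control, condition) read counts per gene.
counts = {"geneA": (2, 5), "geneB": (400, 90), "geneC": (350, 360)}

for gene, (control, condition) in counts.items():
    print(gene, round(log2_fold_change(control, condition), 2))

# geneA changes by more than 2-fold on only a handful of reads and is filtered out.
print(sorted(filter_low_count_genes(counts)))
```

      In this toy example the 2-to-5 read change exceeds a 2-fold difference even though it rests on three reads, which is the kind of call the threshold is meant to exclude.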

      (5) The titles for Figure 5 and Figure 6 appear to be switched.

      Thank you, we have now corrected this error.

      (6) Line 381. "Forty-six of these regions contain potential open reading frames that could encode proteins". How is a potential ORF defined?

      This was based on submitting the selected 145bp regions to BLASTx using default parameters and listing the top hit (if one was found). We have now edited the manuscript text to make this clearer. (Line 394)

      (7) Two previous TnSeq studies looking at Escherichia coli and Vibrio cholerae suggest that H-NS can prevent transposon insertion, leading to false positive essentiality calls. Is there any evidence of this phenomenon here? A/T content could be used as a proxy for H-NS occupancy.

      We thank the reviewer for this point and also agree that H-NS or other DNA-binding proteins could indeed lead to false-positive essentiality calls using TraDIS. Based on this, we have now included a sentence in the conclusion section mentioning this methodological caveat (Line 631). We believe that A/T content could potentially be used as a proxy for H-NS occupancy,

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors may wish to reformat the manuscript by decanting a number of panels and figures as supplementary material. These include the panels related to the description of TraDIS (for example Fig 1D, 1E, 1F. 1G, Fig 2A, Fig 3C, 3D, 3E, 3F, Fig 5C, Fig 6D). This is a well-established method.

      We thank the reviewer for this suggestion but believe that these panels make the methodology and resulting insertion plots easier to follow and allow other researchers, of varying expertise, to better understand this functional genetic screen technique.

      (2) The authors need to indicate how relevant the strain they have probed is. Is it a good reference strain of the KpI group?

      This is a great suggestion and we have now included a new figure illustrating the genetic context and relatedness of K. pneumoniae ECL8 within the KpI phylogroup (New Figure 3).

      (3) The authors need to provide an extensive comparison between the data obtained and those reported testing other Klebsiella strains. A Table identifying the common and different genes, as well as a figure, may suffice. I would encourage authors to compare also their data against E. coli and Salmonella. For example, igaA seems to be not essential in Klebsiella although data indicates it is in Salmonella.

      We thank the reviewer for their comment and appreciate that our data could be extended and compared to other relevant Enterobacteriaceae members. However, we believe this is beyond the scope of this study as the focus is more on K. pneumoniae.

      (4) None of the mutants tested further are complemented. Without these experiments, it cannot be rigorously claimed that these loci play any role in the phenotypes investigated.

      We agree that complementation is an important tenet for attributing mutant phenotypes to specific gene loci. In this case, however, wbbY has already been complemented, and we believe that complementation for an already known molecular mechanism would be redundant. Please refer to our response to point 6.

      We complemented isolated transposon mutants hns7::Tn5 and hns18::Tn5 with a mid-copy IPTG inducible . We observed a slight increase in serum susceptibility but not full rescue of the WT phenotype (i.e. serum susceptibility). We suspect that the imperfect rescue of the serum-resistance phenotype observed could be due to the expression levels and copy number of the complement hns plasmid used. As hns is a known global regulator, its possible pleiotropic role is complex, as many aspects of stress response, metabolism or capsule could be affected in Klebsiella (doi.org/10.1186/1471-2180-6-72, doi.org/10.3389/fcimb.2016.00013). We have now described our complementation efforts in the text and have included a new supplementary figure (Figure S11).

      (5) The contribution of siderophores to survival in urine is not conclusively established. Authors may wish to test the transcription of relevant genes, and to assess whether the expression is fur dependent in urine. Also, authors may wish to identify the main siderophore needed for survival in urine by probing a number of mutants; this will allow us to assess whether there is a degree of selection and redundancy.

      We thank the reviewer for their comment and agree siderophore uptake is important. We have now included an additional panel (Figure 5G) interrogating the importance of iron-uptake genes grown in urine which is iron limited. We do appreciate that further experiments looking into the Fur regulon and siderophore biosynthesis would be interesting but believe this is outside the scope of this study.

      (6) The role of wbbY is intriguing, pointing towards the importance of high molecular weight O-polysaccharide. In this mutant background, the authors need to assess whether the expression of the capsule, and ECA is affected. Authors need also to complement the mutant. Which is the mechanism conferring resistance?

      We thank the reviewer for their comment and would like to mention that wbbY has already been shown to play a role in LPS profile/biosynthesis and serum resistance (10.3389/fmicb.2014.00608). Furthermore, BLAST analysis shows that the wbbY genes of NTUH-K2044 (the strain used in the aforementioned study) and ECL8 share 100% sequence identity and the same lps operon structure. Hence, we do not find it pertinent to complement this mutant, as we believe its molecular mechanism has already been established. We have now highlighted the results of that study more prominently in the text, along with how our screen was robust enough to also identify this gene for serum resistance.

      (7) hns and gnd mutants most likely will have their capsule affected. The authors need to assess whether this is the case. Which is the mechanism conferring resistance?

      As mentioned in point 6, we believe that the serum resistance phenotype is attributable to the LPS phenotype. Previous studies suggest that hns and gnd mutants would likely have differences in capsule; however, because hns is pleiotropic and gnd is intercalated within/adjacent to the LPS/O-antigen biosynthesis genes, it would be difficult to delineate exactly which cellular surface structure is involved.

      (8) The conclusion section can be shortened significantly as much of the text is a repetition of the results/discussion section.

      We thank the reviewer for their suggestion and have made edits to limit repetition in the conclusion section.

      Reviewer #3 (Public Review):

      Below I include several comments regarding potential weaknesses in the methodology used:

      • The study was done with biological duplicates. In vitro studies usually require 3 samples for performing statistical robust analysis. Thus, are two duplicates enough to reach reproducible results? This is important because many genes are analyzed which could lead to false positives. That said, I acknowledge that genes that were confirmed through targeted mutagenesis led to similar phenotypic results. However, what about all those genes with higher p and q values that were not confirmed? Will those differences be real or represent false positives? Could this explain the differences obtained between this and other studies?

      We thank the reviewer for their comment and apologize for the confusion: data were only pooled for the statistical analysis of gene essentiality. Here, two technical replicates of the input library were sequenced and the number of insertions per gene quantified (insertion index scores). These replicates had a correlation coefficient of r² = 0.955, and the insertions-per-gene data were pooled to give total insertion index scores to predict gene essentiality. For conditional analyses (growth in urine or serum), replicate data were not combined. As mentioned previously, differences between this and other studies could also be attributed to inherent genomic differences or to differences in experimental methodology, computational approaches, or the stringency of analysis used to categorize these genes.

      • Two approaches are performed to investigate genes required for K. pneumoniae resistance to serum. In the first approach, the resistance to complement in serum is investigated. And here a total of 356 genes were identified to be relevant. In contrast, when genes required for overall resistance to serum are studied, only 52 genes seem to be involved. In principle, one would expect to see more genes required for overall resistance to serum and within them identify the genes required for resistance to complement. So this result is unexpected. In addition, it seems unlikely that 356 genes are involved in resistance to complement. Thus, is it possible false positives account for some of the results obtained?

      We thank the reviewer for their comment and do believe false positives may account for some of the identified genes. Regarding the large contrast in gene numbers specifically, we believe this is due to the methodology, as alluded to in our conclusion section. For overall resistance to serum, we used a longer time point (180 min exposure), at which fewer surviving mutants are recovered and hence fewer genes overall will be identified, whereas conditions with shorter killing windows (i.e. complement-mediated killing, 90 min exposure) will yield more.

      Reviewer #3 (Recommendations For The Authors):

      • In Figure 4 it is shown that genes important for growth in urine include several that are required for enterobactin uptake. Moreover, an in vitro experiment shows that the complementation of urine with iron increases K. pneumoniae growth. It would have been informative to do a competition experiment between the WT and Fep mutants in urine supplemented with iron. This could demonstrate that the genes identified are only necessary for conditions in which iron is in limiting concentrations and confirm that the defect of the mutants is not due to other characteristics of urine.

      We appreciate this suggestion. We have now included a new panel (Figure 5G) addressing the supplementation of iron in urine for these select mutants.

      • Considering the results section, the title for Figure 6 seems to be more appropriate for Figure 5.

      Thank you, this has now been corrected.

      Other points:

      • Line 44: treat instead of treating

      Thank you, this has now been corrected.

      • Line 63: found that only 3 genes played a role instead of "found only 3 genes played a role"

      Thank you, this has now been corrected.

      • Line 105: is there any reason for only using males? Since UTIs are frequent in women? Why not use urine from women volunteers?

      Due to accessibility of willing volunteers and human ethic application processes, only male samples were available. We are currently undertaking further studies to understand how male and female urine influences growth of uropathogens.

      • Line 105: since the urine was filter-sterilized, maybe the authors can comment that another point that is missing in urine - and that it may be important to study - will be the presence of the urine microbiome and how this affects growth of K. pneumoniae.

      We again thank the reviewer for this comment and have now edited the manuscript discussing how the absence of urine microbiome could affect growth (Line 659). As an aside, future studies in our lab are interested in looking at the role of commensal/microbiome co-interactions for essentiality/pathogenesis using TraDIS.

      • Line 116: I understand that the 8 healthy volunteers combined males and females

      Thank you, we have now edited this methods line to make this clearer.

      • Line 120: incubate in serum 90 min and 180 RPM shaking: any reasons for using these conditions, any reference supporting these conditions?

      Thank you for pointing this out, we were mirroring a previous K. pneumoniae serum-resistance study (doi.org/10.1128/iai.00043-).

      • Line 156: space after the dot.

      Thank you, we have now corrected this in the manuscript.

      • Line 164: resulting reads were mapped to the K. pneumoniae: what are the parameters used for mapping (e.g. % of identity...)?

      Thank you for bringing this to our attention, we have now included in our manuscript that we used the default parameters of BWA-MEM for mapping for minimum seed length (default -k =20bp exact match)

      • Line 180: it will be good to upload to a repository the In-house scripts used or indicate the link beside the reference for those scripts.

      Our scripts are derived from the pioneering TraDIS study (doi: 10.1101/gr.097097.109). We are currently still optimizing our scripts and intend to upload these to be publicly available. However, in the meantime we are more than happy to share them with other parties upon request.

      • Line 191: why were genes classified as 12 times more likely to be situated in the left mode? Any particular reason for using this threshold?

      We opted for a more-stringent threshold for classifying essential genes, in keeping with previous and comparable studies (doi.org/10.1371/journal.pgen.1003834).

      • Line 209: do you mean Q-value of <0.05 instead of >0.05 ? How is this Q value is calculated, and which specific tests are applied?

      Thank you for pointing out this Q-value error; we have now corrected this in the manuscript. These values were generated using the biotradis tradis_comparison.R script, which uses the edgeR package. For further reading please see DOI: 10.1093/bioinformatics/btp616. The Q-values are P values corrected for multiple testing by the Benjamini-Hochberg method.
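
      For readers unfamiliar with the correction, the sketch below shows the standard Benjamini-Hochberg adjustment in plain Python. It is a generic illustration of how Q-values are derived from P values, not the edgeR code path itself, and the example P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q-values) in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the Q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical P values for four genes.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)

# Genes passing a Q-value cut-off of < 0.05.
significant = [i for i, q in enumerate(q_values) if q < 0.05]
print(q_values, significant)
```

      Each P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw P values increase.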

      • Line 212: again, which type of test is used? What about the urine growth analysis? The same type of tests were applied?

      Thank you for bringing this to our attention. We have now indicated in the referenced methods section which package was used for which dataset (i.e. urine or serum). Line 212 refers to our use of the AlbaTraDIS package, which builds on the biotradis toolkit, to identify gene commonalities/differences in the selected growth conditions, again using multiple-testing correction by the Benjamini-Hochberg method. For further reading, please refer to DOI: 10.1371/journal.pcbi.1007980.

      • Line 226: do the authors mean Sanger sequencing instead of SangerSanger sequencing?

      Thank you, we have now corrected this in the manuscript.

      • Line 239: does the WT strain contain another marker for differentiating this strain from the mutant? Or is the calculation of the number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics? The former will be a more accurate method.

      The calculation was based on the latter assumption, “number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics”. We have now updated the methods section to make this clearer.
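
      The subtraction described above can be written out as a short worked example. The CFU values are hypothetical, and the check for an antibiotic-plate count that exceeds the total count is an added safeguard against plating noise rather than something stated in the methods.

```python
def wild_type_cfu(total_cfu: float, mutant_cfu: float) -> float:
    """WT CFU = CFU without antibiotic (WT + mutant) minus CFU with antibiotic (mutant only)."""
    if mutant_cfu > total_cfu:
        # Can happen through plating or counting noise; flag it rather than return a negative count.
        raise ValueError("mutant CFU exceeds total CFU")
    return total_cfu - mutant_cfu


# Hypothetical counts (CFU/mL) from a mixed WT/mutant culture.
total = 5.0e8   # medium without antibiotic
mutant = 2.0e8  # medium with antibiotic
wt = wild_type_cfu(total, mutant)
print(wt)
```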

      • Line 266: can you indicate approximately how many CFUs you have in this OD?

      Thank you, we have now also indicated an approximate CFU for this mentioned OD600 (OD600 of 1 = 7 × 10^8 cells).

      • Line 309: besides indicating Figure 1D please indicate here Dataset S1 (the table where one can see the list of essential and non-essential genes). This table is shown afterwards but I think it will be more appropriate to show it at the beginning of the section.

      Thank you, we have now taken on this recommendation and have now edited the manuscript to also indicate Dataset S1 earlier.

      • Table 3. regarding the comparison of essential genes between different strains. I think it will be more clear if a Venn diagram was drawn including only genes that have homologs in all the studied strains (i.e. defining the core genome essentially).

      We would like to thank the reviewer for suggesting a Venn diagram and have now removed Table 3, which has been replaced with a new Figure 3.

      • Line 461: replicates were combined for downstream analyses? But are replicates combined for doing the statistical analysis? If so, how is the statistical analysis performed? How is it taken into account the potential variability in the abundance in each library? An r of 0.9 is high but not perfect.

      Technical replicates of the sequenced input library were combined following identification of a correlation coefficient of r² = 0.955, for the calculation of insertion index scores used in gene essentiality analysis. While r² = 0.955 is not perfect, discrepancies here can be attributed to higher variance in insertion index scores when sampling small genes, as these are represented by fewer insertions and the stochastic absence of a single insertion event has a greater effect on the overall IIS. Replicate data were not pooled for statistical analysis of mutant fitness (growth in urine and serum).
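
      The point about small genes can be illustrated with a minimal sketch. It assumes the usual TraDIS convention that an insertion index score is the number of unique insertions in a gene divided by the gene length; the gene lengths and insertion counts are invented for illustration.

```python
import math


def insertion_index(insertions: int, gene_length_bp: int) -> float:
    """Insertions per base pair of gene (assumed definition of the insertion index score)."""
    return insertions / gene_length_bp


def pooled_insertion_index(rep1: int, rep2: int, gene_length_bp: int) -> float:
    """Insertion index after pooling insertion counts from two technical replicates."""
    return (rep1 + rep2) / gene_length_bp


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical genes: (length in bp, insertions in replicate 1, insertions in replicate 2).
genes = [(150, 3, 4), (900, 20, 19), (1500, 30, 31), (3000, 75, 72)]

iis_rep1 = [insertion_index(a, length) for length, a, _ in genes]
iis_rep2 = [insertion_index(b, length) for length, _, b in genes]

# One extra insertion shifts the score of the 150 bp gene far more than that of the 1500 bp gene.
print(iis_rep2[0] - iis_rep1[0], iis_rep2[2] - iis_rep1[2])
print(pearson_r(iis_rep1, iis_rep2) ** 2)
```

      In this toy data a single insertion changes the score of the shortest gene by about a third, while the same difference is a few percent for the longer gene, which is why short genes contribute disproportionately to replicate disagreement.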

      • Line 487: is there any control strain containing the kanamycin gene in a part of the genome that does not affect the growth of K. pneumoniae? This could be used to show that having the kanamycin gene does not provide any defect in urine growth.

      We thank the reviewer for this suggestion but argue that introduction of the kanamycin gene into each unique locus may result in various levels of gene fitness that would be incomparable to a single control strain. Instead, we culture the ECL8 mutant library in urine and ensure that its kinetics are comparable to the wildtype. As the library contains thousands of kanamycin cassettes uniquely positioned across most of the genome with no observable growth defect, we do not anticipate the presence or expression of the cassette to have an appreciable impact.

      • Line 569: in the methodology it was indicated that control cells were incubated in PBS for the same amount of time. I think this is an important control that is not cited in the results section. Please can you indicate?

      We apologise for this misunderstanding due to how the methodology was written. The experiment did not sequence the PBS-incubated samples, as these were used solely as a check for viability of the K. pneumoniae ECL8 stock solution.

      • Line 597: "Mutants in igaA are enriched in our experiments". Can you show this data?

      We have now included this as a supplementary figure (Figure S11A).

      • Line 615: when doing this calculation, I guess the authors take into account only genes that are also present in the other strains.

      That is correct, we were aiming to highlight the high conservation of “essential genes” among all the selected strains.

      • Line 627: why surprisingly? Because it is too low? If so, please indicate.

      Thank you, we have now edited this sentence to indicate that.

      • Figure 4: please, for clarity, can you indicate the meaning of the colors in the figure itself besides indicating it in the figure legend?

      Thank you, we have now included a color legend in these figure panels for clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy. Leveraging new strains with sagA deletion/complementation constructs, the investigators reveal that sagA is non-essential, with sagA deletion leading to a marked growth defect due to impaired cell division, and sagA being necessary for the immunogenic and anti-tumor effects of E. faecium. In aggregate, the study utilizes compelling methods to provide both fundamental new insights into E. faecium biology and host interactions and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      We thank the Reviewers for their positive feedback on our manuscript. We also appreciate their helpful comments/critiques and have revised the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Klupt, Fam, Zhang, Hang, and colleagues present a novel study examining the function of sagA in E. faecium, including impacts on growth, peptidoglycan cleavage, cell separation, antibiotic sensitivity, NOD2 activation, and modulation of cancer immunotherapy. This manuscript represents a substantial advance over their prior work, where they found that sagA-expressing strains (including naturally-expressing strains and versions of non-expressing strains forced to overexpress sagA) were superior in activating NOD2 and improving cancer immunotherapy. Prior to the current study, an examination of sagA mutant E. faecium was not possible and sagA was thought to be an essential gene.

      The study is overall very carefully performed with appropriate controls and experimental checks, including confirmation of similar densities of ΔsagA throughout. Results are overall interpreted cautiously and appropriately.

      I have only two comments that I think addressing would strengthen what is already an excellent manuscript.

      In the experiments depicted in Figure 3, the authors should clarify the quantification of peptidoglycans from cellular material vs supernatants. It should also be clarified whether sagA needs to be expressed endogenously within E. faecium, and whether ambient endopeptidases (perhaps expressed by other nearby bacteria or recombinant enzymes added) can enzymatically work on ΔsagA cell wall products to produce NOD2 ligands.

      We mentioned in the main text that peptidoglycan was isolated from bacterial sacculi and digested with mutanolysin for LC-MS analysis. We have now also included “mutanolysin-digested” sacculi in the Figure 3 legend.

      We have added the following text “We next evaluated live bacterial cultures with mammalian cells to determine their ability to activate the peptidoglycan pattern recognition receptor NOD2” and “our analysis of these bacterial strains” to indicate live cultures were evaluated for NOD2 activation.

      We have also added the following text “Our results also demonstrated that while many enzymes are required for the biosynthesis and remodeling of peptidoglycan in E. faecium, SagA is essential for generating NOD2 activating muropeptides ex vivo.”

      In the murine experiments depicted in Figure 4, because the bacterial intervention is being performed continuously in the drinking water, the investigators have not distinguished between colonization vs continuous oral dosing of the mice with peptidoglycans. While I do not think additional experimentation is required to distinguish the individual contributions of these 2 components in their therapeutic intervention, I do think the interpretation of their results should include this perspective.

      We have added the following text “We note that by continuous oral administration in the drinking water, live E. faecium and soluble muropeptides that are released into the media during bacterial growth may both contribute to NOD2 activation in vivo.” and revised the following text “Nonetheless, these results demonstrate SagA is not essential for E. faecium colonization, but required for promoting the ICI antitumor activity through NOD2 in vivo.”

      Reviewer #2 (Public Review):

      Summary:

      The gut microbiome contributes to variation in the efficacy of immune checkpoint blockade in cancer therapy; however, the mechanisms responsible remain unclear. Klupt et al. build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy, leveraging novel strains with sagA deleted and complemented. They find that sagA is non-essential, but sagA deletion leads to a marked growth defect due to impaired cell division. Furthermore, sagA is necessary for the immunogenic and anti-tumor effects of E. faecium. Together, this study utilizes compelling methods to provide fundamental new insights into E. faecium biology and host interactions, and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      Strengths:

      Klupt et al. provide a well-written manuscript with clear and compelling main and supplemental figures. The methods used are state-of-the-art, including various imaging modalities, bacterial genetics, mass spectrometry, sequencing, flow cytometry, and mouse models of immunotherapy response. Overall, the data supports the conclusions, which are a valuable addition to the literature.

      Weaknesses:

      Only minor revision recommendations were noted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General comments - the number/type of replicates and statistics are missing from some of the figure panels. Please be sure to add these throughout - all main figure panels should have replicates. I've also noted some specific cases below.

      Abstract - sagA is non-essential, need to edit text at "essential functions".

      This change has been made.

      "small number of mutations" - specify how many in the text.

      We revised the text. “Small number” is changed to “11”.

      "under control of its native promoter" - what was the plasmid copy number? It looks clearly overexpressed in Figure 1d despite using a native promoter, although it's a bit hard to know for sure without a loading control.

      pAM401 has a p15A origin of replication; therefore, the plasmid copy number is ~20-30 copies (Lutz R. et al Nucleic Acids Res. 1997). Total protein was visualized by Stain-Free™ imaging technology (BioRad) and serves as the protein loading control; the panel has been relabeled accordingly.

      "decrease levels of small muropeptides" - the asterisks are missing from Figure 3a.

      Green asterisks for peaks 2, 3, 7 and purple asterisks for peaks 13, 14 were added.

      The use of "Com 15 WT" in the figures is confusing - just replace it with "wt" and specify the strain in the text. Presumably, all of the strains are on the Com 15 background.

      “Com15 WT” was replaced with “WT” in figures and main text.

      Change 1d to 1b so that the panels are in order (reading left to right and then top to bottom).

      Figure 1 legend is missing a number of replicates and statistics for 1a.

      The number of replicates was added.

      Figure 1b - it's unclear to me what to look at here, could add arrows indicating the feature of interest and expand the relevant text.

      Arrows pointing to cell clusters were added.

      Figure 1d - what is "stain free"? It would be preferable to show a loading control using an antibody against a constitutive protein to allow for normalization of the loading control.

      Stain-Free Imaging technology (BioRad) utilizes a trihalo compound contained in the gel to make proteins fluorescent directly in the gel with a short photoactivation, allowing the immediate visualization of proteins at any point during electrophoresis and western blotting. Stain-Free total protein measurement serves as a reliable loading control comparable to Coomassie Blue Staining. This has been relabeled as “Total protein” in the Figure and Stain-Free imaging technology is noted in the legend.

      ED Figure 1 - representative of how many biological replicates?

      Legends are updated.

      ED Figure 2a - I would replace this with a table, it's not necessary to show the strip images. Also, please specify the number of replicates per group.

      Additional Extended Data Table 2 was added.

      ED Figure 2b - This data was not that convincing since the sagA KO has a marked growth defect and the time points are cut off too soon to know if growth would occur later. The MIC definition is potentially misleading. Should specify a % growth cutoff (i.e. <10% of vehicle control) and the metric used (carrying capacity or AUC). Then assign MIC to the tested concentration, not a range. The empty vector also seems to impact MIC, which is concerning and complicates the interpretation. Specify the number of replicates and add statistics. Given these various concerns, I might suggest removing this figure, as it doesn't really add much to the story.

      We appreciate this comment from the Reviewer, but believe this data is helpful for the paper and have included longer time points for the growth data. The definition of MIC for ED Fig. 2b has been included in the legend.
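
      A cutoff-based MIC call of the kind the reviewer describes can be sketched as below. This is only an illustration of the reviewer's suggestion (growth below 10% of the vehicle control, assigned to a single tested concentration). The function name, the default cutoff, and the example values are assumptions, and the definition actually adopted in the ED Fig. 2b legend may differ.

```python
from typing import Mapping, Optional


def mic_from_growth(
    growth_by_conc: Mapping[float, float],
    vehicle_growth: float,
    cutoff_fraction: float = 0.10,
) -> Optional[float]:
    """Return the lowest tested concentration whose growth metric (for example
    AUC or carrying capacity) falls below cutoff_fraction of the vehicle control.

    Returns None when no tested concentration meets the cutoff.
    """
    if vehicle_growth <= 0:
        raise ValueError("vehicle control growth must be positive")
    inhibited = [
        conc
        for conc, growth in growth_by_conc.items()
        if growth / vehicle_growth < cutoff_fraction
    ]
    return min(inhibited) if inhibited else None


# Hypothetical AUC values at four tested concentrations (µg/mL).
example_growth = {1.0: 9.5, 2.0: 8.0, 4.0: 0.6, 8.0: 0.2}
example_mic = mic_from_growth(example_growth, vehicle_growth=10.0)
```

      With these made-up values the call is assigned to the 4 µg/mL concentration, a single tested value rather than a range.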

      Figure 2 - specify the type of replicate. Number of cells? Number of slices? Number of independent cultures?

      For Cryo-ET experiments single bacterial cultures were prepared. Number of cells and slices for analysis are indicated in the legend. Legends are updated.

      Figure 4e - missing the water group, was it measured?

      Water (αPD-L1) group was not included in immune profiling of tumor infiltrating lymphocytes (TILs) experiment, as we have previously demonstrated limited impact on ICI anti-tumor activity and T cell activation in this setting (Griffin M et al Science 2021).

      Figure 4d - is this media specific to your strains? If not, qPCR may be a better method using strain-specific primers.

      Yes, HiCrome™ Enterococcus faecium agar plates (HIMEDIA 1580) are selective for Enterococcus species. Moreover, the agar is chromogenic, allowing identification of E. faecium as yellow colonies among other Enterococcus species.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      RC-2023-02105R: Brunetta et al.,

      IF1 is a cold-regulated switch of ATP synthase to support thermogenesis in brown fat

      We are happy to submit our revised manuscript after considering the suggestions made by reviewers. The comments were overall positive, and the changes requested were mostly editorial. We have, nevertheless, added new experiments as quality controls. These experiments did not affect the main conclusions of our work. In addition, we also included two in vivo experimental models of gain and loss-of-function, to further address the physiological relevance of IF1 in BAT thermogenesis. We believe with these additional experiments, quality controls as well as in vivo models, our study has improved considerably. We hope our efforts will be appreciated by the reviewers and we make ourselves available to answer any further questions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In the present manuscript, the authors present data in support of their primary discovery that "IF1 controls UCP1-dependent mitochondrial bioenergetics in brown adipocytes". The opening figure convincingly demonstrates that IF1 expression is cold-exposure dependent. They then go on to show that loss of IF1 has functional consequences that would be predicted based on IF1's known role as a regulator of ATP hydrolysis by CV. They go on to make a few additional claims, succinctly detailed in the Discussion section. Specific claims include the following: 1) IF1 is downregulated in cold-adapted BAT, allowing greater hydrolytic activity of ATP synthase by operating in the reverse mode; 2) when IF1 is upregulated in brown adipocytes in vitro, mitochondria are unable to sustain the MMP upon adrenergic stimulation; 3) IF1 ablation in brown adipocytes phenocopies the metabolic adaptation of BAT to cold; and 4) IF1 overexpression blunts mitochondrial respiration without any apparent compensatory response in glycolytic activity. The claims described above are well supported by the evidence. The manuscript is very well written, figures are clear and succinct. Overall, the quality of the work is very high. Given that IF1 is implicated across many fields of study, the novel discovery of IF1 as a regulator of brown adipose mitochondrial bioenergetics will be of significance across several fields. That said, a few areas of concern were apparent. Concerns are detailed in the "Major" and "Minor" comments section below. Additional experiments do not appear to be required, assuming the authors adequately acknowledge the limitations of the study and either remove or qualify speculative claims.

      Major Comments:

      1. The authors convincingly demonstrate that IF1 expression is specifically down-regulated in BAT upon cold-exposure. These data strongly implicate a role for IF1 in BAT bioenergetics, a major claim of the authors and a novel finding herein. Additional major strengths of the paper, which provide excellent scientific rigor include the use of both loss of function and gain of function approaches for IF1. In addition, the mutant IF1 experiments are excellent, as they convincingly show that the effects of IF1 are dependent on its ability to bind CV.

      RESPONSE: We thank the reviewer for the positive feedback on our work.

      Regarding Figure 1 - Did the content of ATP synthase change? In figure 1A-B, the authors show that ATPase activity of CV is higher in cold-adapted mice. While this result could be due to a loss of IF1, it could also be due to a higher expression of CV. To control for this, the authors should consider blotting for CV, which would allow for ATPase activity to be normalized to expression.

      RESPONSE: Thank you for this suggestion. We have now determined complex V subunit A in our experimental protocol. We found that cold exposure does not impact complex V protein levels. Given the importance of this control, we have now included it in Figure 1 (Please, see the revised version) alongside the IF1/complex V ratio. In addition, we have now performed WBs in the BAT from mice exposed for 3 and 7 days to thermoneutrality (~28°C). We found that IF1 is not reduced following whitening of BAT by this approach whilst UCP1 and other mitochondrial proteins are reduced. This set of data is now included in Figure 1I,K,L.

      Regarding MMP generated specifically by ATP hydrolysis at CV, the reversal potential for ANT occurs at a more negative MMP than that of CV (PMID: 21486564). Because reverse transport of ATP (cytosol to matrix) via ANT will also generate a MMP, it is speculative to state that the MMP in the assay is driven by ATP hydrolysis at CV. It is possible and maybe even likely that the majority of the MMP is driven by ANT flux, which in turn limits the amount of ATP hydrolyzed by CV. Admittedly, it is very challenging to differentiate MMP from ANT vs that from CV, thus the authors simply need to acknowledge that the specific contribution of ATP hydrolysis to MMP remains to be fully determined. That said, the fact that ATP-dependent MMP tracks with IF1 expression does certainly implicate a role for ATP hydrolysis in the process. The authors should consider including a discussion of the ambiguity of the assay to avoid confusion. A role for ANT likely should be incorporated in the Fig. 1J cartoon.

      RESPONSE: Thank you for bringing the ANT contribution to MMP to our attention. The effects of ATP in the real-time MMP measurements were totally abolished by the addition of oligomycin in BAT-derived isolated mitochondria, thus suggesting dependency on complex V in this process. However, the assessment of MMP in intact cells is much more challenging given cytosolic vs. mitochondrial contribution to the ATP pool, and ATP synthase vs. ANT reversal capacity depending on MMP. Nevertheless, we have addressed these points in the discussion section and added ANT to our schematic cartoon in Figure 1m.

      Regarding the lack of effect of IF1 silencing on MMP, is it possible that IF1 total protein levels are simply lower in cultured brown fat cells relative to tissue? The authors could consider testing this by blotting for IF1 and CV in BAT and brown fat cells. The ratio of IF1/ATP5A1 in tissue versus cells may provide some amount of mechanistic evidence as to their findings.

      RESPONSE: We have now blotted for complex V and IF1 in both differentiated primary brown adipocytes and BAT homogenates derived from mice kept at room temperature (~22°C). We found the levels of complex V in primary brown adipocytes are higher than BAT homogenates. Therefore, IF1/complex V ratio is different between these two systems. This has indeed the potential to influence our gain and loss-of-function experiments. We have added these results alongside their interpretation in the revised manuscript.

      The calculation of ATP synthesis from respiration sensitive to oligomycin has many conceptual flaws. Unlike glycolysis, where ATP is produced via substrate level phosphorylation, during OXPHOS, the stoichiometry of ATP produced per 2e transfer is not known in intact brown adipose cells. This is a major limitation of this "calculated ATP synthesis" approach that is beginning to become common. Such claims are speculative and thus likely do more harm than good. In addition to ANT and CV, there are many proton-consuming reactions driven by the proton motive force (e.g., metabolite transport, Ca2+ cycling, NADPH synthesis). Although it remains unclear how much proton conductance is diverted to non-ATP synthesis dependent processes, it seems highly likely that these processes contribute to respiratory demand inside living cells. Moreover, just as occurs with UCP1 in response to adrenergic stimuli, proton conductance across the various proton-dependent processes likely changes depending on the cellular context, which is another reason why using a fixed stoichiometry to calculate how much ATP is produced from oxygen consumption is so highly flawed. Maximal P/O values that are often used for NAD/FAD linked flux are generated using experimental conditions that favor near complete flux through the ATP synthesis system (supraphysiological substrate and ADP levels). The true P/O value inside living cells is likely to be lower.

      RESPONSE: We agree with the reviewer regarding the limitations on calculating ATP production in intact cells based on respiration and proton flux. However, this was only one of the experiments on which we based our conclusions, as these were also supported by, e.g., ATP/ADP ratio measurements and oxygen consumption using different substrates. Therefore, we do not rely exclusively on the ATP production estimate; rather, we use this experiment to support complementary methodologies. Nevertheless, we have now better detailed our experimental protocol as well as acknowledged the limitations of the method, so the reader is aware of our procedure and its limitations. We hope the reviewer understands our motivation to perform these experiments and the contribution to our study.
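
      How strongly such an estimate depends on the assumed stoichiometry can be shown with a minimal sketch. It assumes the commonly used conversion in which oligomycin-sensitive OCR is multiplied by 2 (oxygen atoms per O2) and by an assumed P/O ratio. The function name, the OCR values, and the P/O values are illustrative assumptions, not the authors' protocol or data.

```python
def atp_rate_from_ocr(basal_ocr: float, oligomycin_ocr: float, p_o_ratio: float) -> float:
    """Estimate the OXPHOS ATP production rate (pmol ATP/min) from OCR (pmol O2/min).

    The oligomycin-sensitive OCR is treated as ATP-linked respiration and
    converted with a fixed, assumed P/O ratio (ATP made per oxygen atom reduced).
    """
    if p_o_ratio <= 0:
        raise ValueError("P/O ratio must be positive")
    atp_linked_ocr = basal_ocr - oligomycin_ocr
    if atp_linked_ocr < 0:
        raise ValueError("oligomycin OCR exceeds basal OCR")
    return atp_linked_ocr * 2 * p_o_ratio  # 2 oxygen atoms per O2 molecule


# Same hypothetical OCR trace, two different assumed P/O ratios.
estimate_high_po = atp_rate_from_ocr(100.0, 40.0, p_o_ratio=2.5)
estimate_low_po = atp_rate_from_ocr(100.0, 40.0, p_o_ratio=1.5)
```

      With identical respiration data, the two assumed ratios give estimates of 300 and 180 pmol ATP/min, which is why the estimate is best read alongside independent measurements such as the ATP/ADP ratio.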

      Why are the results in Figure 3K expressed as a % of basal? Could the authors please normalize the OCR data to protein and/or provide a justification for why different normalization strategies were used between 3K and 3M?

      RESPONSE: We apologize for the lack of consistency. We have now updated Figure 3 to show all the data in absolute values divided by protein content. This change does not affect the overall interpretation of the findings.
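
      The difference between the two normalization strategies can be sketched as follows. The function names and the example numbers are hypothetical and serve only to show why values expressed as a percentage of basal can hide differences that protein-normalized absolute values reveal.

```python
def percent_of_basal(ocr: float, basal_ocr: float) -> float:
    """Express an OCR value relative to the same well's basal rate."""
    if basal_ocr <= 0:
        raise ValueError("basal OCR must be positive")
    return 100.0 * ocr / basal_ocr


def per_protein(ocr: float, protein_ug: float) -> float:
    """Express an OCR value in absolute terms per microgram of protein."""
    if protein_ug <= 0:
        raise ValueError("protein content must be positive")
    return ocr / protein_ug


# Two hypothetical wells with equal protein (10 µg) but different basal rates.
# Both double their OCR on stimulation.
well_a = {"basal": 100.0, "stimulated": 200.0, "protein_ug": 10.0}
well_b = {"basal": 200.0, "stimulated": 400.0, "protein_ug": 10.0}
```

      Both wells read 200% of basal after stimulation, yet the protein-normalized stimulated rates are 20 and 40 pmol/min/µg, so only the absolute values show that the second well respires twice as fast.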

      The authors claim that IF1 overexpression lowers ATP production via OXPHOS. However, given the major limitations of this assay (as discussed above), these claims should be viewed as speculation. This needs to be addressed by the authors as a major limitation. The fact that the ATP/ADP levels did not change does not support a reduction in ATP production, as claimed in the title of Figure 4.

      RESPONSE: The reduction in ATP levels and mitochondrial respiration (independent of the substrate offered) suggests a reduction in ATP production rather than an increase in ATP consumption. Moreover, the maintenance of ATP/ADP ratio suggests the existence of a compensatory mechanism to avoid cellular energy crises, which we interpreted as reduced metabolic activity of the cells. Nevertheless, we have now reworded our statements to address the limitations of the methods and our interpretation of the data.

      In the discussion, the authors state "However, considering that IF1 inhibits F1-ATP synthase in a 1:1 stoichiometric ratio, the relatively higher expression of IF1 in BAT at room temperature could represent an additional inhibitory factor for ATP synthesis in this tissue." This does not appear to be correct. Although IF1 has been suggested to partially lower maximal rates of ATP synthesis, most of this evidence comes from over-expression experiments. According to the current understanding of IF1-CV interaction, the protein is expelled from the complex during rotation in favor of ATP synthesis (PMID: 37002198). It is far more likely that ATP synthesis is low in BAT mitochondria due to the low CV expression. Relative to heart and when normalized to mitochondrial content, CV expression in BAT mitochondria is about 10% that of heart (PMID: 33077793).

      RESPONSE: We agree with the reviewer and removed this sentence.

      The last sentence of the manuscript states, "Given the importance of IF1 to control brown adipocyte energy metabolism, lowering IF1 levels therapeutically might enhance approaches to enhance NST for improving cardiometabolic health in humans." This sentence seems at odds with the evidence that IF1 levels go up, not down, in human BAT upon cold exposure.

      RESPONSE: In light of our new experiments, we have now updated our conclusions.

      Minor Comments:

      The term "anaerobic glycolysis" is used throughout. All experiments were performed under normoxic conditions, thus the correct term is "aerobic glycolysis".

      RESPONSE: Thank you for this comment and we have replaced this term as suggested.

      Only male mice were used in the study, could the authors please provide a justification for this?

      RESPONSE: Given we devoted most of our efforts to the manipulation of IF1 in vitro, we have used the mouse model as a proof-of-principle on the impact of IF1 in adrenergic-induced thermogenesis. We have now included IF1 KO male and female mice to address the role of IF1 in adrenergic-induced thermogenesis. However, due to the limitation of material, we could only perform AAV in vivo gain-of-function in male mice, therefore, our results cannot be immediately transferred to both sexes, unfortunately.

      Reviewer #1 (Significance (Required)):

      Overall, the quality of the work is very high. Given that IF1 is implicated across many fields of study, the novel discovery of IF1 as a regulator of brown adipose mitochondrial bioenergetics will be of significance across several fields.

      My expertise is in mitochondrial thermodynamics; thus, I do not feel there are any parts of the paper that I do not have sufficient expertise to evaluate.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      The manuscript by Brunetta and colleagues conveys the message that the ATPase inhibitory factor 1 (IF1) protein, a physiological inhibitor of mitochondrial ATP synthase, is expressed in BAT of C57BL/6J mice. Moreover, upon cold-adaption of mice they report that the content of IF1 in BAT is downregulated to sustain the mitochondrial membrane potential (MMP) as a result of reverse functioning of the enzyme. In experiments of loss and gain of function of IF1 in cultured brown adipocytes and WT cells they further stress that IF1 silencing promotes metabolic reprogramming to an enhanced glycolysis and lipid oxidation, whereas IF1 overexpression blunts ATP production rendering a quiescent cellular state of the adipocytes.

      RESPONSE: We appreciate the time the reviewer invested in our work. Please, see our responses below in a point-by-point manner.

      Reviewer #2 (Significance (Required)):

      Claims and conclusions:

      I have been surprised by the claim that IF1 protein is expressed in BAT under basal conditions and that its expression is downregulated in the cold-adapted tissue. In a previously published work by Forner et al., (2009) Cell Metab 10, 324-335 (reference 43), using a quantitative proteomic approach, it is reported that the mitochondrial proteome of mouse BAT under basal conditions contains a low content of IF1 (at level comparable to the background of the analysis). Remarkably, in the same study they show that there is roughly a 2-fold increase in the content of IF1 protein in mitochondria of BAT at 4d and 24d of cold-adaptation of mice. In other words, just the opposite of what is being reported in the Brunetta study.

      RESPONSE: We are aware of the inconsistencies between our findings and Forner et al. (2009). We would like to point out that we have determined IF1 levels in BAT in two separate cohorts with the same findings, and in a third cohort, we observed IF1 mRNA levels to be downregulated in a much shorter timeframe. Our functional analysis is in line with this pattern of regulation. A closer look at the supplementary table provided by Forner et al. (2009) shows that the increase in IF1 content following cold exposure is not supported, and since we do not have further insight into the methods and analysis employed by the Forner et al. group, we believe a direct comparison should be avoided at the moment. Regarding the baseline levels of IF1 in BAT, the relatively high abundance of IF1 in BAT was also found by another independent group (https://doi.org/10.1101/2020.09.24.311076).

      Importantly, the last paragraph of the discussion needs to be amended when mentioning the work of Forner et al. (ref.43). The mentioned reference studied changes in the mouse mitochondrial proteome not in human mitochondria, as it is stated in the alluded paragraph.

      RESPONSE: We apologize for this oversight; we have now reworded our statement.

      More puzzling are the western blots in Figures 1E, 1H, Supp. Fig. 1C, D where IF1 (ATP5IF1) is identified by a 17kDa band. However, in other Figures (Fig. 2, Fig. 3, Fig. 4, Supp Fig. 2) IF1 is identified by its well-known 12kDa band. What is the reason for this change in labeling of the IF1 band? The reactivity of the anti-IF1 antibody used? It has been previously documented that liver of C57BL/6J and FVB mouse strains do not express IF1 to a significant level when compared to heart IF1 levels (Esparza-Molto (2019) FASEB J. 33, 1836-1851). However, in Fig. 1E they show opposite findings, much higher levels of IF1 in liver than in heart as revealed by the 17kDa band. Moreover, in Fig. 1H they show the vanishing of the 17 kDa band under cold adaptation, which is not the migration of IF1 in gels as shown in their own figures (see Fig. 2, Fig. 3, Fig. 4, Supp Fig. 2). I am certainly reluctant to accept that the 17kDa band shown in Figures 1E, 1H, Supp. Fig. 1C, D is indeed IF1. Most likely it represents a non-specific protein recognized by the antibody in the tissue extracts analyzed. Cellular overexpression experiments of IF1 in WT1 cells (Fig. 2E) and primary brown adipocytes (Fig. 4B) also support this argument. Overall, I do not support publication of this study for the reasons stated above.

      RESPONSE: We understand the concerns raised by the reviewer and apologize for the lack of details in our experimental procedures. While we used the same antibody throughout the study (Cell Sig. cat. Num. 8528, 1:500), we used two different types of gels. The difference in the apparent molecular weight of IF1 is likely due to the migration of the protein in the agarose gel. By using custom-made gels, we observe the protein at ~17kDa (Fig. 1 and 5), whereas by using commercial gels (Fig. 2, 3, and 4), we observe the protein closer to the predicted molecular weight (i.e. ~12kDa). Of note, gain and loss-of-function experiments, both in vivo and in vitro, confirm this statement and the specificity of the antibody (Fig. 2, 3, 4, 5, Fig. EV2). In addition, when we ran a custom-made gel with primary BAT cells, we observed again the ~17kDa band (see Figure for the reviewer below). These experiments alongside the absence of other bands in the gels (see uncropped membranes in Supplementary Figure 1) make us conclude that the band we observe is indeed IF1. Nevertheless, we have now updated our methods section, so the reader is aware of our approaches. We hope the reviewer is satisfied with our additional experiments and edits throughout the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, Brunetta et al. describe the role of IF1 in brown adipose tissue activation using in vivo and in vitro experimental models. They observed that cold adaptation promotes a reduction in IF1 expression and an increase in the reverse activity of mitochondrial ATPase or Complex V. Based on these results, the authors explore the contribution of IF1 in this metabolic pathway by modeling the thermogenic process in differentiated primary brown adipocytes. They silenced and overexpressed IF1 in culture and studied their adrenergic stimulation under norepinephrine.

      Major comments:

      The experiments are well explained and the manuscript flows very well. There are several comments that should be addressed.

      RESPONSE: We thank the reviewer for the kind words regarding our work.

      1. The authors measure ATP hydrolysis in isolated mitochondria from BAT in Figure 1. They observed that IF1 is decreased upon cold exposure and that ATP hydrolysis is increased. They assess protein levels of different OXPHOS proteins, including IF1 but not other proteins of Complex V (ATP5A) as they do in Figures 3 and 4. It is important to see that cold exposure only affects IF1 levels but not other proteins from Complex V. Does IF1/Complex V ratio change?

      RESPONSE: We thank the reviewer for this suggestion, which was also raised by Reviewer #1. We have now measured complex V subunit A in our experimental protocol. We found that cold exposure does not impact complex V protein levels. Given the importance of this information, we have now included it in Figure 1 (please see the revised version) alongside the IF1/complex V ratio. In addition, we have now performed WBs in the BAT exposed for 3 and 7 days to thermoneutrality (~28°C), where we found that IF1 is not reduced following whitening of BAT by this approach whilst UCP1 and other mitochondrial proteins are reduced.

      This set of data is now included in Figure 1I,K,L.

      In Figure 2J, the drop in MMP is lower upon adrenergic stimulation than in Figure 2E. The same observation applies to other results when the reduction in MMP after NE addition is minimal. Why do the authors remove TMRM for the measurements of membrane potential? TMRM imaging is normally done in the presence of the dye in non-quenching mode. Treatments should be done prior to the addition of the dye, and then TMRM should be added and left during the imaging analysis, with measurements taken in non-quenching mode. This might explain some of the above-mentioned points regarding the MMP data. Alternatively, if the dye is removed before the measurements, they should let the cells adapt so that the dye equilibrates between mitochondria and cytosol. A more elegant method to measure membrane potential could be live-cell imaging. In addition, authors propose that mitochondrial membrane potential upon NE stimulation is maintained by reversal of ATP synthase. If this is the case, one would expect that addition of oligomycin in NE treated adipocytes would cause depolarization. However, in FigS2A this is not the case. Authors should comment on this in addition to considering a more elegant approach to measure MMP.

      RESPONSE: We apologize for the lack of details in the methods. All treatments (i.e., transfection and norepinephrine stimulation) were performed before the addition of TMRM. Indeed, this approach does not have the resolution of safranin in isolated mitochondria (Fig. 1D), which limits our interpretation regarding the dynamic role of IF1 on MMP in brown adipocytes. We have taken care to state the limitations of our method throughout the entire paper to avoid overinterpretation of our data. Regarding the removal of the dye before the measurements, our internal controls indicate that this procedure does not change the ability of our method to detect fluctuations in MMP (i.e., oligomycin and FCCP as internal controls). Nevertheless, as suggested by the reviewer, to test the time effect of the probe equilibrium (i.e., mitochondria versus cytosol) in our method, we loaded cells with 20 nM TMRM for 30 min and measured the fluorescence right after the removal of the probe/washing steps for another 10 min. We were not able to detect time-dependent differences in the fluorescence (see below). Therefore, we conclude that the removal of TMRM does not influence the fluorescence of the probe in differentiated brown adipocytes.

      [Figure for the reviewer: TMRM fluorescence over time after removal of the probe. Panels: +NE, -NE]

      In addition, we performed a similar experiment using TMRM in the quenching mode (200 nM); however, after the removal of TMRM, we added FCCP (1 µM) to the cells for 10 min under constant agitation at 37°C. This approach aimed to expel all TMRM that had accumulated within the mitochondria in an MMP-dependent manner, thereby excluding the dynamic Brownian movement that the removal of the dye before the measurement could have caused, as mentioned by the reviewer. By doing this, we found the same effect of IF1 overexpression on the reduction of MMP in the presence of norepinephrine.

      Protocol:

      • Transfection (24h) on day 4 of differentiation + 24h just normal media

      • 30 min norepinephrine 10 µM

      • 200 nM TMRM on top of NE

      • Washing step

      • Add FCCP 1 µM for 10 min, and read (The aim here was to release all TMRM accumulated inside of mitochondria in a MMP-dependent manner)

      In summary, the data suggest that removing the dye from the cells does not influence the fluorescence of TMRM, thereby enabling us to draw conclusions regarding the biological effects of IF1 manipulation on the MMP of brown adipocytes. Regarding the reverse mode of ATP synthase and the absence of effects with oligomycin, given that oligomycin inhibits both directions of rotation of ATP synthase and that even uncoupled brown adipocytes respond to oligomycin (i.e. reduction in O2 consumption), the prediction of lowering MMP in the presence of oligomycin due to inhibition of the reverse mode of ATP synthase is more complicated than anticipated. Nevertheless, we have now addressed this topic in the discussion section. Lastly, we generally observe a reduction in MMP of around 10-25% in differentiated adipocytes upon NE treatment (30 minutes, 10 µM). However, due to the differentiation state of the cells, the MMP response to norepinephrine fluctuated from experiment to experiment. Therefore, we did not compare experiments performed on different days or batches, but only within the same differentiation batch to reduce variability.

      In Figure 2, in the siIF1 model, there is more baseline phosphorylation of AMPK (pAMPK) than in the scramble control. However, this is not the case for p-p38MAPK. Do the authors have any explanation for those differences in baseline activation of the stress kinases when IF1 is silenced? In the same experimental group, addition of NE seems to have more effect in the scrambled than in siIF1, but the plotted data does not reflect these differences. In contrast, the increase in pAMPK upon NE is higher in IF1 overexpressing cells compared to EV (Figure 2H), but again this is not reflected in the western blot quantification (Figure 2I).

      RESPONSE: Although some differences in pAMPK in the treatments were observed as gathered by the representative blots, these changes were not confirmed later in different biological replicates, therefore, the overall effect of IF1 manipulation in pAMPK does not change. Given we used this approach as quality control for our experiments to guarantee norepinephrine treatment works, we removed the pAMPK data from the study and kept p38 as a marker of adrenergic signaling activation (please see revised Fig. 2 in the main file).

      Does NE promote a decrease of IF1 expression in control (siScramble and EV) adipocytes? The authors should test it and see whether it goes in the same direction as the observations derived from the experiments in cold exposed mice. This is a very important point, as it could explain the lack of an additional effect of IF1 silencing on NE-induced depolarization (Figure 2E).

      RESPONSE: We thank the reviewer for this suggestion. In line with the in vivo data, acute NE treatment in differentiated brown adipocytes does not change IF1 mRNA and protein levels. We have now added this information and the corresponding interpretation to the updated manuscript.

      Does NE promote decrease of IF1 expression in the scramble and EV adipocytes? The authors should test it and see whether it goes in the same direction as the observations derived from the experiments in cold exposed mice.

      RESPONSE: As this question is the same as #4, we believe the reviewer may have erroneously pasted this here.

      For MMP data in Fig2, they should include significance between non treated and NE-treated groups. They say: "While UCP1 ablation did not cause any effect on MMP upon adrenergic stimulation...", but NE caused (probably significant) depolarization in siUCP1, which seems even stronger than depolarization in EV. This is opposite to what you would expect. They also didn't confirm UCP1 silencing with western blot.

      RESPONSE: We thank the reviewer for this suggestion. We have now included the expected statistical main effect of NE upon MMP. Although the effects of IF1 overexpression were blunted when Ucp1 was silenced, we indeed still observed the same degree of reduction in MMP in brown adipocytes. This finding has two possible explanations, one is the effectiveness of the silencing protocol, therefore, residual Ucp1 expression may still play a role in this experiment; second, other ATP-consuming processes are able to lower MMP in a UCP1-independent manner. We have added this information to the updated manuscript to make the reader aware of our findings as well as the limitations of the method. Unfortunately, we were not able to detect UCP1 protein levels due to technical issues. Given the effects of IF1 overexpression were blunted when Ucp1 was silenced, we believe this functional outcome is sufficient, alongside mRNA levels, to demonstrate the effectiveness of our silencing protocol.

      It has been established that decreased expression of IF1 promotes increase in the reverse activity of Complex V, ATP hydrolytic activity. Increase in ATP hydrolysis also affects ECAR. The authors should consider this when calculating the contribution of ATP glycolysis versus ATP OXPHOS since the ATP hydrolysis is also playing a role in the ECAR increase. The data should be reinterpreted. ATP hydrolysis should be measured in the situation where IF1 is silenced and overexpressed. These measurements can be done in cells using the seahorse.

      RESPONSE: The only differences we observed in MMP are in the presence of norepinephrine (i.e. UCP-1-dependent proton conductance), which is not present during the estimation of ATP production by Seahorse analysis. Nevertheless, we have now improved the description of our experimental protocol and limitations to estimate ATP production to make it as clear as possible to the reader. Lastly, given the addition of in vivo gain-of-function experiments, we have now determined the ATP hydrolytic activity in this model, which offers a better understanding of the in vivo modulation of IF1 levels affecting ATP synthase activity (reverse mode). We hope the reviewer understands our motivation to focus on the in vivo model of gain-of-function regarding ATP synthase activity.

      The authors use GAPDH as loading control in western blots. They should use another protein since GAPDH is part of the intermediary metabolism and plays a role in glycolysis.

      RESPONSE: We understand the concern of the reviewer regarding the use of GAPDH as a loading control for the studies of metabolism. However, as can be observed by the western blot images, GAPDH levels do not change in our experimental models, therefore, we feel confident that our loading is homogeneous throughout our gels.

      The authors show that reduction of IF1 involves more lipid utilization. They should include more experiments showing the connection of the metabolic adaptation in the absence of IF1 and some lipid imaging.

      RESPONSE: We appreciate this suggestion. We have now performed Oil Red O staining in differentiated adipocytes following ablation of IF1. However, we did not observe any effect on lipid accumulation in primary brown adipocytes following IF1 knockdown. Therefore, the effects of IF1 ablation on lipid mobilization are not due to lipid content or reflected in lipid accumulation. We have now added this new information to the manuscript (please, see the revised form Fig. EV3).

      In the text, "Despite this adjustment of experimental conditions, we did not detect any effect of IF1 ablation on mitochondrial oxygen consumption (Supplementary Fig. 3A,B)", this is true for baseline, NE-driven and ATP-linked respiration, but what about maximal respiration? There is a huge increase in IF1 knockdown... They should explain these results.

      RESPONSE: We performed this experiment to address the question of whether the lipid mobilization induced by norepinephrine would uncouple mitochondria in a UCP1-independent manner. Given the absence of a difference between scrambled and IF1-ablated cells in mitochondrial respiration in the presence of norepinephrine and following the addition of oligomycin, we concluded that there is no effect of lipolysis-induced UCP1-independent uncoupling. However, as observed by the reviewer and consistent with other data within the study, the interaction between lipid metabolism and IF1 knockdown seems to affect maximal electron transport chain activity, which, although interesting, was not the focus of the present study. Nevertheless, we have now acknowledged these findings and a possible explanation for them in the revised manuscript.

      In Figure 3K they present OCR as % of baseline, but in a similar experiment in Figure 4G it is OCR/protein; they should make the Y axis consistent across experiments.

      RESPONSE: We apologize for this oversight. We have now edited all the axes and labels for consistency.

      The graphical abstract is confusing. In BAT there are two populations of mitochondria, the cytosolic and the mitochondria attached to the lipid droplet, peridroplet mitochondria (PDM). Upon adrenergic stimulation, PDM leave the lipid droplet and lipolysis takes place. The authors propose that upon adrenergic stimulation, IF1 is reduced and there is lipid mobilization. The part of the scheme where it says "fully recruited" should be removed or rewritten, since adrenergic stimulation is not compatible with mitochondria recruitment around the lipid droplet.

      RESPONSE: Thank you for this input. Given the addition of new experiments and interpretation, we have now redrawn the graphical abstract and addressed this topic in the discussion section.

      The title should be rewritten to better reflect the research presented in the manuscript.

      RESPONSE: Thank you for this input. Given the addition of new experiments, we have now rewritten the title accordingly.

      Minor comments:

      Some of the Y axis should be corrected. For example, in Figure 2J, L and M should say % of EV untreated, Similarly, in Figure 2E, it should say % of scramble untreated. In Figure 3N, the Y axis is misspelled. All the Y axis referring to percentages should have the same scale for comparison purposes.

      RESPONSE: Thank you for the proofreading. We have now edited the scales and labels to keep consistency.

      The authors should better describe the results corresponding to Figure 2. There is a lot of information and they should improve the description of the connection between the different pieces of data relating to the different signaling pathways that are shown. For westerns in this Figure, they should provide some rationale (one to two sentences in the results section) as to why they are checking the expression of pAMPK and p38-MAPK.

      RESPONSE: We have now edited the description of our results to make them as clear as possible.

      Here are some comments referring to the methods section:

      For Complex V hydrolytic activity, the reaction buffer contains 10mM Na-azide. I guess this is to inhibit respiration, but wouldn't azide also inhibit complex V at this concentration?

      RESPONSE: We thank the reviewer for this question. To test that, we performed complex V activity in buffers containing or not 10 mM sodium azide. As demonstrated below, the presence of sodium azide in the buffer does not influence complex V activity in two different tissues with low and high complex V activity (BAT and heart, respectively).

      Table 1. ATP synthase hydrolytic activity in the presence or absence of Na-azide.

      Condition                  BAT             Heart
      +Na-azide                  100 ± 43.01     100 ± 39.36
      -Na-azide                  82.6 ± 4.33     111.3 ± 43.32
      +Na-azide + oligomycin     15.3 ± 4.32*    13.8 ± 14.01*
      -Na-azide + oligomycin     14.2 ± 3.53*    11.9 ± 2.88*

      Data presented as % of control (i.e. presence of Na-azide and absence of oligomycin) for both tissues independently. N = 2-3/condition. Statistical test: two-way ANOVA. * main effect of oligomycin (p

      In the mitochondrial isolation protocol, they say "mitochondria were centrifuged at 800g for 10min..." Will this speed pellet the mitochondria? I think this is a mistake in writing.

      RESPONSE: We apologize for the lack of clarity. What was centrifuged at 800 g was the whole-tissue homogenate to discard cellular debris, before pelleting mitochondria at 5000 g. We have now corrected this mistake in the methods section.

      For the safranin-O experiment, they don't mention mitochondrial substrate used, probably it's in the reference that they provide, but I think it should be included in the text.

      RESPONSE: We did not use any substrate because our goal was to test the contribution of ATP synthase to mitochondrial membrane potential. For that, we inhibited proton movement within the ETC with antimycin A and through UCP1 with GDP (see Methods). We have now edited the description in our Methods to make sure the reader is aware of our approach.

      Reviewer #3 (Significance (Required)):

      The manuscript is well written, and it flows well when reading. However, there are some additional experiments that need to be performed to reach the conclusions the authors claim.

      RESPONSE: We thank the reviewer for the positive comments regarding our work and hope to have answered the open questions with the edits and new experiments.

      The role of ATP hydrolysis in BAT thermogenesis is novel and interesting as it can shed some light on potential approaches to promote BAT activation.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      This is an interesting investigation into the activity of IF1 in brown adipocytes. The findings are innovative and the conclusion is well-supported by the data. The conclusion is in line with previous reports on IF1 activities in other cell types, particularly in terms of its regulation of FoF1-ATPase. The authors have executed an exceptional job in designing the study, preparing the figures, and writing the manuscript. Overall, this study significantly contributes to the understanding of IF1 activity in brown adipocytes and its role in thermogenesis.

      RESPONSE: We thank the reviewer for the kind words. Please, find below our answers in a point-by-point manner.

      Reviewer #4 (Significance (Required)):

      The study demonstrates involvement of IF1 in regulating thermogenesis in brown adipocytes, which is a unique aspect not covered in existing literature. Advantage of the study is well-designed cellular studies. The major weakness is lack of proof of conclusion in vivo. There are a few minor concerns that should be addressed to further enhance quality of the manuscript.

      RESPONSE: We have now included two in vivo models, whole-body IF1 KO mice and BAT-injected IF1 overexpression, to test the role of IF1 in BAT biology. The whole dataset is included in the main manuscript, where we conclude that BAT IF1 overexpression partially suppresses β3-adrenergic induction of thermogenesis alongside a reduction (overall and UCP1-dependent) in mitochondrial oxygen consumption. Also, similar to our in vitro experiments, IF1 KO mice did not present any difference in adrenergic-stimulated oxygen consumption.

      1. Current discussion does not mention the regulation of IF1 protein by the cAMP/PKA pathway. This point should be included to provide a comprehensive understanding of the regulatory mechanisms of IF1 protein.

      RESPONSE: Thank you for this suggestion. We have now added this topic to the discussion.

      It has been reported that IF1 also influences the structure of mitochondrial cristae. Considering the observed changes with IF1 knockdown, it would be valuable to discuss this activity in relation to the findings of the study.

      RESPONSE: We discussed the implications of IF1 modulation in mitochondrial morphology in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2023-02218R

      Corresponding author(s): Steven McMahon

      1. General Statements [optional]

      We were pleased to receive the encouraging critiques and very much appreciate the Reviewers' specific comments and suggestions. In this revised version of our manuscript, we have made a number of substantive additions and modifications in response to these comments/suggestions. We hope you agree that the study is now improved to the point where it is suitable for publication.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This study describes efforts to characterize differences in the roles of the two related human decapping factors Dcp1a and Dcp1b by assessing mRNA decay and protein associations in knockdown and knockout cell lines. The authors conclude that these proteins are non-redundant based on the observations that loss of Dcp1a versus Dcp1b impacts the decapping complex (interactome) and the transcriptome differentially.

      Major comments:

      • While the experiments appear to be well designed and executed and the data of generally high quality, the conclusions are drawn without sufficient consideration for the fact that these two proteins form a heterotrimeric complex. The authors assume that there are distinct homotrimeric complexes rather than a single complex containing both proteins. Homotrimers may have new/different functions not normally seen when both proteins are expressed. Thus, while it is acceptable to infer that the functions of these two proteins within the decapping complex are distinct, it is not clear that they act separately, or that complexes naturally exist without one or the other. A careful evaluation of the relative ratios of Dcp1a and b overall and in decapping complexes would be informative if the authors want to make stronger statements about the roles of these two factors.

      RESPONSE: Thank you for this valuable comment. We have substantially edited the manuscript to incorporate these points. Examples include a detailed analysis of iBAQ values for the DDX6, DCP1a, and DCP1b interactomes (which now allows us to estimate the ratios of DCP1a and DCP1b in these complexes) and cellular fractionation to interrogate complex integrity (using Superose 6).

      • The concept of buffering is not adequately introduced and the interpretation of observations that RNAs with increased half life do not show increased protein abundance - that Dcp1a/b are involved in transcript buffering is nebulous. In order to support this interpretation, the mRNA abundances (NOT protein abundances) should be assessed, and even then, there is no way to rule out indirect effects.

      RESPONSE: Thank you for this comment. In the revised version of the manuscript, we introduced the concept of transcript buffering at an earlier stage as one of the potential explanations for our findings. We were also able to use a new algorithm (grandR) to estimate half-lives and synthesis rates from our data. These new data add strength to the argument that DCP1a and DCP1b are linked to transcript buffering pathways.

      • It might be interesting to see what happens when both factors are depleted to get an idea of the overall importance of each one.

      RESPONSE: In our work we tried to emphasize the differences between the two paralogs. We believe that a double knockout or knockdown would mask the distinct impacts of the paralogs. In data not included in this study, we have shown that cells lacking both DCP1a and DCP1b are viable. We did check PARP cleavage in the CRISPR-generated cell pools of DCP1a KO, DCP1b KO, and the double KO. The WB measuring PARP cleavage is shown in the supplemental material (Supplementary Material: Replicates).

      • The algorithms etc used for data analysis should be included at the time of publication. Version number and settings used for SMART to define protein domains, and webgestalt should be indicated

      RESPONSE: We apologize for this oversight. Version number and settings used for the webtools (SMART, Webgestalt) are now included. The analysis pipeline for half-lives and synthesis rates estimation as well as all the files and the code needed to generate the figures in the paper are available on zenodo (https://zenodo.org/records/10725429).

      • Statistical analysis is not provided for the IP experiments, the number of replicates performed is not indicated and quantification of KD efficiency are not provided.

      RESPONSE: The number of replicates performed in each experiment is now clearly indicated and quantifications of knockdown efficiency are provided (Supplemental Figure 3A and 3B, Figure 3A, Figure 3B).

      • The possibility that the IP Antibody interferes with protein-protein interactions is not mentioned.

      RESPONSE: Thank you for this comment. The revised manuscript includes a discussion of the antibody epitope location and the potential for impact on protein-protein interactions.

      Minor comments:

      • P4 - "This translational repression of mRNA associated with decapping can be reversed, providing another point at which gene expression can be regulated (21)" - implies that decapping can be reversed or that decapped RNAs are translated. I don't think this is technically true.

      RESPONSE: There have been several studies that document the reversal of decapping. These findings are summarized in the following reviews.

      Schoenberg, D. R., & Maquat, L. E. (2009). Re-capping the message. Trends in biochemical sciences, 34(9), 435-442.

      Trotman, J. B., & Schoenberg, D. R. (2019). A recap of RNA recapping. Wiley Interdisciplinary Reviews: RNA, 10(1), e1504.

      • P11 - how common is it for higher eukaryotes to have 2 DCP genes?

      RESPONSE: Metazoans have 2 DCP1 genes.

      • Fig S1 - says "mammalian tissues" in the text but the data is all human. The statement that "expression analyses revealed that DCP1a and DCP1b have concordant rather than reciprocal expression patterns across different mammalian tissues (Supplemental Figure 1)" is a bit misleading as no evidence for correlation or anti-correlation is provided. Also co-expression is not strong support for the idea that these genes have non-redundant functions. Both genes are just expressed in all tissues - there's no evidence provided that they are concordantly expressed. In bone marrow it may be worth noting that one is high and the other low - i.e. reciprocal.

      RESPONSE: We appreciate this comment. We have corrected the interpretation of the aforementioned dataset. We have also incorporated a more detailed discussion in the text of the paper. As the Reviewer pointed out, there is a subset of tissues where their expression appears to be reciprocal.

      • Fig 1A - it is not clear what the different colors mean. Does Sc DCP1 have 1 larger EVH or 2 distinct ones? Are the low complexity regions in Sc DCP2 the SLiMs?

      RESPONSE: Thank you for this comment. We have corrected this ambiguity to reflect that Sc DCP1 has one EVH1 domain that is interconnected by a flexible hinge. The low-complexity regions typically contain short linear motifs (SLiMs); however, not all low-complexity regions have been verified to contain them. In the figure, only low-complexity regions are shown. The text of the paper refers only to verified SLiMs.

      • P11 - why were HCT116 cells selected?

      RESPONSE: HCT116 cells are an easily transfectable human cell line and have been widely used in biochemical and molecular studies, including studies of mRNA decapping (see references below). Since decapping is impacted by viral proteins, we avoided the use of other commonly used cell models such as HEK293T or HeLa.

      https://pubmed.ncbi.nlm.nih.gov/?term=decapping+hct116&sort=date&size=200

      • Fig 1B - what are the asterisks by the RNA names? Might be worth noting that over-expression of DCP1b reduced IP of DCP1a. There's no quantification and no indication of the number of times this experiment was repeated. Data from replicates and quantification of the knockdown efficiency in each replicate would be nice to see.

      RESPONSE: Thank you for this comment. Asterisks indicate that those bands were from a second gel, as DCP1a and DCP1b run at approximately the same molecular weight. We have now included a note in our figure legend to indicate this. The knockdown efficiency is provided (Figure 3 and Supplemental Figure 3). We also noted the number of replicates for each IP in Figure 1. The replicates are provided as supplementary material (Supplementary Materials: Replicates).

      • Fig 1C/1D - why are there 3 bands in the DCP1a blot? Quantification of the IP bands is necessary to say whether there is an effect or not of over-expression/KO.

      RESPONSE: The additional bands in DCP1a blots are background. When we stained the whole blot for DCP1a, these bands still appeared in cells with complete DCP1a KO (clone A3) (Supplementary Material: Validation of the KO clones). Quantification of the bands in the overexpression experiments is now provided.

      • Fig 3 - is it possible that differences are due to epitope positions for the antibodies used for IP?

      RESPONSE: We do not believe so. The DCP1a antibody binds roughly residues 300-400 of DCP1a, and the DCP1b antibody binds around Val202. The antibodies therefore do not bind the DCP1a or DCP1b low-complexity regions (which are largely responsible for interacting with the decapping complex interactome). Nor do the antibodies bind the EVH1 domains or the trimerization domain, which are needed for their interaction with DCP2 and each other.

      • Fig 5A - the legend doesn't match the colors in the figure. It is not clear how the p

      RESPONSE: Thank you for this comment. We have corrected this issue in the revised version of the paper. High-confidence proteins are those with p

      RESPONSE: Thank you for this comment. We have corrected this issue in the revised version of the paper.

      • There are a few more recent studies on buffering that should be cited and more discussion of this in the introduction is necessary if conclusions are going to be drawn about buffering.

      RESPONSE: We have included a discussion of transcript buffering in the introduction.

      • The heatmaps in figure 2 are hard to interpret.

      RESPONSE: To clarify the heatmaps, we included a more detailed description in the figure legends, have enlarged the heatmaps themselves, and have added more extensive labeling.

      Reviewer #1 (Significance (Required)):

      • Strengths: The experiments appear to be done well and the datasets should be useful for the field.

      • Limitations: The results are overinterpreted - different genes are affected by knocking down one or other of these two similar proteins but this does not really tell us all that much about how the two proteins are functioning in a cell where both are expressed.

      • Audience: This study will appeal most to a specialized audience consisting of those interested in the basic mechanisms of mRNA decay. Others may find the dataset useful.

      • This study might complement and/or be informed by another recent study in bioRxiv - https://doi.org/10.1101/2023.09.04.556219

      • My field of expertise is mRNA decay - I am qualified to evaluate the findings within the context of this field. I do not have much experience of LC-MS-MS and therefore cannot evaluate the methods/analysis of this part of the study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors provide evidence that Dcp1a and Dcp1b - two paralogous proteins of the mRNA decapping complex - may have divergent functions in a cancer cell line. In the first part, the authors show that interaction of Dcp2 with EDC4 is diminished upon depletion of Dcp1a but not affected by depletion of Dcp1b. The results have been controlled by overexpression of Dcp1b as it may be a limiting factor (i.e. expression levels too low to compensate for depletion of Dcp1a reduced interaction with EDC3/4 while depletion of Dcp1b led to the opposite and increased interactions). They then defined the protein interactome of DDX6 in parental and Dcp1a or Dcp1b depleted cells. Here, the authors again show some differential association with EDC4, which is in line with the results shown in the first part. The authors further performed SLAM-seq and identified subsets of mRNAs whose decay rates are common but also different upon depletion of Dcp1a and Dcp1b. Interestingly, it seems that Dcp1a preferentially targets mRNAs for proteins regulating lymphocyte differentiation. To further test whether changes in RNA decay rates are also reflected at the protein level, they finally performed an MS analysis with Dcp1a/b depleted cells. However, no significant overlap with mRNAs showing altered stability could be observed, and the authors suggested that the lack of congruence reflects translational repression.

      Major comments: 1. While functional differences between Dcp1a and Dcp1b are interesting and likely true, there are overinterpretations that need correction or further evidence for support. Sentences like "DCP1a regulates RNA cap binding proteins association with the decapping complex and DCP1b controls translational initiation factors interactions (Figure 2E)" sound misleading. While differential association with proteins has been recognised with MS data, it does not necessarily imply an active process of control/regulation. To make the claim of 'control/regulation', an inducible system or the introduction of mutants would be required.

      RESPONSE: This set of comments was particularly useful in helping us refine the presentation of our findings. We have edited our manuscript to be more specific about the limits of our data.

      1. The MS analysis is not clearly described in the text and it is unclear how the authors selected high-confidence proteins. The reader needs to consult the supplemental tables to find out what controls were used. Furthermore, the authors should show correlation plots of MS data between replicates. For instance, there seems to be limited correlation among some of the replicates (e.g. Dcp1b_ko3 sample, Fig. 2c). Is there any explanation for this variance?

      *RESPONSE: We have now included a clear description of how all high-confidence proteins were selected in the Methods and Results sections. The revised manuscript also includes a more thorough description of the controls used and the number of replicates for individual experiments. The PCA plots have now been included where appropriate. The variance in this sample is likely technical. *

      1. GO analysis for the proteome analysis should consider the proteome and not the genome as the background. The authors should also indicate the corrected P-values (multiple testing) FDRs.

      *RESPONSE: Webgestalt uses a reference set of IDs to recognize the input IDs, and it does not use it for the background analysis in the classical sense. We repeated a subset of our proteome analyses using the 'genome-protein coding' as background and obtained the same result as in our original analysis. All ontology analyses now include raw p-values and/or FDRs when appropriate. *

      1. Fig 2E. The figures display GO enrichments needs better explanation and additional data can be added. The enrichment ratio is not explained (is this normalised?) and p-values and FDRs, number of proteins in respective GO category should be added. *RESPONSE: More thorough explanations of the GO enrichments are now included. The supplemental data contains all p-values (raw and adjusted), as well as the number of proteins in each GO category. The Enrichment ratio is normalized and contains information about the number of proteins that are redundant in multiple groups. GO Ontology analyses are now displayed with p-values and/or FDR values, and in this case the enrichment ratio contains information regarding the number of proteins found in our input set and the number of expected proteins in the GO group. The network analysis shows the FDR values and the number of proteins found in the groups compared. *
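The exchange above turns on what an "enrichment ratio" means and how the choice of background changes it. The following is a minimal sketch of a standard over-representation calculation: observed hits divided by the hits expected by chance, plus a one-sided hypergeometric p-value. It is an illustration under stated assumptions, not a description of WebGestalt's internals; the function names and the example numbers are hypothetical.

```python
from math import comb


def expected_hits(input_size, category_size, reference_size):
    """Hits expected by chance when drawing input_size proteins from the reference set."""
    return input_size * category_size / reference_size


def enrichment_ratio(observed, input_size, category_size, reference_size):
    """Observed hits in a GO category divided by the number expected by chance."""
    return observed / expected_hits(input_size, category_size, reference_size)


def hypergeom_pvalue(observed, input_size, category_size, reference_size):
    """One-sided P(X >= observed) under the hypergeometric null (uncorrected)."""
    total = comb(reference_size, input_size)
    upper = min(input_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(reference_size - category_size, input_size - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 100 input proteins, a 200-member GO category,
# and two candidate backgrounds (whole genome vs. detected proteome).
ratio_genome = enrichment_ratio(10, 100, 200, 20000)
ratio_proteome = enrichment_ratio(10, 100, 200, 5000)
```

With the same observed count, a smaller background (for example, only the proteins detectable by MS) raises the expected count and lowers the ratio, which is why the reviewer asks for the proteome as reference. The raw p-values from such a test would still need multiple-testing correction across all categories.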

      Minor: 5. These studies were performed in a colorectal carcinoma cell line (HCT116). The authors should justify the choice of this specialised cell line. Furthermore, one wonders whether similar conclusions can be drawn with other cell lines or whether findings are specific to this cancer line.

      RESPONSE: The study that is currently in pre-print on bioRxiv (https://doi.org/10.1101/2023.09.04.556219) utilized HEK293T cells and found results similar to ours when examining the various relationships between the core members of the decapping complex.

      1. Fig. 1B. It is unclear what DCP1b* refers to? There are bands of different size that are not mentioned by the authors - are those protein isoforms or what are those referring to? A molecular marker should be added to each Blots. Uncropped Western images and markers should be provided in the Supplement. *RESPONSE: The asterisk indicates that these images came from a second western blot gel (DCP1a and DCP1b have a similar molecular weight and cannot be probed on the same membrane). Uncropped western blot images and markers (as available) are provided in the supplement. *

      2. MS data should be submitted to a public repository, with the accession number indicated in the manuscript.

      RESPONSE: MS data are submitted as supplementary datasets to the paper. They contain the analyzed data as well as the LC-MS/MS output. We are in the process of submitting the raw LC-MS/MS data to a public repository.

      Fig 3. A Venn Diagram displaying the overlap of identified proteins should be added. GO analysis should be done considering the proteome as background (as mentioned above).

      *RESPONSE: A Venn diagram showing the overlap among the proteins identified is now included in the revised version. *

      Reviewer #2 (Significance (Required)):

      Overall, this is a large-scale integrative -omics study that suggests functional differences between Dcp1 paralogues. While it seems clear that both paralogues have some different functions and impact, there are overinterpretations in place and further evidence would need to be provided to substantiate conclusions made in the paper. For instance, while the interactions of Dcp2/Ddx6 with EDC4/3 in the absence of Dcp1a/b may be altered (Fig. 1, 2), the functional implications of these changed associations remain unresolved and are not further discussed. As such, this part remains somewhat disconnected from the following experiments and compromises the flow of the study. The observed differences in decay rates for distinct, functionally related sets of mRNAs are interesting; however, it remains unclear whether those are direct or rather indirect effects. This is further obscured by the absence of any correlation with changes in protein levels, which the authors interpreted as 'transcriptional buffering'. In this regard, it is puzzling how the authors can make a statement about transcriptional buffering. While this may be an interesting aspect and concept for the discussion, there is no primary data showing such a functional impact.

      As such, the study is interesting as it claims functional differences between DCP1a/b paralogues in a cancer cell line. Nevertheless, I am not sure how trustworthy the MS analysis and decay measurements are, as there is no further validation. It would be interesting if the authors could go a bit further and propose a hypothesis for how the selectivity could be achieved, i.e. interaction with RNA-binding proteins that may add some specificity towards the target RNAs for differential decay. As such, the study unfortunately remains rather descriptive without further functional insight.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Review on "Non-redundant roles for the human mRNA decapping cofactor paralogs DCP1a and DCP1b" by Steven McMahon and co-workers

      mRNA decay is a critical step in the regulation of gene expression. In eukaryotes, mRNA turnover typically begins with the removal of the poly(A) tail, followed by either removal of the 5' cap structure or exonucleolytic 3'-5' decay catalyzed by the exosome. The decapping enzyme DCP2 forms a complex with its co-activator DCP1, which enhances decapping activity. Mammals are equipped with two DCP1 paralogs, namely DCP1a and DCP1b. Metazoans' decapping complexes feature additional components, such as enhancer of decapping 4 (EDC4), which supports the interaction between DCP1 and DCP2, thereby amplifying the efficiency of decapping. This work focuses on DCP1a and DCP1b and investigates their distinct functions. Using DCP1a- and DCP1b-specific knockdowns as well as K.O. cell lines, the authors find surprising differences between the DCP1 paralogs. While DCP1a is essential for the assembly of EDC4-containing decapping complexes and interactions with mRNA cap binding proteins, DCP1b mediates interactions with the translational machinery. Furthermore, DCP1a and DCP1b target different mRNAs for degradation, indicating that they execute non-overlapping functions. The findings reported here expand our understanding of mRNA decapping in human cells, shedding light on the unique contributions of DCP1a and DCP1b to mRNA metabolism.

      The manuscript tackles an interesting subject. Historically, the emphasis has been on studying DCP1a, while DCP1b has been deemed a functionally redundant homolog of DCP1a. Therefore, it is commendable that the authors have taken on this topic and, with the help of knockout cell lines, aimed to dissect the function of DCP1a and DCP1b. Despite recognizing the significance of the subject and approach, the manuscript falls short of persuading me. Following a promising start in Figure 1 (which still has room for improvement), there is a distinct decline in overall quality, with only relatively standard analyses being conducted. However, I do not want to give the authors detailed advice on maximizing the potential of their data and presenting it convincingly. So, here are just a few key points for improvement:

      Figure 1C: Upon closer examination, a faint band is still visible at the size of DCP1a in the DCP1a knockout cells. Could this be leaky expression of DCP1a? The authors should provide an in-depth characterization of their cells (possibly as supplementary material), including identification of genomic changes (e.g. by sequencing of the locus) and Western blots with longer exposure, etc.

      *RESPONSE: Thank you for this comment. The in-depth characterization of our cells is now included in the Supplementary Material. DCP1a KO cells and DCP1b KO cells indicated as single cell clones have been confirmed to have no DCP1a or DCP1b expression. In Figure 1D and Figure 3, polyclonal pool cells were used as indicated (only for DCP1a KO). *

      Figure 2: It is great to see that the effects of the KOs are also visible in the DDX6 immunoprecipitation. However, I wonder if the IP clearly confirms that the KO cells indeed do not express DCP1a or DCP1b. In the heatmap in Figure 2B, it appears as if the proteins are only reduced by a log2-fold change of approximately 1.5? Additionally, Figure 2 shows a problem that persists in the subsequent figures. The visual presentation is not particularly appealing, and essential details, such as the scale of the heatmap in 2B (is it log2 fold?), are lacking.

      *RESPONSE: The in-depth characterization of our cells is included in the Supplementary Materials and confirms the presence of single-cell clones where indicated. As noted above, only Figure 1D and Figure 3 used DCP1a KO pooled cells. The heatmap in Figure 2B is scaled by row using the pheatmap function in RStudio. The actual data for the heatmap come from protein intensities from the LC-MS/MS analysis. We have improved the visual presentation in the revised manuscript. *
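The reviewer's question about the heatmap scale (is it log2 fold change?) is easier to see with a small sketch of what row scaling does. The code below assumes the common convention of centering each row on its mean and dividing by its sample standard deviation; that this matches the exact settings used for Figure 2B is an assumption, and the function name and intensity values are hypothetical.

```python
from statistics import mean, stdev


def scale_rows(matrix):
    """Z-score each row: subtract the row mean, divide by the row sample SD."""
    scaled = []
    for row in matrix:
        m = mean(row)
        s = stdev(row)  # sample SD (n - 1 denominator)
        if s == 0:
            # Constant row: no variation to display (assumption: show zeros).
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - m) / s for value in row])
    return scaled


# Hypothetical protein intensities across three samples (control, KO-a, KO-b).
intensities = [
    [1.0e6, 2.0e6, 3.0e6],
    [10.0, 20.0, 30.0],
]
z_scores = scale_rows(intensities)
```

Both rows end up with identical scaled values even though their raw intensities differ by orders of magnitude. Row-scaled colors therefore show relative change within a protein in units of standard deviations and cannot be read as log2 fold changes.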

      Figure 3: I wonder why there are no primary data shown here, only processed GO analyses. Wouldn't one expect that DCP2 interacts mainly with DCP1a, but less with DCP1b? Is this visible in the data? Moreover, such analyses are rather uninformative (as reflected in the GO terms themselves, for instance, "oxoglutarate dehydrogenase complex" doesn't provide much meaningful insight). The authors should rather try to derive functional and mechanistic insights from their data.

      RESPONSE: We have now revised this Figure to include primary data as well as the IP of DCP1a in DCP1b KO cells (single cell clones) and the IP of DCP1b in DCP1a KO cells (pooled cells). We identified EDC3 in the high-confidence protein pool. The EDC3:DCP1a interaction is enhanced in DCP1b KO cells. We also found that the EDC3:DCP1b interaction is less abundant in DCP1a KO cells. This is consistent with our data in Figures 1 and 2. DCP2 was not identified in the interactomes of either DCP1a or DCP1b. This is not unusual, as DCP2 is highly flexible and the association of the DCP1 paralogs with DCP2 is transient and facilitated by other proteins.

      In Fig. 4 the potential of the approach is not fully exploited. Firstly, I would advocate for omitting the GO analyses, as, in my opinion, they offer little insight. Again, crucial information is missing to assess the results. While 75 nt reads are mentioned in the methods, the sequencing depth remains unspecified. Figure 4b should be included in the supplements. Furthermore, I strongly recommend concentrating on insights into the mechanisms of DCP1a and DCP1b-containing complexes. E.g. what characteristics distinguish DCP1a and DCP1b-dependent mRNAs? Are these targets inherently unstable? Why are they degraded? Are they known decapping substrates?

      *RESPONSE: Thank you for this comment. We have now revised this figure and have included information about sequencing depth and other pertinent information. We have been able to use a newly available algorithm (grandR) and were able to estimate half-lives and synthesis rates. This is a significant addition to the paper. We were also able to compare significantly impacted mRNAs (by DCP1a or DCP1b loss) to the established DCP2 target list. *

      In general, I suggest the authors revise the manuscript with a focus on the potential readers. Reduce Gene Ontology (GO) analyses and heatmaps, and instead, incorporate more analyses regarding the molecular processes associated with the different decapping complexes.

      *RESPONSE: We removed selected GO analyses and heatmaps from the main body of the manuscript (included as Supplementary Figures instead). For our LC-MS/MS datasets, we added iBAQ analyses of the DDX6 IP, DCP1a IP, and DCP1b IP in the control conditions. Cellular fractionation studies (using Superose 6 chromatography) were also added to the paper and allow us to interrogate decapping complex composition in more detail. The revised version of the manuscript includes a new 4SU labeling experiment (pulse-chase) as well as estimation of half-lives and synthesis rates in our conditions. Also included is relevant information about DCP1b transcriptional regulation. *

      Reviewer #3 (Significance (Required)):

      The manuscript in its current form could benefit from substantial revisions for it to be considered impactful for researchers in the field.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      I have trialled the package on my lab's data and it works as advertised. It was straightforward to use and did not require any special training. I am confident this is a tool that will be approachable even to users with limited computational experience. The use of artificial data to validate the approach - and to provide clear limits on applicability - is particularly helpful.

      The main limitation of the tool is that it requires the user to manually select regions. This somewhat limits the generalisability and is also more subjective - users can easily choose "nice" regions that better match with their hypothesis, rather than quantifying the data in an unbiased manner. However, given the inherent challenges in quantifying biological data, such problems are not easily circumventable.

      *

      * I have some comments to clarify the manuscript:

      1. A "straightforward installation" is mentioned. Given this is a Method paper, the means of installation should be clearly laid out.*

      This sentence is now modified. In the revised manuscript we now describe how to install the toolset and give the link to the toolset website if further information is needed. On this website, we provide a full video tutorial and a user manual. The user manual is provided as supplementary material to the manuscript.

      * It would be helpful if there was an option to generate an output with the regions analysed (i.e., a JPG image with the data and the drawn line(s) on top). There are two reasons for this: i) A major problem with user-driven quantification is accidental double counting of regions (e.g., a user quantifies a part of an image and then later quantifies the same region). ii) Allows other users to independently verify measurements at a later time.*

      We agree that it is helpful to save the analyzed regions. To answer this comment and the other two reviewers' comments pointing at a similar feature, we have now included an automatic saving of the regions of interest. The user will be able to reopen saved regions of interest using a new function we included in the new version of PatternJ.

      * 3. Related to the above point, it is highlighted that each time point would need to be analysed separately (line 361-362). It seems like it should be relatively straightforward to add a function where the analysis line can be mapped onto the next time point. The user could then adjust slightly for changes in position, but still be starting from near the previous timepoint. Given how prevalent timelapse imaging is, this (or something similar) seems like a clear benefit to add to the software.*

      We agree that the analysis of time series images can be a useful addition. We have added the analysis of time-lapse series in the new version of PatternJ. The principles behind the analysis of time-lapse series and an example of such analysis are provided in Figure 1 - figure supplement 3 and Figure 5, with accompanying text lines 140-153 and 360-372. The analysis includes a semi-automated selection of regions of interest, which will make the analysis of such sequences more straightforward than having to draw a selection on each image of the series. The user is required to draw at least two regions of interest in two different frames, and the algorithm will automatically generate regions of interest in frames in which selections were not drawn. The algorithm generates the analysis immediately after selections are drawn by the user, which includes the tracking of the reference channel.
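To make the idea of generating selections for frames in which none was drawn more concrete, here is a minimal sketch that linearly interpolates the vertices of a line selection between two user-drawn keyframes. This is only one plausible approach: the response does not state how PatternJ generates the intermediate regions, so the interpolation scheme, the function name and the data layout are all assumptions made for illustration.

```python
def interpolate_roi(frame, key_a, key_b):
    """Linearly interpolate polyline vertices between two keyframes.

    key_a and key_b are (frame_index, [(x, y), ...]) pairs (hypothetical layout)
    with the same number of vertices in the same order.
    """
    (frame_a, points_a), (frame_b, points_b) = key_a, key_b
    if frame_a == frame_b:
        raise ValueError("keyframes must come from two different frames")
    if len(points_a) != len(points_b):
        raise ValueError("keyframe selections must have the same number of vertices")
    # Fraction of the way from keyframe A to keyframe B.
    t = (frame - frame_a) / (frame_b - frame_a)
    return [
        (xa + t * (xb - xa), ya + t * (yb - ya))
        for (xa, ya), (xb, yb) in zip(points_a, points_b)
    ]


# Hypothetical example: a two-point line drawn in frame 0 and again in frame 10.
first = (0, [(0.0, 0.0), (10.0, 0.0)])
last = (10, [(4.0, 2.0), (14.0, 2.0)])
midway = interpolate_roi(5, first, last)
```

A scheme like this assumes the structure drifts smoothly between keyframes. Tracking a reference channel, as the response mentions, would be needed when motion is irregular.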

      * Line 134-135. The level of accuracy of the searching should be clarified here. This is discussed later in the manuscript, but it would be helpful to give readers an idea at this point what level of tolerance the software has to noise and aperiodicity.

      *

      We agree with the reviewer that a clarification of this part of the algorithm will help the user better understand the manuscript. We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181). Regarding the tolerance to noise, it is difficult to estimate it a priori from the choice made at the algorithm stage, so we prefer to leave it to the validation part of the manuscript. We hope this solution satisfies the reviewer and future users.

      *

      **Referees cross-commenting**

      I think the other reviewer comments are very pertinent. The authors have a fair bit to do, but they are reasonable requests. So, they should be encouraged to do the revisions fully so that the final software tool is as useful as possible.

      Reviewer #1 (Significance (Required)):

      Developing software tools for quantifying biological data that are approachable for a wide range of users remains a longstanding challenge. This challenge is due to: (1) the inherent problem of variability in biological systems; (2) the complexity of defining clearly quantifiable measurables; and (3) the broad spread of computational skills amongst likely users of such software.

      In this work, Blin et al., develop a simple plugin for ImageJ designed to quickly and easily quantify regular repeating units within biological systems - e.g., muscle fibre structure. They clearly and fairly discuss existing tools, with their pros and cons. The motivation for PatternJ is properly justified (which is sadly not always the case with such software tools).

      Overall, the paper is well written and accessible. The tool has limitations but it is clearly useful and easy to use. Therefore, this work is publishable with only minor corrections.

      *We thank the reviewer for the positive evaluation of PatternJ and for pointing out its accessibility to the users.

      *

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      # Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      # Major comments

      In our view the article and also software could be improved in terms of defining the scope of its applicability and user-ship. In many parts the article and software suggest that general biological patterns can be analysed, but then in other parts very specific muscle actin wordings are used. We are pointing this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would recommend to also rename the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend to remove all actin specific wordings from the macro tool set and also the article should be in parts slightly re-written.

      *

      We agree with the reviewer that our initial manuscript used a mix of general and muscle-oriented vocabulary, which could make the use of PatternJ confusing especially outside of the muscle field. To make PatternJ useful for the largest community, we corrected the manuscript and the PatternJ toolset to provide the general vocabulary needed to make it understandable for every biologist. We modified the manuscript accordingly.

      * # Minor/detailed comments

      # Software

      We recommend considering the following suggestions for improving the software.

      ## File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.*

      With the current version of macOS, we found that the file-browser dialog does not display any message; we suspect this is the issue raised by the reviewer. This is a known issue with Fiji on Mac and with all applications on Mac since 2016. We provided guidelines in the user manual and in the tutorial video to correct this issue by changing a parameter in Fiji. Given the issues the reviewer had accessing the material on the PatternJ website, for which we apologize, we understand the issue raised. We added an extra warning on the PatternJ website to point to this problem and its solution. Additionally, we have limited the appearance of the file-browser dialog to what we thought was strictly necessary. Thus, the user will experience fewer prompts, speeding up the analysis.

      *

      ## Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users that are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations. *

      We agree that this muscle-oriented vocabulary can make the use of PatternJ confusing. We have now corrected the user interface to provide both general and muscle-specific vocabulary ("center-to-center or edge-to-edge (M-line-to-M-line or Z-disc-to-Z-disc)").*

      ## Manual selection accuracy

      The 1st step of the analysis is always to start from a user hand-drawn profile across intensity patterns in the image. However, this step can cause inaccuracy that varies with the shape and curve of the line profile drawn. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when dealing with non-straight and parallelly poised features in the image. If the structure is bent into a curve, the line drawn over it also needs to reproduce this curve to precisely capture the intensity pattern. I found this limits the reproducibility and ease of use of the software.*

      We understand the concern of the reviewer. On curved selections this will be an issue that is difficult to solve, especially on "S" curved or more complex selections. The user will have to be very careful in these situations. On non-curved samples, the issue may be concerning at first sight, but the errors go with the inverse of cosine and are therefore rather low. For example, if the user creates a selection off by 5 degrees, which is visually obvious, lengths will be affected by an increase of only 0.38%. The point raised by the reviewer is important to discuss, and we therefore added a paragraph to comment on the choice of selection (lines 94-98) and a supplementary figure to help make it clear (Figure 1 - figure supplement 1).*
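The inverse-cosine argument above can be checked with a short calculation. The sketch below assumes a straight selection tilted by a given angle from the ideal direction, so that every measured length is stretched by a factor of 1/cos(angle); the function name is hypothetical.

```python
import math


def length_overestimate(angle_deg):
    """Relative length error for a straight selection tilted by angle_deg degrees."""
    return 1.0 / math.cos(math.radians(angle_deg)) - 1.0


# Relative error (in percent) for a few misalignment angles.
for angle in (1, 2, 5, 10, 15):
    print(f"{angle:>2} degrees off: +{100 * length_overestimate(angle):.2f}% length")
```

For a 5 degree misalignment this gives the 0.38% increase quoted in the response. Because the error grows roughly with the square of the angle for small angles, small deviations matter little while larger ones quickly become significant.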

      ### Reproducibility

      Since the line profile drawn on the image is the first step and essential to the entire process, saving it together with the analysis result should be considered, for example as ImageJ ROI or ROIset files that can be re-imported, correctly positioned, and visualized in the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are being saved (because the save button is also just asking for a folder containing images, I cannot verify its functionality). *

      We agree that this is a very useful and important feature. We have added ROI automatic saving. Additionally, we now provide a simplified import function of all ROIs generated with PatternJ and the automated extraction and analysis of the list of ROIs. This can be done from ROIs generated previously in PatternJ or with ROIs generated from other ImageJ/Fiji algorithms. These new features are described in the manuscript in lines 120-121 and 130-132.

      *

      ## ? button

      It would be great if that button would open up some usage instructions.

      *

      We agree with the reviewer that the "?" button can be used in a better way. We have replaced this button with a Help menu, including a simple tutorial showing a series of images detailing the steps to follow by the user, a link to the user website, and a link to our video tutorial.

      * ## Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow, by fitting and displaying 2D lines to the band or line structure in the image, that form the "patterns" the author aims to address. Thus, it extracts geometry models from the image, and the inter-line distance, and even the curve formed by these sets of lines can be further analyzed and studied. These fitted 2D lines can be also well integrated into ImageJ as Line ROI, and thus be saved, imported back, and checked or being further modified. I think this can largely increase the usefulness and reproducibility of the software.

      *

      We hope that we understood this comment correctly. We had sent a clarification request to the editor, but unfortunately did not receive an answer within the requested 4 weeks of this revision. We understood the following: instead of using our 1D approach, in which we extract positions from a profile, the reviewer suggests extracting the positions of features not as a single point, but as a series of coordinates defining its shape. If this is the case, this is a major modification of the tool that is beyond the scope of PatternJ. We believe that keeping our tool simple makes it robust. This is the major strength of PatternJ. Local fitting will not use line averaging, for instance, which would make the tool less reliable.

      * # Manuscript

      We recommend considering the following suggestions for improving the manuscript. Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      *

      We modified the abstract to make this point clearer.

      * Line 58: The gray-level co-occurrence matrix (GLCM) based feature extraction and analysis approach is neither mentioned nor compared. There is at least one relatively recent study on sarcomere structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: *https://doi.org/10.1002/cpz1.462

      • *

      We thank the reviewer for making us aware of this publication. We cite it now and have added it to our comparison of available approaches.

      * Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement, e.g. we can imagine that there should be many relevant two-dimensional patterns in biology?!*

      We have modified this sentence to avoid potential confusion (lines 76-77).

      • *

      • Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this place into the manuscript. *

      This sentence is now modified. We now mention how to install the toolset and we provide the link to the toolset website, if further information is needed (lines 86-88). On the website, we provide a full video tutorial and a user manual.

      * Line 87: "Multicolor images will give a graph with one profile per color." The 'Multicolor images' here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images which will be treated as 8-bit gray value (type conversion first) images by profile plot in ImageJ. *

      We agree with the reviewer that this could create some confusion. We modified "multicolor" to "multi-channel".

      * Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"? *

      We agree with the reviewer that "sarcomeric actin" alone will not be clear to all readers. We modified the text to "block with a central band, as often observed in the muscle field for sarcomeric actin" (lines 103-104). The toolset was modified accordingly.

      * Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what that exactly means.*

      We agree with the reviewer that this was not clear. We rewrote this paragraph (lines 101-114) and provided a supplementary figure to illustrate these definitions (Figure 1 - figure supplement 2).

      * Line 124 - 147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed known by the reader. For example, how it achieved sub-pixel resolution results is not clear. One can only assume that, by fitting a Gaussian to the band, the center position (peak) can be calculated from a continuous curve rather than from pixels. *

      Note that the two sentences introducing this description are "Automated feature extraction is the core of the tool. The algorithm takes multiple steps to achieve this (Fig. S2):". We were hoping this statement was clear, but the reviewer may refer to something else. We agree that the description of some of the details of the steps was too quick. We have now expanded the description where needed.
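      The reviewer's reading, that fitting a continuous curve to a band yields a centre located between pixels, can be illustrated with a minimal sketch. It is not taken from PatternJ: the function name `subpixel_peak` and the three-point Gaussian interpolation are assumptions chosen for illustration, and the toolset's actual fitting procedure may differ.

```python
import math


def subpixel_peak(profile):
    """Estimate the peak position of a 1D intensity profile with sub-pixel
    precision. Illustrative only (not PatternJ's implementation): a Gaussian
    is fitted through the brightest sample and its two neighbours, which is
    a parabola in log-intensity."""
    i = max(range(len(profile)), key=profile.__getitem__)
    if i == 0 or i == len(profile) - 1:
        return float(i)  # peak on the border: no neighbours to interpolate
    samples = [profile[j] for j in (i - 1, i, i + 1)]
    if min(samples) <= 0:
        return float(i)  # the logarithm needs strictly positive intensities
    left, centre, right = (math.log(v) for v in samples)
    denom = left - 2.0 * centre + right
    if denom == 0.0:
        return float(i)  # flat top: fall back to the pixel position
    return i + 0.5 * (left - right) / denom


# Synthetic band: a Gaussian whose true centre lies between pixels 10 and 11.
true_centre, sigma = 10.3, 2.0
profile = [math.exp(-((x - true_centre) ** 2) / (2 * sigma ** 2)) for x in range(21)]
print(subpixel_peak(profile))
```

      On noise-free Gaussian data the interpolation recovers the true centre; with noisy data a fit over more samples would be the more robust choice.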

      * Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo. *

      We are sorry for the issues encountered when downloading the tool and additional material, and we thank the reviewer for pointing out these obstacles to the accessibility of our tool. We simplified the downloading procedure on the website, which no longer goes through the Google Drive interface and does not require a Google account. Additionally, for the coder community, the code, user manual and examples are now available from GitHub at github.com/PierreMangeol/PatternJ, and are provided as supplementary material with the manuscript. To our knowledge, update sites work for plugins but not for macro toolsets. In our experience of sharing code with non-specialists, a classical website with a tutorial video is more accessible than more coder-oriented websites, which deter many users.

      * Reviewer #2 (Significance (Required)):

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble to access. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting some "wizard" approach, where the user is guided through the steps. *

      As answered above, the links on the PatternJ website are now corrected. Regarding the workflow, we now provide a Help menu with:

      1. a basic set of instructions to use the tool,
      2. a direct link to the tutorial video in the PatternJ toolset,
      3. a direct link to the website on which both the tutorial video and a detailed user manual can be found.

      We hope this addresses the issues raised by this reviewer.

      *Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving of the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review. *

      We agree that saving ROIs is very useful. It is now implemented in PatternJ.

      We are not sure what this reviewer means by "enabling IJ Macro recording". The ImageJ Macro Recorder is indeed very useful, but to our knowledge, it is limited to built-in functions. Our code is open and we hope this will be sufficient for advanced users to modify the code and make it fit their needs.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, the authors present a new toolset for the analysis of repetitive patterns in biological images named PatternJ. One of the main advantages of this new tool over existing ones is that it is simple to install and run and does not require any coding skills whatsoever, since it runs on the ImageJ GUI. Another advantage is that it provides not only the mean length of the pattern unit but also the subpixel localization of each unit and the distributions of lengths, and that it does not require GPU processing to run, unlike other existing tools. The major disadvantage of PatternJ is that it requires heavy, although very simple, user input in both the selection of the region to be analyzed and in the analysis steps. Another limitation is that, at least in its current version, PatternJ is not suitable for time-lapse imaging. The authors clearly explain the algorithm used by the tool to find the localization of pattern features and they thoroughly test the limits of their tool in conditions of varying SNR, periodicity and band intensity. Finally, they also show the performance of PatternJ across several biological models such as different kinds of muscle cells, neurons and fish embryonic somites, as well as different imaging modalities such as brightfield, fluorescence confocal microscopy, STORM and even electron microscopy.

      This manuscript is clearly written, and both the sections and the figures are well organized and tell a cohesive story. By testing PatternJ, I can attest to its ease of installation and use. Overall, I consider that PatternJ is a useful tool for the analysis of patterned microscopy images and this article is fit for publication. However, I do have some minor suggestions and questions that I would like the authors to address, as I consider they could improve this manuscript and the tool:

      We are grateful to this reviewer for this very positive assessment of PatternJ and of our manuscript.

      * Minor suggestions: The methodology section lacks a more detailed description of how the plotted metric was obtained: as normalized intensity or as precision in pixels. *

      We agree with the reviewer that a more detailed description of the metric plotted was missing. We added this information to the Methods section and to the figure captions where more detail could help clarify the value displayed.

      * The validation is based mostly on the SNR and patterns. They should include a dataset of real data to validate the algorithm in three of the standard patterns tested. *

      We validated our tool using computer-generated images, in which we know the localization of patterns with certainty. This allowed us to automatically analyze 30 000 images; with varying settings, we sometimes analyzed the same image 10 times, leading to about 150 000 selections analyzed. From these analyses, we can provide with confidence an unbiased assessment of the tool's precision and of its capacity to extract patterns. We already provided examples of various biological images in Figures 4-6, showing all possible features that can be extracted with PatternJ. In these examples, we can judge by eye that PatternJ extracts patterns efficiently, but we cannot know how precise these extractions are, because the "real" positions of features are unknown in biological data. Such a validation would be limited to assessing whether a pattern was found or not, which we believe we already provided with the examples in Figures 4-6.

      * The video tutorial available in the PatternJ website is very useful, maybe it would be worth it to include it as supplemental material for this manuscript, if the journal allows it. *

      As the video tutorial may have been missed by other reviewers, we agree it is important to make it more prominent to users. We have now added a Help menu in the toolset that opens the tutorial video. Having the video as supplementary material could indeed be a useful addition if the size of the video is compatible with the journal limits.

      * An example image is provided to test the macro. However, it would be useful to provide further example images for each of the three possible standard patterns suggested: Block, actin sarcomere or individual band.*

      We agree this can help users. We now provide another multi-channel example image on the PatternJ website including blocks and a pattern made of a linear intensity gradient that can be extracted with our simpler "single pattern" algorithm, which were missing in the first example. Additionally, we provide an example to be used with our new time-lapse analysis.

      * Access to both the manual and the sample images in the PatternJ website should be made publicly available. Right now they both sit in a private Drive account. *

      As mentioned above, we apologize for access issues that occurred during the review process. These files can now be downloaded directly on the website without any sort of authentication. Additionally, these files are now also available on GitHub.

      * Some common errors are not properly handled by the macro and could be confusing for the user: When there is no selection and one tries to run a Check or Extraction: "Selection required in line 307 (called from line 14). profile=getProfile( ;". A simple "a line selection is required" message would be useful there. When "band" or "block" is selected for a channel in the "Set parameters" window, yet a 0 value is entered into the corresponding "Number of bands or blocks" section, one gets this error when trying to Extract: "Empty array in line 842 (called from line 113). if ( ( subloc . length == 1 ) & ( subloc [ 0 == 0) ) {". This error is not too rare, since the "Number of bands or blocks" section is populated with a 0 after choosing "sarcomeric actin" (after accepting the settings) and stays that way when one changes back to "blocks" or "bands".*

      We thank the reviewer for pointing out these bugs. These bugs are now corrected in the revised version.

      * The fact that every time one clicks on the most used buttons, the getDirectory window appears is not only quite annoying but also, ultimately a waste of time. Isn't it possible to choose the directory in which to store the files only once, from the "Set parameters" window?*

      We have now found a solution to avoid this step. The user is only prompted to provide the image folder when pressing the "Set parameters" button. We kept the directory prompt only for when the user selects the time-lapse analysis or the analysis of multiple ROIs. The main reason is that it is otherwise very easy for the analysis to end up in the wrong folder.

      * The authors state that the outputs of the workflow are "user friendly text files". However, some of them lack descriptive headers (like the localisations and profiles) or even file names (like colors.txt). If there is something lacking in the manuscript, it is a brief description of all the output files generated during the workflow.*

      PatternJ generates multiple files, several of which are internal to the toolset. They are needed to keep track of which analyses were done and which colors were used in the images, amongst other things. From the user's perspective, only the files obtained after the analysis, All_localizations.channel_X.txt and sarcomere_lengths.txt, are useful. To improve the user experience, we have now moved all internal files to a folder named "internal", which we think will clarify which outputs are useful for further analysis and which are not. We thank the reviewer for raising this point, and we now mention it in our Tutorial.

      * I don't really see the point in saving the localizations from the "Extraction" step, they are even named "temp". *

      We thank the reviewer for this comment; saving these files was indeed not necessary. We modified PatternJ to delete them after they are used.

      * In the same line, I DO see the point of saving the profiles and localizations from the "Extract & Save" step, but I think they should be deleted during the "Analysis" step, since all their information is then grouped in a single file, with descriptive headers. This deleting could be optional and set in the "Set parameters" window.*

      We understand the point raised by the reviewer. However, the analysis depends on the reference channel picked, which is asked for when starting an analysis, and can be augmented with additional selections. If a user chooses to modify the reference channel or to add a new profile to the analysis, deleting all these files would mean that the user will have to start over again, which we believe will create frustration. An optional deletion at the analysis step is simple to implement, but it could create problems for users who do not understand what it means practically.

      * Moreover, I think it would be useful to also save the linear roi used for the "Extract & Save" step, and eventually combine them during the "Analysis step" into a single roi set file so that future re-analysis could be made on the same regions. This could be an optional feature set from the "Set parameters" window. *

      We agree with the reviewer that saving ROIs is very useful. ROIs are now saved into a single file each time the user extracts and saves positions from a selection. Additionally, the user can re-use previous ROIs and analyze an image or image series in a single step.

      * In the "PatternJ workflow" section of the manuscript, the authors state that after the "Extract & Save" step "(...) steps 1, 2, 4, and 5 can be repeated on other selections (...)". However, technically, only steps 1 and 5 are really necessary (alternatively 1, 4 and 5 if the user is unsure of the quality of the patterning). If a user follows this to the letter, I think it can lead to wasted time. *

      We agree with the reviewer and have corrected the manuscript accordingly (lines 119-120).

      *I believe that the "Version Information" button, although important, has potential to be more useful if used as a "Help" button for the toolset. There could be links to useful sources like the manuscript or the PatternJ website but also some tips like "whenever possible, use a higher linewidth for your line selection" *

      We agree with the reviewer as pointed out in our previous answers to the other reviewers. This button is now replaced by a Help menu, including a simple tutorial in a series of images detailing the steps to follow, a link to the user website, and a link to our video tutorial.

      * It would be interesting to mention to what extent does the orientation of the line selection in relation to the patterned structure (i.e. perfectly parallel vs more diagonal) affect pattern length variability?*

      As answered to reviewer 1, we understand this concern, which needs to be clarified for readers. The issue may be concerning at first sight, but the error grows only as the inverse of the cosine of the misalignment angle and is therefore rather low. For example, if the user creates a selection off by 3 degrees, which is visually obvious, lengths will be overestimated by only 0.14%. The point raised by the reviewer is important to discuss, and we have therefore added a comment on the choice of selection (lines 94-98) as well as a supplementary figure (Figure 1 - figure supplement 1).
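      The figure quoted for a 3 degree misalignment can be checked with a few lines of arithmetic. The sketch below assumes the simple geometric model stated in the response (a straight selection tilted by an angle θ measures lengths stretched by 1/cos θ); the function name is ours and is not part of PatternJ.

```python
import math


def relative_length_error(misalignment_deg: float) -> float:
    """Fractional overestimate of a pattern length when the line selection
    is tilted by `misalignment_deg` degrees away from the pattern axis."""
    theta = math.radians(misalignment_deg)
    return 1.0 / math.cos(theta) - 1.0


# Overestimate in percent for a few misalignment angles.
for angle in (1, 3, 5, 10):
    print(f"{angle:>2} deg: {100 * relative_length_error(angle):.2f} %")
```

      For 3 degrees this gives about 0.14 %, in agreement with the value stated above.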

      * When "the algorithm uses the peak of highest intensity as a starting point and then searches for peak intensity values one spatial period away on each side of this starting point" (line 133-135), does that search have a range? If so, what is the range? *

      We agree that this information is useful to share with the reader. The range is one pattern size. We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181).
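      The search described in this exchange can be sketched as follows. This is a hypothetical illustration, not PatternJ's code: the function name, the use of integer pixel positions, and the choice of a window about one period wide centred on the expected position are all assumptions made for the example.

```python
def find_periodic_peaks(profile, period):
    """Walk outwards from the brightest sample of a 1D profile and, one
    period away on each side, pick the brightest sample inside a window
    about one period wide centred on the expected position."""
    n = len(profile)
    half = period // 2
    start = max(range(n), key=profile.__getitem__)
    peaks = [start]
    for direction in (-1, 1):
        current = start
        while True:
            expected = current + direction * period
            lo, hi = expected - half, expected + half
            if lo < 0 or hi >= n:
                break  # the search window would leave the profile
            current = max(range(lo, hi + 1), key=profile.__getitem__)
            peaks.append(current)
    return sorted(peaks)


# Slightly aperiodic synthetic profile: nominal period 10, one band shifted.
profile = [0.0] * 52
for position, height in zip((5, 15, 26, 35, 45), (1.0, 2.0, 3.0, 2.0, 1.0)):
    profile[position] = height
print(find_periodic_peaks(profile, 10))
```

      Because each step restarts from the last peak found rather than from a fixed grid, moderate drift in the period is tolerated, while a band displaced by more than half a period from its expected position would fall outside the window.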

      * Line 144 states that the parameters of the fit are saved and given to the user, yet I could not find such information in the outputs. *

      The parameters of the fits are saved for blocks. We have now clarified this point by modifying the manuscript (lines 186-198) and modifying Figure 1 - figure supplement 5. We realized we made an error in the description of how edges of "block with middle band" are extracted. This is now corrected.

      * In line 286, authors finish by saying "More complex patterns from electron microscopy images may also be used with PatternJ.". Since this statement is not backed by evidence in the manuscript, I suggest deleting it (or at the very least, providing some examples of what more complex patterns the authors refer to). *

      This sentence is now deleted.

      * In the TEM image of the fly wing muscle in fig. 4 there is a subtle but clearly visible white stripe pattern in the original image. Since that pattern consists of 'dips', rather than 'peaks' in the profile of the inverted image, they do not get analyzed. I think it is worth mentioning that if the image of interest contains both "bright" and "dark" patterns, then the analysis should be performed in both the original and the inverted images because the nature of the algorithm does not allow it to detect "dark" patterns. *

      We agree with the reviewer's comment. We now mention this point in lines 337-339.

      * In line 283, the authors mention using background correction. They should state explicitly which method of background correction they used. If they used ImageJ's "Subtract Background" tool, then the radius should be specified.*

      We now describe this step in the Methods section.

      * Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. Being a software paper, the advance proposed by the authors is technical in nature. The novelty and significance of this tool are that it offers quick and simple pattern analysis at the single unit level to a broad audience, since it runs on the ImageJ GUI and does not require any programming knowledge. Moreover, all the modules and steps are well described in the paper, which makes it easy to go through the analysis.
      • Place the work in the context of the existing literature (provide references, where appropriate). The authors themselves provide a good and thorough comparison of their tool with other existing ones, both in terms of ease of use and on the type of information extracted by each method. While PatternJ is not necessarily superior in all aspects, it succeeds at providing precise single pattern unit measurements in a user-friendly manner.
      • State what audience might be interested in and influenced by the reported findings. Most researchers working with microscopy images of muscle cells or fibers or any other patterned sample and interested in analyzing changes in that pattern in response to perturbations, time, development, etc. could use this tool to obtain useful, and otherwise laborious, information. *

      We thank the reviewer for these enthusiastic comments about how straightforward for biologists it is to use PatternJ and its broad applicability in the bio community.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      Major comments

      In our view the article and also software could be improved in terms of defining the scope of its applicability and user-ship. In many parts the article and software suggest that general biological patterns can be analysed, but then in other parts very specific muscle actin wordings are used. We are pointing this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would recommend to also rename the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend to remove all actin specific wordings from the macro tool set and also the article should be in parts slightly re-written.

      Minor/detailed comments

      Software

      We recommend considering the following suggestions for improving the software.

      File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.

      Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users that are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations.

      Manual selection accuracy

      The 1st step of the analysis is always to start from a user hand-drawn profile across intensity patterns in the image. However, this step can cause inaccuracy that varies with the shape and curvature of the line profile drawn. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when dealing with features in the image that are not straight and parallel. If the structure is bent along a curve, the line drawn over it also needs to reproduce this curve to precisely capture the intensity pattern. I found this limits the reproducibility and ease of use of the software.

      Reproducibility

      Since the line profile drawn on the image is the first step and essential to the entire process, saving it together with the analysis result should be considered, for example as ImageJ ROI or ROIset files that can be re-imported, correctly positioned, and visualized in the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are being saved (because the save button is also just asking for a folder containing images, so I cannot verify its functionality).

      ? button

      It would be great if that button would open up some usage instructions.

      Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow, by fitting and displaying 2D lines to the band or line structure in the image, that form the "patterns" the author aims to address. Thus, it extracts geometry models from the image, and the inter-line distance, and even the curve formed by these sets of lines can be further analyzed and studied. These fitted 2D lines can be also well integrated into ImageJ as Line ROI, and thus be saved, imported back, and checked or being further modified. I think this can largely increase the usefulness and reproducibility of the software.

      Manuscript

      We recommend considering the following suggestions for improving the manuscript.

      Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      Line 58: Gray-level co-occurrence matrix (GLCM) based feature extraction and analysis approach is not mentioned nor compared. At least there's a relatively recent study on Sarcomeres structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: https://doi.org/10.1002/cpz1.462

      Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement, e.g. we can imagine that there should be many relevant two-dimensional patterns in biology?!

      Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this place into the manuscript.

      Line 87: "Multicolor images will give a graph with one profile per color." The 'Multicolor images' here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images which will be treated as 8-bit gray value (type conversion first) images by profile plot in ImageJ.

      Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"?

      Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what that exactly means.

      Line 124 - 147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed known by the reader. For example, how it achieved sub-pixel resolution results is not clear. One can only assume that, by fitting a Gaussian to the band, the center position (peak) can be calculated from a continuous curve rather than from pixels.

      Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo.

      Significance

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble to access. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting some "wizard" approach, where the user is guided through the steps. Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving of the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review.

  3. Apr 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bell et al. provide an exhaustive and clear description of the diversity of a new class of predicted type IV restriction systems that the authors denote as CoCoNuTs, for their characteristic presence of coiled-coil segments and nuclease tandems. Along with a comprehensive analysis that includes phylogenetics, protein structure prediction, extensive protein domain annotations, and an in-depth investigation of encoding genomic contexts, they also provide detailed hypotheses about the biological activity and molecular functions of the members of this class of predicted systems. This work is highly relevant: it underscores the wide diversity of defence systems that are used by prokaryotes and demonstrates that there are still many systems to be discovered. The work is sound and backed up by a clear and reasonable bioinformatics approach. I do not have any major issues with the manuscript, but only some minor comments.

      Strengths:

      The analysis provided by the authors is extensive and covers the three most important aspects that can be covered computationally when analysing a new family/superfamily: phylogenetics, genomic context analysis, and protein-structure-based domain content annotation. With this, one can directly have an idea about the superfamily of the predicted system and infer their biological role. The bioinformatics approach is sound and makes use of the most current advances in the fields of protein evolution and structural bioinformatics.

      Weaknesses:

      It is not clear how coiled-coil segments were assigned if only based on AF2-predicted models or also backed by sequence analysis, as no description is provided in the methods. The structure prediction quality assessment is based solely on the average pLDDT of the obtained models (with a threshold of 80 or better). However, this is not enough, particularly when multimeric models are used. The PAE matrix should be used to evaluate relative orientations, particularly in the case where there is a prediction that parts from 2 proteins are interacting. In the case of multimers, interface quality scores, such as the ipTM or pDockQ, should also be considered and, at minimum, reported.

      A description of the coiled-coil predictions has been added to the Methods. For multimeric models, PAE matrices and ipTM+pTM scores have been included in Supplementary Data File S1.

      Reviewer #2 (Public Review):

      Summary:

      In this work, using in-depth computational analysis, Bell et al. explore the diverse repertoire of type IV McrBC modification-dependent restriction systems. The prototypical two-component McrBC system has been structurally and functionally characterised and is known to act as a defence by restricting phage and foreign DNA containing methylated cytosines. Here, the authors find previously unanticipated complexity and versatility of these systems and focus on detailed analysis and classification of a distinct branch, the so-called CoCoNuT, named after its composition of coiled-coil structures and tandem nucleases. These CoCoNuT systems are predicted to target RNA as well as DNA and to utilise defence mechanisms with some similarity to type III CRISPR-Cas systems.

      Strengths:

      This work is enriched with a plethora of ideas and a myriad of compelling hypotheses that now await experimental verification. The study comes from the group that was amongst the first to describe, characterize, and classify CRISPR-Cas systems. By analogy, the findings described here can similarly promote ingenious experimental and conceptual research that could further drive technological advances. It could also instigate vigorous scientific debates that will ultimately benefit the community.

      Weaknesses:

      The multi-component systems described here function in the context of large oligomeric complexes. Some of the single chain AF2 predictions shown in this work are not compatible, for example, with homohexameric complex formation due to incompatible orientation of domains. The recent advances in protein structure prediction, in particular AlphaFold2 (AF2) multimer, now allow us to confidently probe potential protein-protein interactions and protein complex formation. This predictive power could be exploited here to produce a better glimpse of these multimeric protein systems. It can also provide a more sound explanation for some of the observed differences amongst different McrBC types.

      Hexameric CnuB complexes with CnuC stimulatory monomers for Type I-A, I-B, I-C, II, and III-A CoCoNuT systems have been modeled with AF2 and included in Supplementary Data File S1, albeit without the domains fused to the GTPase N-terminus (with the exception of Type I-B, which lacks the long coiled-coil domain fused to the GTPase and was modeled with its entire sequence). Attempts to model the other full-length CnuB hexamers did not lead to convincing results.

      Recommendations for the authors:

      Reviewing Editor:

      The detailed recommendations by the two reviewers will help the authors to further strengthen the manuscript, but two points seem particularly worth considering: 1. The methods are barely sketched in the manuscript, but it could be useful to detail them more closely. Particularly regarding the coiled-coil segments, which are currently just statists, useful mainly for the name of the family, more detail on their prediction, structural properties, and purpose would be very helpful. 2. Due to its encyclopedic nature, the wealth of material presented in the paper makes it hard to penetrate in one go. Any effort to make it more accessible would be very welcome. Reviewer 1 in particular has made a number of suggestions regarding the figures, which would make them provide more support for the findings described in the text.

      A description of the techniques used to identify coiled-coil segments has been added to the Methods. Our predictions ranged from near certainty in the coiled-coils detected in CnuB homologs, to shorter helices at the limit of detection in other factors. We chose to report all probable coiled-coils, as the extensive coiled-coils fused to CnuB, which are often the only domain present other than the GTPase, imply involvement in mediating complex formation by interacting with coiled-coils in other factors, particularly the other CoCoNuT factors. The suggestions made by Reviewer 1 were thoughtful and we made an effort to incorporate them.

      Reviewer #1 (Recommendations For The Authors):

      I do not have any major issues with the manuscript. I have however some minor comments, as described below.

      • The last sentence of the abstract at first reads as a fact and not a hypothesis resulting from the work described in the manuscript. After the second read, I noticed the nuances in the sentence. I would suggest a rephrasing to emphasize that the activity described is a theoretical hypothesis not backed-up by experiments.

      This sentence has been rephrased to make explicit the hypothetical nature of the statement.

      • In line 64, the authors rename DUF3578 as ADAM because indeed its function is not unknown. Did the authors consider reaching out to InterPro to add this designation to this DUF? A search in InterPro with DUF3578 results in "MrcB-like, N-terminal domain" and if a name is suggested, it may be worthwhile to take it to the InterPro team.

      We will suggest this nomenclature to InterPro.

      • I find Figure 1E hard to analyse and think it occupies too much space for the information it provides. The color scheme, the large amount of small slices, and the lack of numbers make its information content very small. I would suggest moving this to the supplementary and making it instead a bar plot. If removed from Figure 1, more space is made available for the other panels, particularly the structural superpositions, which in my opinion are much more important.

      We have removed Figure 1E from the paper as it adds little information beyond the abundance and phyletic distribution of sequenced prokaryotes, in which McrBC systems are plentiful.

      • In Figure 2, it is not clear due to the presence of many colorful "operon schemes" that the tree is for a single gene and not for the full operon segment. Highlighting the target gene in the operons or signalling it somehow would make the figure easy to understand even in the absence of the text and legend. The same applies to Supplementary Figure 1.

      The legend has been modified to show more clearly that this is a tree of McrB-like GTPases.

      • In line 146, the authors write "AlphaFold-predicted endonuclease fold" to say that a protein contains a region that AF2 predicts to fold like an endonuclease. This is a weird way of writing it and can be confusing to non-expert readers. I would suggest rephrasing for increased clarity.

      This sentence has been rephrased for greater clarity.

      • In line 167, there is a [47]. I believe this is probably due to a previous reference formatting.

      Indeed, this was a reference formatting error and has been fixed.

      • In most figures, the color palette and the use of very similar color palettes for taxonomy pie charts, genomic context composition schemes, and domain composition diagrams make it really hard to have a good understanding of the image at first. Legends are often close to each other, and it is not obvious at first which belong to what. I would suggest changing the layouts and maybe some color schemes to make it easier to extract the information that these figures want to convey.

      It seemed that Figure 4 was the most glaring example of these issues, and it has been rearranged for easier comprehension.

      • In the paragraph that starts at line 199, the authors mention an Ig-like domain that is often found at the N-terminus of Type I CoCoNuTs. Are they all related to each other? How conserved are these domains?

      These domains are all predicted to adopt a similar beta-sandwich fold and are found at the N-terminus of most CoCoNuT CnuC homologs, suggesting they are part of the same family, but we did not undertake a more detailed sequence-based analysis of these regions.

      We also find comparable domains in the CnuC/McrC-like partners of the abundant McrB-like NxD motif GTPases that are not part of CoCoNuT systems, and given the similarity of some of their predicted structures to Rho GDP-dissociation inhibitor 1, we suspect that they have coevolved as regulators of the non-canonical NxD motif GTPase type. Our CnuBC multimer models showing consistent proximity between these domains in CnuC and CnuB GTPase domains suggest this could indeed be the case. We plan to explore these findings further in a forthcoming publication.

      • In line 210, the authors write "suggesting a role in overcrowding-induced stress response". Why so? In all other cases, the authors justify their hypothesis, which I really appreciated, but not here.

      A supplementary note justifying this hypothesis has been added to Supplementary Data File S1.

      • At the end of the paragraph that starts in line 264, the authors mention that they constructed AF2 multimeric models to predict if 2 proteins would interact. However, no quality scores were provided, particularly the PAE matrix. This would allow for a better judgement of this prediction, and I would suggest adding the PAE matrix as another panel in the figure where the 3D model of the complex is displayed.

      The PAE matrix and ipTM+pTM scores for this and other multimer models have been added to Supplementary Data File S1. For this model in particular, the surface charge distribution of the model has been presented to support the role of the domains that have a higher PAE in RNA binding.

      • In line 306, "(supplementary data)" refers to what part of the file?

      This file has been renamed Supplementary Table S3 and referenced as such.

      • In line 464, the authors suggest that ShdA could interact with CoCoNuTs. Why not model the complex as done for other cases? what would co-folding suggest?

      As we were not able to convincingly model full-length CnuB hexamers with N-terminal coiled-coils, we did not attempt modeling of this hypothetical complex with another protein with a long coiled-coil, but it remains an interesting possibility.

      • In line 528, why and how were some genes additionally analyzed with HHPred?

      Justification for this analysis has been added to the Methods, but briefly, these genes were additionally analyzed if there were no BLAST hits or to confirm the hits that were obtained.

      • In the first section of the methods, the first and second (particularly the second) paragraphs are extremely long. I would suggest breaking them to facilitate reading.

      This change has been made.

      • In line 545, what do the authors mean by "the alignment (...) were analyzed with HHPred"?

      A more detailed description of this step has been added to the Methods.

      • The authors provide the models they produced as well as extensive supplementary tables that make their data reusable, but they do not provide the code for the automated steps, as to excise target sequence sections out of multiple sequence alignments, for example.

      The code used for these steps has been in use in our group at the NCBI for many years. It will be difficult to utilize outside of the NCBI software environment, but for full disclosure, we have included a zipped repository with the scripts and custom-code dependencies, although there are external dependencies as well such as FastTree and BLAST. In brief, it involves PSI-BLAST detection of regions with the most significant homology to one of a set of provided alignments (seals-2-master/bin/wrappers/cog_psicognitor). In this case, the reference alignments of McrB-like GTPases and DUF2357 were generated manually using HHpred to analyze alignments of clustered PSI-BLAST results. This step provided an output of coordinates defining domain footprints in each query sequence, which were then combined and/or extended using scripts based on manual analysis of many examples with HHpred (footprint_finders/get_GTPase_frags.py and footprint_finders/get_DUF2357_frags.py), then these coordinates were used to excise such regions from the query amino acid sequence with a final script (seals-2-master/bin/misc/fa2frag).
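
      The final two steps described above (combining domain footprints, then excising those regions from the query sequence) can be sketched as follows. This is not the NCBI code; it is a minimal illustration under assumed conventions (1-based inclusive coordinates, a hypothetical `max_gap` parameter for joining nearby footprints), and the function names are invented for the example.

```python
def merge_footprints(footprints, max_gap=0):
    """Merge 1-based inclusive (start, end) footprints that overlap,
    touch, or are separated by at most max_gap residues."""
    merged = []
    for start, end in sorted(footprints):
        if merged and start <= merged[-1][1] + max_gap + 1:
            # Extend the previous footprint instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def excise(sequence, footprints):
    """Cut each 1-based inclusive footprint out of an amino acid sequence."""
    return [sequence[start - 1:end] for start, end in footprints]


# Toy example: two overlapping hits and one separate hit on a made-up sequence.
toy_sequence = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
hits = [(5, 10), (8, 15), (20, 22)]
regions = merge_footprints(hits)
fragments = excise(toy_sequence, regions)
```

      Keeping the merge step separate from the excision step mirrors the description above, where coordinates are first combined or extended and only then used to cut the sequence.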

      Reviewer #2 (Recommendations For The Authors):

      (1) Page 4, line 77 - 'PUA superfamily domains' could be more appropriate to use instead of "EVE superfamily".

      While this statement could perhaps be applied to PUA superfamily domains, our previous work we refer to, which strongly supports the assertion, was restricted to the EVE-like domains and we prefer to retain the original language.

      (2) Page 5. lines 128-130 - AF2 multimer prediction model could provide a more sound explanation for these differences.

      Our AF2 multimer predictions added in this revision indeed show that the NxD motif McrB-like CoCoNuT GTPases interact with their respective McrC-like partners such that an immunoglobulin-like beta-sandwich domain, fused to the N-termini of the McrC homologs and similar to Rho GDP-dissociation inhibitor 1, has the potential to physically interact with the GTPase variants. However, we did not probe this in greater detail, as it is beyond the scope of this already highly complex article, but we plan to study it in the future.

      (3) Page 8, line 252 - The surface charge distribution of CnuH OB fold domain looks very different from SmpB (pdb3iyr). In fact, the regions that are in contact with RNA in SmpB are highly acidic in CoCoNut CnuH. Although it looks likely that this domain is involved in RNA binding, the mode of interaction should be very different.

      We did not detect a strong similarity between the CnuH SmpB-like SPB domain and PDB 3IYR, but when we compare the surface charge distribution of PDB 1WJX and the SPB domain, while there is a significant area that is positively charged in 1WJX that is negatively charged in SPB, there is much that overlaps with the same charge in both domains.

      The similarity between SmpB and the SPB domain is significant, but definitely not exact. An important question for future studies is: If the domains are indeed related due to an ancient fusion of SmpB to an ancestor of CnuH, would this degree of divergence be expected?

      In other words, can we say anything about how the function of a stand-alone tmRNA-binding protein could evolve after being fused to a complex predicted RNA helicase with other predicted RNA binding domains already present? Experimental validation will ultimately be necessary to resolve these kinds of questions, but for now, it may be safe to say that the presence of this domain, especially in conjunction with the neighboring RelE-like RTL domain and UPF1-like helicase domain, signals a likely interaction with the A-site of the ribosome, and perhaps restriction of aberrant/viral mRNA.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      The manuscript by Barba-Aliaga and colleagues describes a potential function of eIF5A in the control of TIM50 translation. The authors showed that in temperature-sensitive mutants of eIF5A several mitochondrial proteins are decreased, including OXPHOS subunits, proteins of the TCA cycle and some components of protein translocases. Some precursor proteins appear to localize to the cytosol. As a consequence of mitochondrial dysfunction, the expression of some stress components is induced. The idea is that eIF5A relieves ribosome stalling on the proline-rich Tim50 of the TIM23 complex and thereby controls mitochondrial protein set-up.

      The findings are potentially interesting. However, some control experiments are required to substantiate the findings.

      1. To support their conclusion the authors should show whether Tim50 levels are affected in the eIF5A-ts mutants used.

      Response: Tim50 protein half-life is approximately 9.6 h (Christiano et al, 2014), which makes it difficult to measure large differences in new protein synthesis upon eIF5A depletion. However, we used different approaches to show that a reduction in eIF5A provokes a reduction in Tim50 protein levels and synthesis. 1) The steady-state levels of Tim50 protein (genomic HA-tagged version) are shown by western blotting analysis in Fig. S4B and confirm a significant drop of approximately 20% in the tif51A-1 mutant at restrictive temperature. 2) The use of a construct in which Tim50 is fused to a nanoluciferase reporter under the control of a tetO7 inducible promoter shows a significant 3-fold reduction in Tim50 protein synthesis in the tif51A-1 mutant compared to wild-type (Fig. 4C). In addition, the calculated protein synthesis time indicates that the tif51A-1 strain takes twice as long as the wild-type to synthesize Tim50 protein (Fig. 4E). 3) The expression of a FLAG-TIM50-GFP version under a GAL inducible system also shows a significant reduction in Tim50 protein synthesis in the two eIF5A temperature-sensitive strains (Fig. S4C). 4) The proteomic analysis performed at 41°C showed a 20% reduction in Tim50 protein levels in the two eIF5A temperature-sensitive strains, although this was not statistically significant (Table S1). Furthermore, TIM50 mRNA levels were determined by RT-qPCR across all the experiments mentioned to confirm that the low levels of Tim50 protein were not due to decreased transcription or increased mRNA degradation. 5) An additional polysome profiling experiment has been included in Fig. R1 (Figure for Reviewers), showing a higher TIM50 mRNA abundance in light polysomal fractions and a lower mRNA abundance in heavy polysomal fractions upon eIF5A depletion. This indicates that TIM50 mRNA is significantly shifted to earlier fractions and that translation of Tim50 is reduced in the tif51A-1 mutant at restrictive temperature but not at permissive temperature. Altogether, these experiments confirm a significant reduction of Tim50 protein levels upon eIF5A depletion and support our conclusions.

      How are the levels of TOM and TIM23 subunits?

      Response: Our proteomic analysis shows that the protein levels of the Tom70 and Tom20 receptor subunits of the TOM complex are significantly decreased in the two eIF5A temperature-sensitive strains (Table S1). These results agree with the polysome profiling results, which show a significant reduction of TOM70 and TOM20 mRNAs in the heavy polysomal fractions and a significant increase of these mRNAs in the light fractions of eIF5A-depleted cells (Fig. 2C and Fig. S2D). Apart from Tim50, no other proteins of the Tim23 translocase complex were detected in the proteomic analysis.

      Furthermore, how are the levels of the Tim50 variant that lack the proline residues? Is the stability or function of Tim50 affected by these mutations?

      Although we did not specifically analyze Tim50ΔPro protein levels, we quantified the Tim50ΔPro fluorescent signal to address this point; the quantification is shown in Fig. R2 and mentioned in the corresponding Results section. The results indicate that the Tim50 variant lacking the proline residues has protein levels similar to the wild-type version, so it is tempting to conclude that its stability is also similar. However, if the Reviewers consider this essential for publication, additional experiments using cycloheximide could be conducted to better assess the stability and half-life of this Tim50 version.

      Additionally, the functionality of the Tim50ΔPro protein is shown by the fact that wild-type cells carrying this version as the only copy of Tim50 grew well in glycerol media, where Tim50 is essential for mitochondrial function (Fig. 5A). However, we suspect that Tim50ΔPro is a slightly less efficient protein, since the double mutant tif51A-1 Tim50ΔPro grows even more poorly than the single tif51A-1 mutant (Fig. 5A). This information also responds to the comments made by Reviewer #2.

      How specific is the effect of eIF5A on Tim50? Is there any other mitochondrial substrate of eIF5A? It is not so clear to the reviewer why the authors focused on Tim50.

      Response: eIF5A has been shown to be necessary for the translation of mRNA codons encoding consecutive prolines and, consequently, lack of eIF5A causes ribosome stalling at these polyproline motifs (Gutierrez et al., 2013; Pelechano and Alepuz, 2017; Schuller et al., 2017). In our manuscript we show that: 1) using an artificial tetO7-TIM50-nanoLuc genomic construct, the synthesis of Tim50 protein (measured as the appearance of luciferase activity upon induction of the tetO promoter) is significantly reduced, by 3-fold, under eIF5A depletion only when Tim50 contains the stretch of 7 consecutive prolines (Fig. 4A-D); 2) genomic Tim50-HA and plasmid FLAG-TIM50-GFP protein levels are significantly reduced upon eIF5A depletion (Fig. S4B); 3) the calculated time for translation elongation of Tim50 mRNA is twice as long in eIF5A-depleted cells as in cells containing normal eIF5A levels (Fig. 4E); and 4) analysis of published ribosome profiling data shows a precipitous drop-off in ribosome density exactly where the polyproline stretch is located in Tim50 (540-561 bp) upon eIF5A depletion but not in the control strain (Fig. 4F). This result is indicative of ribosome stalling at the Tim50 polyproline motif upon eIF5A depletion. Altogether, our results strongly support a direct and specific role of eIF5A in Tim50 protein synthesis. However, as we discuss in relation to Fig. 5 and in the Discussion section, Tim50 does not seem to be the only mitochondrial substrate of eIF5A, since recovery of Tim50 protein synthesis does not rescue the growth of eIF5A mutants under respiratory conditions. Along these lines, we have added further data pointing to ribosome stalling for other co-translationally inserted mitoproteins that are potential substrates of eIF5A (Table S6). Accordingly, this has also been included in the Discussion section. This information also responds to the comments made by Reviewer #4.

      We focused on Tim50 in this manuscript because we found a global downregulation of mitochondrial protein synthesis (Figs. 1 and 2) in parallel with the accumulation of mitochondrial precursor proteins in the cytoplasm and induction of the mitoCPR response (Fig. 3). All these data pointed to a mitochondrial protein import defect. Because Tim50 is an essential component of the Tim23 translocase complex, its protein levels are reduced in eIF5A mutants, and it contains a polyproline motif, the data pointed towards a Tim50-dependent effect on mitochondrial protein import upon eIF5A depletion, which we addressed in the manuscript.
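
      To illustrate how a polyproline stretch in a protein maps onto coordinates in its coding sequence, where a ribosome-density drop-off would be sought in profiling data, here is a minimal sketch. The sequence is a made-up toy peptide, not Tim50, and the function name and the `min_len` threshold are assumptions chosen for the example.

```python
import re


def polyproline_runs(protein, min_len=3):
    """Find runs of at least min_len consecutive prolines.

    Returns (aa_start, aa_end, nt_start, nt_end) tuples, all 1-based and
    inclusive, where the nucleotide coordinates refer to the coding
    sequence starting at the first base of the start codon.
    """
    runs = []
    for match in re.finditer(r"P{%d,}" % min_len, protein):
        aa_start = match.start() + 1      # convert 0-based to 1-based
        aa_end = match.end()              # exclusive end equals 1-based inclusive end
        nt_start = 3 * (aa_start - 1) + 1  # first base of the first proline codon
        nt_end = 3 * aa_end                # last base of the last proline codon
        runs.append((aa_start, aa_end, nt_start, nt_end))
    return runs


# Toy peptide with a stretch of 7 consecutive prolines at residues 4-10.
toy_protein = "MKAPPPPPPPLQ"
runs = polyproline_runs(toy_protein)
```

      Ribosome footprint positions depend on the offset convention used by the profiling analysis, so the nucleotide window returned here is only a starting point for locating a stall site.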

      Figure 1A: Which tif51A strain was used?

      Response: The proteomic analysis was performed with the tif51A-1 and tif51A-3 temperature-sensitive strains (see Table S1), and Fig. 1A shows the average of the values obtained for the two mutants (proteins detected as down-regulated in these two samples and from 3 different biological replicates). This is now clarified in the Figure 1A legend. A similar approach was also followed in Pelechano and Alepuz, 2017. Additionally, the ratios of protein levels in each temperature-sensitive mutant relative to wild-type, for each protein, are shown in Table S1. This information also responds to the comments made by Reviewer #2.

      Figure 1C: The authors should show the steady state levels of some OXPHOS/TCA components to confirm the findings of Figure 1A.

      Response: Proteomic findings have been confirmed for several proteins. The steady-state levels of Por1 and Hsp60 proteins were investigated by western blotting (Figs. 1C,D), and the results show a significant down-regulation in the two eIF5A temperature-sensitive strains at 41°C, which confirms the findings of Fig. 1A. Additionally, we have included the same experiment performed at 37°C (Fig. S1E), which supports the same conclusion.

      Furthermore, the steady-state levels of Tim50 protein were also investigated by western blotting (Fig. S4B), and results also showed a significant down-regulation in the tif51A-1 mutant at restrictive temperature (37ºC), compared to wild-type. This result also confirms the findings of Fig. 1A.

      However, if the Reviewers consider additional confirmation for OXPHOS/TCA proteins to be essential for publication, further experiments could be conducted to assess the levels of other OXPHOS/TCA proteins.

      The manuscript contains several quantifications. However, central information, such as the number of repeats or whether a standard deviation or S.E.M. is depicted, is missing.

      Response: Clear information on the number of repeats, type of graphical representation and statistical analysis is now included for all figures in the corresponding figure legends and also detailed in the Materials and Methods section. This information also responds to the comments made by Reviewer #2.

      Figure 3: The authors propose that precursors form aggregates outside mitochondria. To assess the data, a quantification should address how many cells contain protein aggregates.

      Response: The quantification of cytoplasmic Yta12 aggregates is now included in Fig.3E, which shows significant differences between the tif51A-1 mutant and the wild-type strain. In addition, quantification of cytosolic Tim50 aggregates was already included in Fig. 4H, which also shows significant differences between the tif51A-1 mutant and the wild-type strain. These two figures include the individual values from three biological replicates (at least 150 cells were analyzed), mean, standard deviation and statistical analysis.

      Do the observed aggregated proteins interact with Hsp104? recycled?

      Response: Yes, the cytoplasmic mitochondrial precursor aggregated proteins co-localize with Hsp104 as shown in Fig. 3I for Cyc1 and in Fig. 4J for Tim50. The quantification of Cyc1 and Tim50 co-localization with Hsp104 is shown in Fig S5D.


      Significance

      See above


      Reviewer #2

      Evidence, reproducibility and clarity

      The authors report here novel findings concerning the role of eIF5A in mediating protein import to mitochondria in the model eukaryote Saccharomyces cerevisiae. It was previously known from structural and other studies that the translation factor eIF5A binds to the E-site of stalled ribosomes to help promote peptide bond formation. It was inferred by ribosome footprinting and reporter studies assessing the impact of eIF5A depletion that eIF5A is particularly needed to translate several specific amino acid motifs including polyproline stretches. However additional target sequences are known.

      Here a proteomics approach reveals clear evidence that mitochondrially targeted proteins are impacted by temperature sensitive mutations in eIF5A that deplete the factor, including those without polyprolines. The authors then use a range of molecular and cell biology to focus on the role of mitochondrial signal sequences/mitochondrial protein import and the mitochondrial stress response, before highlighting a role for poly-prolines in Tim50, a major mitochondrial protein import factor. Consistent with the ribosome footprinting done previously it is shown that a stretch of 7 prolines limit its translation when eIF5A is depleted and studies shown here are consistent with the idea that this has wider consequences for mitochondrial protein import and hence translation/stability of other proteins. However improved Tim50 translation alone, by eliminating the poly-proline motif, is not sufficient to overcome all consequences of eIF5A depletion for mitochondrial protein import and for viability, suggesting a wider role.

      In general the text flows nicely, this could be a study that explains why a large number of mitochondrially targeted proteins are impacted by depletion of eIF5A in yeast. As the poly Pro sequence in Tim50 is not conserved in higher eukaryotes it is unclear how this observation will scale to other systems, but it provides an example of how studies in a relatively simple system can trace wide-spread impact of the loss of one component of a central pathway-here protein synthesis to altered translation of a key component of another process-mitochondrial protein import. Given that eIF5A and its hypusine modifying enzymes are mutated in rare human disorders, it is likely there will be interest in this study.

      However, while the conclusions may be justified, there are significant deficiencies in how the experiments have been analysed and presented in this version of the manuscript that impact every figure shown, coupled with deficiencies in the methods section that all need to be addressed. Thus, we have here the basis of what should be a very interesting paper, but there is a lot of work to do to remedy perceived weaknesses. It may be that the overall conclusions are entirely sound and appropriate, but I suspect that performing the statistics in less biased ways may change some of the significant differences claimed. Some explanations concerning how data analyses were conducted and the reasons for specific analysis decisions being made would also improve the narrative. These points are expanded on below.

      All the edits suggested here are aimed at improving the rigor of reporting in this study. Depending on how they are answered some may become major issues, or they could all be minor.

      1 Figure 1 shows proteomic data for response to heat shock at 41°C. In the text it is made clear that two different temperature sensitive missense alleles the 51A-1 and 51A-3 were analysed, but the single volcano plot in Figure 1A does not say whether it is reporting one of these experiments compared to WT (which one) or some other analysis (i.e. have data from the 2 mutants been amalgamated somehow?). I would assume only one, but which one, and why only one plot? How different is the other experiment? Why does the Figure title say the experiment is an eIF5A deletion when it is not this?

      Response: The data shown in Figure 1A correspond to the average values obtained in the proteomic analysis for the two temperature-sensitive mutants tif51A-1 and tif51A-3 (with data for each mutant obtained from 3 different biological replicates). The proteomic results were highly reproducible and similar between the two mutants (see the MDS plot in Fig. S1A, which shows all replicates for each strain and condition studied in the proteomic analysis). In addition, the 41°C/25°C protein ratio for each eIF5A temperature-sensitive mutant relative to wild-type is shown in Table S1 for each protein. This is now clarified in the Figure 1A legend. A similar approach using the mean values of the two mutants was followed in the analysis of ribosome footprinting data in Pelechano and Alepuz, 2017. This information also responds to the comments made by Reviewer #1.

      Reviewer #2 is right: there was a mistake in the Fig. 1 title. It has been corrected to read “depletion” instead of “deletion”.

      2 Why were the experiments shown in Figure 1 done at 41°C when all other experiments are done at 37°C? This experimental difference is ignored in the text and no comparison of the impact of 37 vs 41 is made anywhere in the manuscript. For example it would be straightforward to perform a comparison of eIF5A depletion (by western blot), polyribosome profiles, strain growth/inhibition at both temperatures.

      Response: Our aim in carrying out the proteomic experiment after 4 hours of incubation of the temperature-sensitive strains at 41°C was to achieve a more profound depletion of the eIF5A protein, which is very abundant and stable under normal conditions, in order to obtain clear proteomic results. The proteomic results pointed to a reduction in the levels of many mitochondrial proteins, corroborating previous results obtained in murine embryonic fibroblasts upon depletion of active eIF5A (https://doi.org/10.1016/j.cmet.2019.05.003). From this starting point we sought the molecular mechanism involved, and all the remaining experiments were done with temperature-sensitive eIF5A mutants at the restrictive temperature of 37°C, which is the condition most commonly used in yeast by us and others, and at which wild-type yeast cells still grow vigorously.

      In our previous manuscript version, the depletion of eIF5A after growing the cells at 41°C for 4 h was shown in Fig. 1C. These data have been expanded, and we have now included in Fig. S1E a western blotting analysis that shows the depletion of eIF5A after incubating the cells at 37°C and 41°C for 4 h. The steady-state level of the mitochondrial Por1 protein was investigated by western blotting (Figs. 1C,D), and the results show a significant down-regulation in the two eIF5A temperature-sensitive strains at 41°C. We have now included the same experiment performed at 37°C (Fig. S1E), which supports the same conclusion. In addition, following Reviewer #2's suggestions, growth of the wild-type and tif51A-1 strains was tested by serial drop assays conducted at 25°C, 37°C and 41°C, and the results confirm that both 37°C and 41°C impair the growth of the tif51A-1 strain but not the wild-type (Fig. S1B). The new information included in Figure S1 is now explained in the Results section. This information also responds to the comments made by Reviewer #4.

      3 Western blot quantification. In Figure 1D and E the authors present western blot quantification. However they have chosen to normalise every panel to the signal in lane 1. This means that there is no variation at all in that sample as every replicate is =1. This completely skews the statistical assumptions made (because there will be variation in that sample) and effectively invalidates all the statistics shown. An appropriate approach to use is to normalise the signal in each lane to the mean signal across all lanes in a single blot. That way if all are identical they remain at 1, but importantly variation across all samples is captured. This should be done to the loading controls as well before working out ratios or performing any statistical analyses.

      Response: Following Reviewer #2's suggestions, we have changed the normalization methodology for the western blots: we now normalize the signal in each lane to the mean signal across all lanes in each single blot, and we do the same for the loading controls. We have applied this analysis to every western blotting experiment shown in the manuscript (Figs. 1D, S4B and S4C), and the statistical analyses have been performed again to capture variation across all samples. This is also described in the Materials and Methods section (“Western blotting” subsection). The results obtained are similar to the previous ones, but we agree that this new approach improves the data presentation.
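
      The normalization described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names and the densitometry values are hypothetical and are not taken from the manuscript, and the lane-wise target/loading ratio is one plausible reading of "before working out ratios".

```python
from statistics import mean

def normalize_to_blot_mean(signals):
    """Divide each lane's signal by the mean signal of all lanes on the same blot."""
    blot_mean = mean(signals)
    return [s / blot_mean for s in signals]

def normalized_ratios(target, loading):
    """Normalize target and loading-control lanes separately, then take lane-wise ratios."""
    norm_target = normalize_to_blot_mean(target)
    norm_loading = normalize_to_blot_mean(loading)
    return [t / l for t, l in zip(norm_target, norm_loading)]

# Hypothetical densitometry values for one blot (arbitrary units):
# two wild-type lanes followed by two mutant lanes.
por1 = [1200.0, 1100.0, 600.0, 500.0]
loading_control = [1000.0, 1000.0, 1000.0, 1000.0]

ratios = normalized_ratios(por1, loading_control)
print(ratios)
```

      With this scheme no lane is fixed to 1, so the wild-type replicates keep their own spread and can enter the statistical test like any other group.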

      For this type of experiment it is more appropriate to use Anova than a T-test. This advice applies to every western data analysis figure in the whole manuscript and so all associated statistics need to be done again from the original quantification values. If T-test is justified then a correction for multiple hypothesis testing should be applied.

      Response: After reviewing a large number of publications analysing similar data, and also following the recommendations of our statistical department, we have retained the statistics used in our previous version (with the new data normalisation explained above, following the recommendations of Reviewer #2). This is because, for each western blot figure shown, we performed experiments with two different biological samples, wild-type cells and eIF5A mutant cells, and compared results for a single variable (Por1 protein level, eIF5A protein level or Hsp60 protein level) using three or more biological replicates. In this context, we compare the mean of the protein levels obtained from the biological replicates for two groups: wild-type and eIF5A mutant. Therefore, we believe that the t-test is more appropriate. However, we could repeat the statistical analysis if an alternative test is ultimately considered more appropriate.
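
      For reference, the multiple-testing correction mentioned by the Reviewer can be illustrated with a Holm-Bonferroni adjustment. This is a minimal sketch, not the analysis performed in the manuscript: the choice of the Holm method is an assumption made here for illustration, and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p-values from three separate two-group comparisons.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```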

      In all bar chart figures in addition to showing the mean and SD, each replicate value should be shown (eg as done in Fig 2C). Graphpad allows individual points to be plotted easily.

      Response: All figures throughout the manuscript now include individual values from each replicate, in addition to showing the mean, SD and statistical analysis. All figure legends have been corrected accordingly.

      5 Figure 2. Polysome profiles. The impact of translation elongation stalls on global polysome profiles is complex, but a global run off is highly unlikely. Stalls later in the coding region would be anticipated to cause an increase in ribosome density as more ribosomes accumulate (like cars queueing held at a red light). However where a stall is early in a longer ORF, for example at a signal sequence, then there is less opportunity for ribosomes to join and so for those mRNAs moving to lighter points in the gradient may be observed. This may also cause knock on effects on AUG clearance and initiation which the authors appear to see as there may be an increased 60S peak in the traces shown. Are there differences in overall -low vs high polysomes, the traces shown suggest there may be? Discussion of these points is merited in the results section given the subsequent qPCR experiment.

      Response: The comments made by Reviewer #2 are very interesting and we have made changes accordingly. First, we now show in Fig. 2A,B and Fig. S2B,C the quantification of polysomal and monosomal fractions in wild-type and tif51A-1 mutants at permissive and restrictive temperatures. These data show no impact of eIF5A depletion on the global polysomal and monosomal fractions. This result does not support a global stall at the 3’ region of the ORF, because an increase in polysomal fractions would then be detected; nor does it support a global stall at the 5’ region of the ORF, because a decrease in polysomal fractions would then be detected. However, with respect to individual mRNAs, our data show a significant reduction in the heavier polysomal fractions and a significant increase in the lighter polysomal fractions for mRNAs encoding mitochondrial proteins, while no significant changes were observed for mRNAs encoding cytoplasmic proteins (Fig. 2C and Fig. S2D-I). These results could be interpreted as a consequence of ribosome stalls in the 5’ ORF regions, for example at the signal sequence, in line with Reviewer #2's comments.

      We have now introduced this comment in the Results and Discussion sections.

      Figure 2 qPCR. Using qPCR to analyse RNA levels across polysome gradients is tricky for multiple reasons including that the total RNA level varies across fractions that can impact recovery efficiencies following precipitation of gradient fractions. Often investigators use a spike in control to act as a normalising factor. Here it is completely unclear what analysis was done because details are not stated anywhere. How were primers optimized, was amplification efficiency determined? Or are they assumed to be 100%, which they will not be? A detailed description or reference to a study where that is written is needed.

      Response: RNA extraction and RT-qPCR analysis of the mRNA levels in the polysomal gradients were done as in previous studies from our lab (Romero et al. Sci Rep. 2020;10(1):233. doi: 10.1038/s41598-019-57132-0; Ramos-Alonso et al. PLoS Genet. 2018;14(6):e1007476. doi: 10.1371/journal.pgen.1007476; van Wijlick et al. PLoS Genet. 2016;12(10):e1006395. doi: 10.1371/journal.pgen.1006395; Garre et al. Mol Biol Cell. 2012;23(1):137-50. doi: 10.1091/mbc.E11-05-0419). Three independent replicates were analyzed, and the results were reproducible and statistically significant, as shown in Fig. S2. Total RNA was extracted from each fraction using the SpeedTools Total RNA Extraction kit (Biotools B&M Labs). In the first replicate a spike-in RNA control (Phenylalanine) was added, and we verified that the results did not differ significantly whether or not the spike-in control was used (see below Figure R3 for referees). Relative mRNA values are always obtained from qPCR using an efficiency-calibrating standard curve for each pair of oligos, generated after the initial set-up of the qPCR for that specific pair. Therefore, slight differences in amplification efficiency between oligo pairs are taken into account. More details about the qPCR are now included in the Materials and Methods section (“Polyribosome profile analysis” subsection), and one additional reference is also included for the processing of polysomal gradient fractions.
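
      As an illustration of how a standard curve accounts for primer-specific efficiency, the sketch below fits Ct against log10 of the template amount and converts a sample Ct into a relative quantity. It is a generic example, not the authors' pipeline: the function names, the dilution series and the Ct values are hypothetical.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amplification_factor(slope):
    """Fold increase per cycle; 2.0 would correspond to 100% efficiency."""
    return 10 ** (-1.0 / slope)

def quantity_from_ct(ct, slope, intercept):
    """Relative quantity of a sample, read off the standard curve."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical 10-fold dilution series for one primer pair.
log10_dilutions = [0.0, -1.0, -2.0, -3.0]
cts = [20.0, 23.4, 26.8, 30.2]

slope, intercept = fit_standard_curve(log10_dilutions, cts)
print(slope, amplification_factor(slope))
print(quantity_from_ct(23.4, slope, intercept))
```

      Because each primer pair gets its own slope and intercept, quantities are compared on the curve's scale and no pair is assumed to amplify with 100% efficiency.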

      It would be helpful to state how long the CDS are for these mRNAs, where the cut-off between 2-3 and 2-8 was made (that is, what counts as 'short' vs 'long'), and the scientific basis for selecting 2-3 vs 2-8. Why 8? Were M fractions also used in qPCR? They appear to be ignored in the analysis as currently presented.

      Response: The CDS lengths of the mRNAs analyzed by polysome profiling, along with other important features, are now included in the new Table S5. We classified mRNAs with a length below 600 bp as short and mRNAs with a length above 600 bp as long. This classification was based on the specific mRNA profiles obtained by qPCR analysis. Short mRNAs behaved similarly to one another, and we selected the 2n-3n fractions because the main polysomal peak under normal conditions appeared in the 4n-5n fractions. Likewise, long mRNAs behaved similarly to one another, and we selected the 2n to 8n fractions because the main polysomal peak under normal conditions appeared right after the 8n fraction. This information is now included in the Results and Materials and Methods sections.

      Regarding the monosomal fractions, yes, they were used, as can be seen in Fig. S2, which includes the distribution in monosomal (M), lighter (2n-3n/2n-8n) and heavier (n>3/n>8, P) polysomal fractions. In the polysomal profiles we can see that depletion of eIF5A causes a reduction in the amount of mitochondrial mRNAs in the heavier fractions and a corresponding increase in the amount of mRNAs in the lighter polysomal fractions, while no significant changes are found in the monosomal fractions. Therefore, the statistically significant change in the heavier/lighter polysomal fraction ratio is indicative of translation down-regulation, and these ratios are shown in Fig. 2C. As Reviewer #2 commented in point 5, the shift in mRNA distribution to lighter polysomal fractions may be indicative of ribosome stalling at the 5’ ORF region, compatible with a stall at the mitochondrial targeting signal (MTS), and this discussion is now included in the Results and Discussion sections.
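
      The pooling into M, lighter and heavier fractions and the resulting heavier/lighter ratio can be sketched as follows. The 600 cutoff and the 2n-3n / 2n-8n windows come from the response above; everything else is an assumption made for illustration: the function names, the handling of a CDS of exactly 600 (treated here as long), and the per-fraction mRNA amounts, which are hypothetical.

```python
SHORT_CDS_CUTOFF = 600  # length cutoff stated in the response

def light_upper_bound(cds_length):
    """Highest ribosome number still counted as 'lighter' polysomes."""
    # Assumption: a CDS of exactly 600 is treated as long; the text does not say.
    return 3 if cds_length < SHORT_CDS_CUTOFF else 8

def pool_fractions(amounts, cds_length):
    """Pool per-fraction mRNA amounts (keyed by ribosome number n) into M/light/heavy shares."""
    upper = light_upper_bound(cds_length)
    total = sum(amounts.values())
    mono = sum(v for n, v in amounts.items() if n == 1)
    light = sum(v for n, v in amounts.items() if 2 <= n <= upper)
    heavy = sum(v for n, v in amounts.items() if n > upper)
    return {"M": mono / total, "light": light / total, "heavy": heavy / total}

def heavy_to_light_ratio(pooled):
    """Ratio of heavier to lighter polysomal share for one mRNA."""
    return pooled["heavy"] / pooled["light"]

# Hypothetical relative mRNA amounts per fraction for a short (450) CDS.
wild_type = {1: 10.0, 2: 15.0, 3: 15.0, 4: 30.0, 5: 30.0}
depleted = {1: 10.0, 2: 25.0, 3: 25.0, 4: 20.0, 5: 20.0}

wt_ratio = heavy_to_light_ratio(pool_fractions(wild_type, 450))
dep_ratio = heavy_to_light_ratio(pool_fractions(depleted, 450))
print(wt_ratio, dep_ratio)
```

      In this made-up example the monosomal share is unchanged while the ratio drops, which is the pattern the response describes for mitochondrial mRNAs.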

      Which transcripts studied here encode proteins with signal sequences? As Signal sequence pauses early in translation should impact ribosome loading this is potentially important here as discussed above.

      Response: Yes, we agree with Reviewer #2 that this information may be relevant to the hypothesis of a ribosome stall at the MTS. Therefore, a score for the probability of harbouring an MTS presequence (Fukasawa et al., 2015) is now included in Table S5 for each of the mRNAs analyzed by polysome profiling. The discussion of this point has also been included in the Results and Discussion sections.

      While it has been shown that SRP recognition is able to slow and even arrest translation of ER signal peptides, there is currently no known direct SRP-like correlate for mitochondrial signal sequences. We are therefore unaware of literature showing that mitochondrial signal sequences pause translation in a manner similar to ER signal sequences. We have previously found that downstream translational slowing is important for mitochondrial mRNA targeting (Tsuboi et al 2020, Arceo et al 2022), but we believe that to be distinct from what Reviewer #2 is addressing.

      Figures 3-5. Microscopy. The false green color images in Figure 3B do not show up well. They may be better shown in grayscale, with only the multiple overlays in color.

      Response: False color is widely used for fluorescence microscopy images because it helps readers visualize the results and also facilitates the interpretation of multiple overlays. The use of false color is also suggested by Reviewer #4.

      Figure 3C should show the data spread for all 150 cells and normalise differently as discussed above for westerns. I do not believe that all 150 WT cells have exactly the same GFP intensity, which is what the present plot claims.

      Response: As stated in our answer to point 3 raised by this Reviewer, all figures, including Fig. 3C, are now made with Graphpad as scatter plots showing all individual points, in addition to the mean, SD and statistical analysis. Results correspond to three independent experiments and show a statistically significant difference in Pdr5-GFP signal intensity between wild-type and the tif51A-1 mutant. The figure legend has been corrected accordingly.

      For panels 3D-F image quantification should be shown so that the variation across a population is clear. Eg in violin plots, or showing every point. It should be clear what proportion of cells have GFP aggregates and what the variation in number of granules is.

      Response: The quantification of cytoplasmic Yta12 aggregates is now included in Fig.3E, which shows significant differences between the tif51A-1 mutant and the wild-type strain. Results show the individual values from three independent experiments with a minimum of 150 cells counted. We used a bar graph in which the values (% of cells with 0, 1, 2 or 3 aggregates) for each independent experiment are shown together with the mean, SD and statistical analysis. Figure legend has been corrected accordingly. This information also responds to the comments made by Reviewer #1.

      Figure 4H has no error bars.

      Response: New Fig.4H now shows the individual values of each of the three independent replicates, mean and error bars (SD). Figure legend has been corrected accordingly.

      Figure 5C normalises 2 WTs to 1 as in Figure 3C. Both would be better as violin plots.

      Response: Results in Fig. 5C are now shown using Graphpad and scatter plot in which all individual values are plotted (not normalized wild-type to 1), and also mean, SD and statistical significance. Results correspond to three independent replicates with the fluorescence intensity measured in more than 150 cells.

      Figure 5D/E shows 37°C data only. Do tif51A-1 cells have aggregates at 25°C? There are no error bars in Figure 5E or any indication of how many cells/replicates were quantified.

      Response: Figures 5D and 5E only show data at 37ºC since there are no Tim50-GFP aggregates, nor aggregates of other mitochondrial proteins, in tif51A-1 mutants at 25ºC, as shown in Fig. S3C-F and Fig. S5C.

      New Fig. 5E shows individual values from each of the three independent experiments, mean, SD and statistical significance. Results correspond to the measurement of Tim50 protein aggregates in more than 150 cells. Figure legend has been corrected accordingly.

      There are no sizing bars on any of the micrographs.

      Response: Now, all sets of microscopy figures contain a size bar and this is indicated in the corresponding Figure legend.

      The methods states that all quantification was done using ImageJ, but there is no detail given about how this was done. There are lots of ways to use ImageJ.

      Response: A detailed description of the quantifications made using ImageJ is now included in the Materials and Methods section (“Fluorescent microscopy and analysis” subsection).

      Figure 4. Luciferase assay. It is clear that there are differences in Tim50 vs Tim50∆7pro signal over time from the primary plots. It is not clear why the quantification plots on the right are from 2 selected time points. It is more typical to calculate the rate of increase in RLU per min in the linear portion of the plot, for these examples it would be approximately 30-40 mins.

      Response: As luciferase mRNA level is also increasing with time, the total amount of luciferase protein will increase exponentially. At some point mRNA levels will reach a steady state and for a brief period there could be a linear portion of RLU increase, but that will be different for each condition and reporter as ribosome quality control can have a direct impact on mRNA half-life. We have instead chosen two time points to show that statistical differences in Tim50 protein expression upon eIF5A depletion are not dependent on the time point chosen. We have also included the full data plots for readers to view the raw data.

      Figure 4F. The text on p6 states Fig 4F is evidence of RQC induction. This is an overstatement. There are no data presented relating to RQC.

      Response: Ribosome-associated quality control (RQC) is a mechanism by which elongation-stalled ribosomes are sensed in the cell, and then removed from the stall site by ribosomal subunit dissociation. This is the definition of RQC. With high levels of RQC this will cause a drop in ribosome density downstream of the stall site because of ribosome removal. While we would agree that most studies do not show actual buildup of ribosomes at ribosome stalls, and removal after the stall, we do. Our ribosome profiling analysis shows in vivo distribution of ribosome density across the TIM50 mRNA in wild-type and upon eIF5A depletion. We show that in the eIF5A depletion the ribosome density is similar to wild-type for the first ~200 bp, then there is a buildup of ribosomes for ~300 bps up to the stretch of polyproline residues, indicative of slowed ribosome movement. This slowed ribosome movement is further supported by our translation duration measurements in Fig. 4E. Then the transcript is almost completely devoid of ribosomes after the stretch of proline residues, indicating the ribosomes are removed at the proline stretch. This combination of ribosome stalling (Fig. 4E,4F) and subsequent ribosome removal (Fig 4F) is the textbook definition of RQC, so we indicate this as evidence for RQC.

      Figure 5G. It is not clear to this reviewer why the CYC1 reporter is impacted by Tim50∆pro at 25°C. Can the authors comment?

      Response: This is also not clear to us, however, no differences are seen with and without eIF5A depletion, supporting the interpretation that Cyc1 translation is not affected by eIF5A depletion when Tim50 protein levels are restored in the Tim50∆pro strain. However, in order to clarify this point, we propose, if it is considered necessary, to remake the Tim50∆pro CYC1 reporter strain.

      Does ∆pro impact Tim50 function or is there possibly some other off target impact of integrating the reporter in this strain?

      Response: As stated in our answer to point 1 of Reviewer #1, the functionality of Tim50ΔPro is shown by the fact that wild-type cells carrying this Tim50 protein version as the only copy of Tim50 grew well in glycerol media, where Tim50 is essential for mitochondrial function (Fig. 5A). However, we suspect that Tim50ΔPro is a slightly less efficient protein, since the double mutant tif51A-1 Tim50ΔPro shows even more reduced growth than the single tif51A-1 mutant (Fig. 5A). We do not expect an off-target impact in these Tim50ΔPro strains, although we cannot exclude this 100%, as with any other yeast strain obtained by transformation.

      Significance

      Strengths and Limitations:

      Strengths are that the study uses a wide range of molecular approaches to address the questions and that the results present a clear story.

      Limitations are that the poly-proline residues identified in yeast Tim50 are not conserved through to humans, so the direct relevance to higher organisms is unclear. However there are many more poly-proline proteins in human genes than in yeast and there are rare genetic conditions affecting eIF5A and its hypusination.

      Advance. Provides a clear link between dysregulation of eIF5A, Tim50 expression and wider impact on mitochondria.

      Audience. Scientists interested in protein synthesis, mitochondrial biology and clinicians investigating rare human disorders of eIF5A and hypusination.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      eIF5A is required to mediate efficient translation elongation of some amino-acid sequences like polyproline motifs, and eIF5A depletion was reported to impair mitochondrial respiration functions, decreasing mitochondrial protein levels. In this study, Barba-Aliaga et al. showed that eIF5A is important for the translation of the Pro-repeat containing protein, Tim50, an essential subunit of the TIM23 complex, the presequence translocase in the mitochondrial inner membrane. eIF5A ts mutants caused ribosome stalling of Tim50 mRNA on the mitochondrial surface at non-permissive temperature, and the removal of the Pro-repeat from Tim50 (Tim50-delta7Pro mutant) made its translation independent of eIF5A. However, the replacement of endogenous Tim50 with Tim50-delta7Pro did not recover the cell growth defects of eIF5A ts mutant on respiration medium at semi-permissive temperature, suggesting that Tim50 is not the only reason for the global mitochondrial defects caused by defective eIF5A.

      (1) I am wondering why the authors mainly used the eIF5A ts mutant strains instead of the eIF5A degron strain since, for example, the decrease in the level of Tim50 was only marginal (Fig. EV4A).

      Response: eIF5A is a very abundant and highly stable protein (SGD data: 273594 molecules/cell in YPD and 9.1 h protein half-life). We used temperature-sensitive strains (tif51A-1) instead of the eIF5A-degron because eIF5A is depleted much more quickly in the former system than in the latter. As can be seen in Schuller et al., Mol Cell. 2017;66(2):194-205.e5. doi: 10.1016/j.molcel.2017.03.003, with the eIF5A-degron system the addition of auxin was made in parallel with a transcriptional shut-off using the GAL promoter to express the eIF5A-degron, changing the media from galactose to glucose and incubating the cells for 10 hours. With our approach, which uses temperature-sensitive proteins, almost full depletion (without affecting viability, see Li et al., Genetics 2014; 197(4):1191-200 doi: 10.1534/genetics.114.166926) is achieved after 4-6 h incubation at 37°C or 4 h incubation at 41°C (Fig. 1C and Fig. S1E; almost no signal is detected by western blotting). Therefore, we chose to deplete eIF5A using temperature-sensitive yeast strains to achieve stronger protein depletion in shorter times and to avoid secondary effects. In addition, the two eIF5A temperature-sensitive strains used in this study have been widely used by us and others (Pelechano and Alepuz, 2017; Zanelli and Valentini, 2005; Zanelli et al., 2006; Dias et al., 2008; Muñoz-Soriano et al., 2017; Rossi et al., 2014; Li et al., 2014; Xiao et al., 2024).

      (2) To show that the compromised translation of Tim50 in the absence of functional eIF5A causes defects in the mitochondrial protein import by clogging the import channels, the authors should directly observe the accumulation of the precursor forms of several matrix-targeting proteins by immunoblotting. In this sense, the results in Fig. 1C for Hsp60 do not fit the interpretation of import channel clogging.

      Response: We did not see precursor mitochondrial proteins by western blot upon eIF5A depletion, possibly because: 1) the mature protein form is more abundant and stable; 2) the precursor mito-protein appears in cytoplasmic aggregates and may not be easily extracted during the preparation of proteins for western blot analysis. In the work by Weidberg and Amon, 2018, who described the mitoCPR response; Krämer et al., 2023, who described MitoStores; and others (Wrobel et al., 2015; Boos et al., 2019), the authors use extreme over-expression of mitoproteins or mutations in proteins essential for mitochondrial biogenesis to induce clogging of translocases and accumulation of precursors in the cytosol. In contrast, we detect proteins at their physiological levels, expressed under their native promoters, which may explain why we do not detect precursor mito-proteins. We believe this endogenous expression of mitochondrially imported proteins to be a much more physiologically relevant system. Yet we see similar transcriptional induction of mitoCPR targets (CIS1, PDR5, PDR15) and mislocalization of mitochondrial proteins to Hsp104-marked aggregates (MitoStores).

      (3) The authors speculated in the Discussion section that import defects caused by compromised translation of Tim50 could cause down-regulation of translation through prolonged mitochondrial stress. However, this lacks experimental evidence.

      Response: We do see that depletion of eIF5A causes import defects through Tim50 and correlates with the down-regulation of translation of mRNAs encoding mitoproteins, as shown in Fig. 2C and Fig. S2. In these figures it can be seen that mito-mRNAs move from heavier to lighter polysomal fractions upon eIF5A depletion, indicating that fewer ribosomes are bound to these mRNAs. Importantly, synthesis of the Cyc1 and Cox5A mitochondrial proteins is recovered when the TIM50 gene is replaced by a TIM50ΔPro gene whose translation is independent of eIF5A, arguing in favor of a translation defect caused by eIF5A depletion through the collapse of import systems produced by ribosome stalling on TIM50 mRNA.

      As discussed by Reviewer #2 and in our answers to his/her points 5 and 6, the reduction in the number of ribosomes bound to mito-mRNAs upon eIF5A depletion may be a consequence of the stall of ribosomes after the mRNA 5’ coding region encoding the MTS. This discussion has now been introduced in the Discussion section. This information also responds to the comments made by Reviewer #2.

      (4) The authors stated that human Tim50 does not have Pro-repeat motif, but how about other organisms (like other fungi species)? Is the present observation specific only to S. cerevisiae?

      Response: We have now included a sequence alignment of the Tim50 protein sequences of different yeast species (Saccharomyces cerevisiae, Candida albicans, Candida glabrata, Candida lipolytica, Schizosaccharomyces pombe, Schizosaccharomyces japonicus), mouse and human (Fig. S4A). The resulting alignment shows that S. cerevisiae is the only organism presenting the seven consecutive proline residues. Still, C. albicans and C. glabrata conserve five consecutive prolines, while C. lipolytica conserves five non-consecutive prolines. Furthermore, S. pombe and S. japonicus, and mouse and human, conserve three and four non-consecutive prolines respectively. This means that the observations presented in this manuscript could be extended to other fungal species as well, since most of the proline residues are conserved and are predicted to behave as eIF5A-dependent motifs for translation. Moreover, the described eIF5A-dependent tripeptide motif PDP is found in humans, mice and S. pombe at the Tim50 region where we found the PPP motif inducing ribosome stalling in S. cerevisiae (Fig. S4A). This may confer an eIF5A-dependent ribosome stall since, as we showed in our previous ribosome footprinting study (Pelechano et al., 2017), this PDP motif causes a ribosome stall as high as that of the PPP motif. This discussion has now been introduced in the Results and Discussion sections.

      (5) Two references in the text are marked with "?", which should be corrected.

      Response: We thank Reviewer #3 for noticing this; the references have been corrected in the text.

      Reviewer #3 (Significance (Required)):

      The essence of this work, the role of eIF5A in the efficient translation of Pro-repeat containing Tim50 (Figs. 4 and 5), is important and worth publication. However, the results of the effects of defective eIF5A on the levels and localization of mitochondrial proteins (Figs.1-3) can be even deleted to make clear the point of this work.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The manuscript submitted by Barba-Aliaga et al. aims to understand on the molecular level how eIF5A influences mitochondrial function. elF5A promotes translation elongation at stretches prone to translational stalling like e.g. polyproline sequence. The finding that eIF5a influences mitochondrial function has been previously reported by the same group and by others. In this context, it was suggested that eIF5a promotes translation of N-terminal mitochondrial targeting signals. Here, the authors propose an alternative mechanism and suggest that "eIF5a directly controls mitochondrial protein import through alleviation of ribosome stalling along TIM50 mRNA." Using luciferase reporter assay, the authors indeed convincingly show that the speed of Tim50 translation is dependent on the presence of functional TIF51A, the major eIF5a in yeast, and that this dependence comes from the presence of the polyproline stretch in Tim50. The rest of the manuscript is unfortunately less clear and it is very hard, if not impossible, to sort out direct from secondary effects and compensations. The authors use proteomics, biochemical methods, RNAseq and fluorescence microscopy to analyze the temperature sensitive tif51A mutant but the conditions used in the manuscript are non-consistent between various experiments presented, in respect to the medium, temperature, preculture condition and the length of treatment used.

      Response: We do not agree with Reviewer #4's assessment. We used different molecular approaches to investigate different questions. Indeed, this is one of the strengths highlighted by Reviewer #2, as quoted above: “Strengths are that the study uses a wide range of molecular approaches to address the questions and that the results present a clear story.” All the experiments presented in the manuscript, apart from the proteomic analysis (Fig. 1), were performed under the same conditions with respect to the medium (SGal), temperature (25°C/37°C), preculture condition (SGal, 25°C) and length of treatment (4 h of depletion at 37°C). This is clearly specified in every figure legend throughout the manuscript and also in the Materials and Methods section. In addition, individual values from each replicate, mean, standard deviation and statistical tests are shown for every figure in the manuscript. Therefore, we do believe that conditions are consistent between experiments and that conclusions are based on different experiments and different scientific approaches.

      We agree with Reviewer #4 that depletion of the eIF5A protein in the temperature-sensitive tif51A-1 mutant was done at 41°C for 4 h in the proteomic analysis, whereas in the rest of the experiments depletion was done at 37°C for 4 h. As explained to Reviewer #2 (see answer to point 2), stronger depletion conditions were used to obtain clear proteomic results. To compare both temperatures, we have now added controls showing eIF5A depletion and growth of the tif51A-1 mutant at 41°C and 37°C; importantly, we also show the reduction in mito-protein levels upon eIF5A depletion at 37°C (Fig. S1B and E).

      In some cases, the genetic background of the yeast strains and plasmids used are also unclear (e.g. pYES2-pGAL-FLAG-TIM50-GFP-URA3 - based on the provided description, TIM50 was inserted between FLAG and GFP tags; if so, mitochondrial targeting signal of Tim50 would be masked making its import into mitochondria impossible).

      Response: We do not agree with this assessment. The genetic background of the yeast strains is the same throughout the manuscript (BY4741 background) and is clearly specified in Table S2. Likewise, all the information regarding the plasmids used can be found in Table S3, and plasmid construction is described in detail in the Materials and Methods section (“Yeast strains, plasmids, and growth conditions” subsection).

      Regarding the pYES2-pGAL-FLAG-TIM50-GFP-URA3 plasmid and as already mentioned in the text, we only used this plasmid to analyze by western blotting the protein synthesis of Tim50 independently of its subcellular localization. Our results (Fig. S4C) confirm that the synthesis defect of this Tim50 version upon eIF5A depletion is only due to the presence of the polyproline region. Importantly, we did not make any conclusion regarding import defects or protein localization based on these results.

      I have no doubt that upon exposure of tif51A cells to 41°C for 4h cells initiate a number of cellular responses including mitoCPR and formation of MitoStores, however, I don't think that the authors convincingly show that these are initiated by reduced levels of Tim50 - on the contrary, the authors show that levels of Tim50 are actually not significantly changed. This can hardly be reconciled with the model proposed. In addition, should the effect of Tif51A on mitochondria primarily be due to its effect on Tim50, then Tim50deltaPro should rescue the phenotype of tif51a mutant but it didn't; if anything, it made it worse (see Fig 5A - the double mutant grows worse than the single ones). Furthermore, expression of Cyc1 luciferase reporter is reduced in Tim50deltaPro strain even at permissive temperature, Figure 5G. Since cytochrome c is not a substrate of the presequence pathway this again points to the secondary effects that are being observed.

      Response: We believe that our main results, summarized below and all obtained at 37°C, do show that translation defects in TIM50 mRNA are the cause of the mitoCPR induction and formation of MitoStores. First, Tim50 protein levels are significantly reduced upon eIF5A depletion, as shown in Fig. S4A and S4B. Although the reduction in Tim50 protein level is statistically significant, we agree that it is quantitatively small. This can be explained by the high stability of Tim50 protein, with a half-life of approximately 9.6 h (Christiano et al, 2014), which makes it more difficult to measure large differences in new protein synthesis. This is why we additionally used an accurate and quantitative test to show the eIF5A dependency of TIM50 mRNA translation: the fusion of the TIM50 DNA sequence to a TetO7-inducible nLuc reporter, which allows us to monitor the appearance of new Tim50 protein and to estimate the translation elongation rate (Fig. 4C-E). Because TIM50 mRNA is located at the mitochondrial surface to promote the import of nascent Tim50 protein during translation (Fig. S5B), the ribosome stalling at TIM50 mRNA provoked by eIF5A depletion may by itself clog the protein import system, even though it yields only a slight reduction in total cellular Tim50 protein. Second, as Reviewer #4 pointed out, under our model Tim50ΔPro should rescue the phenotype of the tif51A-1 mutant, and it does: no mitoCPR induction and no mito-protein cytoplasmic aggregation are observed (Fig. 5D-F). Moreover, no differences in Cyc1- and Cox5a-nanoLuc synthesis are observed in the tif51A-1 Tim50ΔPro strain between depletion and non-depletion conditions (Fig. 5E). These results strongly suggest that the mitochondrial protein import defects (and consequently the mitoCPR induction and mito-protein cytoplasmic aggregation) caused by eIF5A depletion are a consequence of ribosome stalling during TIM50 mRNA translation.

      However, Reviewer #4 is right that mitochondrial respiration and growth in glycerol are not restored in the tif51A-1 Tim50ΔPro strain, even though Tim50 protein levels have been restored under eIF5A depletion conditions. As we discuss in the manuscript, we expect that additional mitochondrial proteins are targets of eIF5A, such as Yta12 and/or others. We have added further data pointing to ribosome stalling and RQC for other cotranslationally inserted mitochondrial proteins (Table S6). Accordingly, this has also been included in the Discussion section. However, the identification and study of these other mitochondrial targets goes beyond the aim of our current study.

      Minor comments

      1. Page 1, mitochondrial proteins do not cross the intermembrane space through Tom40 but rather the outer membrane

      Response: We think Reviewer #4 misunderstood the sentence, because we are saying exactly what he/she states: mitoproteins cross the outer membrane to the intermembrane space through Tom40. Our sentence reads:

      “Usually, mitoproteins contact the central receptor Tom20 and cross to the intermembrane space (IMS) through Tom40, the β-barrel pore-forming subunit.”

      Therefore, we kept the sentence.

      Page 4, ATP1 is present in the matrix and not the inner membrane

      Response: This has been corrected. We thank the Reviewer for pointing this out.

      The citations are missing at several places - they are left as "?"

      Response: References have been corrected in the text.

      It would be nice if microscopy images were colored in magenta and cyan, rather than red and green, to make them accessible to a wider audience.

      Response: Green and red colors for fluorescent microscopy images are widely used in high-impact journals, especially when showing mitochondrial proteins and mitochondrial marker Su9-mCherry (Hughes et al., 2016, eLife, doi: 10.7554/eLife.13943; Kakimoto et al., 2018, Scientific Reports, doi: 10.1038/s41598-018-24466-0; Kreimendahl et al, 2020, BMC Biology, doi: 10.1186/s12915-020-00888-z). However, if the Reviewers think this is essential for publication, microscopy images can be colored in magenta and cyan instead.

      Formally speaking, Tim50 is not per se a translocase, it is either a component of the translocase or, more precisely, a receptor of the translocase. Similarly, Tom20 and Tom70 are not membrane transporters but rather receptors of the TOM complex.

      Response: We have changed the title and text to be more precise in the description of the components of the mitochondrial import systems as suggested by Reviewer #4.

      Reviewer #4 (Significance (Required)):


      This is a potentially interesting story, however, the conditions used for the analysis of the temperature sensitive mutants were either too harsh or these mutants are in general impossible to control, making the manuscript, in my opinion, unfortunately too premature for publication.

      Response: We do not agree with Reviewer #4's opinion. All experiments were done at 37°C except the proteomic analysis, whose results are further confirmed for the Tim50 and Por1 proteins at 37°C. We want to stress that we show all experiments with at least three biological replicates, that individual values for each measurement are now included in the graphs as recommended by Reviewer #2, and that the mean, SD and statistical tests are included. Throughout the manuscript, we draw conclusions based on statistically significant differences. The temperature-sensitive yeast mutants used give reproducible results, behave as expected under the controlled conditions used, and have been widely used in our lab and others (Pelechano and Alepuz, 2017; Zanelli and Valentini, 2005; Zanelli et al., 2006; Dias et al., 2008; Muñoz-Soriano et al., 2017; Rossi et al., 2014; Li et al., 2014; Xiao et al., 2024).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work provides a valuable contribution and assessment of what it means to replicate a null study finding, and what are the appropriate methods for doing so (apart from a rote p-value assessment). Through a convincing re-analysis of results from the Reproducibility Project: Cancer Biology using frequentist equivalence testing and Bayes factors, the authors demonstrate that even when reducing 'replicability success' to a single criterion, how precisely replication is measured may yield differing results. Less focus is directed to appropriate replication of non-null findings.

      Reviewer #1 (Public Review):

      Summary:

      The goal of Pawel et al. is to provide a more rigorous and quantitative approach for judging whether or not an initial null finding (conventionally with p ≥ 0.05) has been replicated by a second similarly null finding. They discuss important objections to relying on the qualitative significant/non-significant dichotomy to make this judgment. They present two complementary methods (one frequentist and the other Bayesian) which provide a superior quantitative framework for assessing the replicability of null findings.

      Strengths:

      Clear presentation; illuminating examples drawn from the well-known Reproducibility Project: Cancer Biology data set; R-code that implements suggested analyses. Using both methods as suggested provides a superior procedure for judging the replicability of null findings.

      Weaknesses:

      The proposed frequentist and the Bayesian methods both rely on binary assessments of an original finding and its replication. I'm not sure if this is a weakness or is inherent to making binary decisions based on continuous data.

      For the frequentist method, a null finding is considered replicated if the original and replication 90% confidence intervals for the effects both fall within the equivalence range. According to this approach, a null finding would be considered replicated if the p-values of both equivalence tests (original and replication) were, say, 0.049, whereas it would not be considered replicated if, for example, the equivalence test of the original study had a p-value of 0.051 and the replication had a p-value of 0.001. Intuitively, the evidence for replication would seem to be stronger in the second instance. The recommended Bayesian approach similarly relies on a dichotomy (e.g., Bayes factor > 1).

      Thanks for the suggestions, we now emphasize more strongly in the “Methods for assessing replicability of null results” and “Conclusions” sections that both TOST p-values and Bayes factors are quantitative measures of evidence that do not require dichotomization into “success” or “failure”.
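
      To make the dichotomy discussed above concrete, the following is a minimal sketch of a TOST p-value under a normal approximation, together with the "both p-values below alpha" rule from the reviewer's example. The function names and the numbers in the usage note are illustrative assumptions; this is not the authors' R code.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def tost_pvalue(estimate, se, margin):
    """TOST p-value for H0: |effect| >= margin (normal approximation)."""
    # One-sided test against H0: effect >= +margin
    p_upper = STD_NORMAL.cdf((estimate - margin) / se)
    # One-sided test against H0: effect <= -margin
    p_lower = 1.0 - STD_NORMAL.cdf((estimate + margin) / se)
    # Equivalence is claimed only if both one-sided tests reject
    return max(p_upper, p_lower)


def ci90_within_margin(estimate, se, margin):
    """True if the 90% confidence interval lies inside (-margin, +margin)."""
    z = STD_NORMAL.inv_cdf(0.95)
    return (estimate - z * se > -margin) and (estimate + z * se < margin)


def both_equivalent(p_original, p_replication, alpha=0.05):
    """Dichotomous rule: both TOST p-values must fall below alpha."""
    return max(p_original, p_replication) < alpha
```

      With these definitions, `both_equivalent(0.049, 0.049)` is true while `both_equivalent(0.051, 0.001)` is false, which is the reviewer's point: the binary rule can rank the two cases opposite to intuition, whereas the pair of p-values themselves remains a quantitative summary of the evidence.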

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      Strengths:

      The study uses reliable and shareable/open data to demonstrate its findings, sharing as well the code for statistical analysis. The study provides sensitivity analysis for different scenarios of equivalence margin and alpha level, as well as for different scenarios of standard deviations for the prior of Bayes factors and different thresholds to consider. All analyses and code of the work are open and can be replicated. As well, the study demonstrates on a case-by-case basis how the different criteria can diverge, regarding one sample of a field of science: preclinical cancer biology. It also explains clearly what Bayes factors and equivalence tests are.

      Weaknesses:

      It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Other comments:

      • Introduction: The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      • Overall picture vs. case-by-case scenario: An interesting finding is that the authors observe that in most cases, there is no substantial evidence for either the absence or the presence of an effect, as evidenced by the equivalence tests. Thus, using both suggested criteria results in a picture similar to the one initially raised by the paper itself. The work done by the authors highlights additional criteria that can be used to further analyze replication success on a case-by-case basis, and I believe that this is where the paper's main contributions lie. Despite not changing the overall picture much, I agree that the p-value criterion by itself does not distinguish between (1) a situation where the original study had low statistical power, resulting in a highly inconclusive non-significant result that does not provide evidence for the absence of an effect and (2) a scenario where the original study was adequately powered, and a non-significant result may indeed provide some evidence for the absence of an effect when analyzed with appropriate methods. Equivalence testing and Bayesian factor approaches are valuable tools in both cases.

      Regarding the 0.05 threshold, the choice of the prior distribution for the SMD under the alternative H1 is debatable, and this also applies to the equivalence margin. Sensitivity analyses, as highlighted by the authors, are helpful in these scenarios.

      Thank you for the thorough review and constructive feedback. We have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for the RPP and EPRP null results.

      Reviewer #3 (Public Review):

      Summary:

      The paper points out that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. Also, it cannot be considered a "replication success". The main point of the paper is rather obvious. It may be that both studies are underpowered, in which case their non-significance does not prove anything. The absence of evidence is not evidence of absence! On the other hand, statistical significance is a confusing concept for many, so some extra clarification is always welcome.

      One might wonder if the problem that the paper addresses is really a big issue. The authors point to the "Reproducibility Project: Cancer Biology" (RPCB, Errington et al., 2021). They criticize Errington et al. because they "explicitly defined null results in both the original and the replication study as a criterion for replication success." This is true in a literal sense, but it is also a little bit uncharitable. Errington et al. assessed replication success of "null results" with respect to 5 criteria, just one of which was statistical (non-)significance.

      It is very hard to decide if a replication was "successful" or not. After all, the original significant result could have been a false positive, and the original null-result a false negative. In light of these difficulties, I found the paper of Errington et al. quite balanced and thoughtful. Replication has been called "the cornerstone of science" but it turns out that it's actually very difficult to define "replication success". I find the paper of Pawel, Heyard, Micheloud, and Held to be a useful addition to the discussion.

      Strengths:

      This is a clearly written paper that is a useful addition to the important discussion of what constitutes a successful replication.

      Weaknesses:

      To me, it seems rather obvious that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. I'm not sure how often this mistake is made.

      Thanks for the feedback. We do not have systematic data on how often the mistake of confusing absence of evidence with evidence of absence has been made in the replication context, but we do know that it has been made in at least three prominent large-scale replication projects (the RPP, RPEP, RPCB). We therefore believe that there is a need for our article.

      Moreover, we agree that the RPCB provided a nuanced assessment of replication success using five different criteria for the original null results. We emphasize this now more in the “Introduction” section. However, we do not consider our article as “a little bit uncharitable” to the RPCB, as we discuss all other criteria used in the RPCB and note that our intent is not to diminish the important contributions of the RPCB, but rather to build on their work and provide constructive recommendations for future researchers. Furthermore, in response to comments made by Reviewer #2, we have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for null results from two other replication projects, where the same issue arises.

      Reviewer #1 (Recommendations For The Authors):

      The authors may wish to address the dichotomy issue I raise above, either in the analysis or in the discussion.

      Thank you, we now emphasize that Bayes factors and TOST p-values do not need to be dichotomized but can be interpreted as quantitative measures of evidence, both in the “Methods for assessing replicability of null results” and the “Conclusions” sections.

      Reviewer #2 (Recommendations For The Authors):

      Given that, here follow additional suggestions that the authors should consider in light of the manuscript's word count limit, to avoid confusing the paper's main idea:

      2) Referencing: Could you reference the three interesting cases among the 15 RPCB null results (specifically, the three effects from the original paper #48) where the Bayes factor differs qualitatively from the equivalence test?

      We now explicitly cite the original and replication study from paper #48.

      3) Equivalence testing: As the authors state, only 4 out of the 15 study pairs are able to establish replication success at the 5% level, in the sense that both the original and the replication 90% confidence intervals fall within the equivalence range. Among these 4, two (Paper #48, Exp #2, Effect #5 and Paper #48, Exp #2, Effect #6) were initially positive with very low p-values, one (Paper #48, Exp #2, Effect #4) had an initial p of 0.06 and was very precisely estimated, and the only one in which equivalence testing provides a clearer picture of replication success is Paper #41, Exp #2, Effect #1, which had an initial p-value of 0.54 and a replication p-value of 0.05. In this latter case (or in all these ones), one might question whether the "liberal" equivalence range of Δ = 0.74 is the most appropriate. As the authors state, "The post-hoc specification of equivalence margins is controversial."

      We agree that the post hoc choice of equivalence ranges is a controversial issue. The margins define an equivalence region where effect sizes are considered practically negligible, and we agree that in many contexts SMD = 0.74 is a large effect size that is not practically negligible. We therefore present sensitivity analyses for a wide range of margins. However, we do not think that the choice of this margin is more controversial for the mentioned studies with low p-values than for other studies with greater p-values, since the question of whether a margin plausibly encodes practically negligible effect sizes is not related to the observed p-value of a study. Nevertheless, for the new analyses of the RPP and EPRP data in Appendix B, we have added additional sensitivity analyses showing how the individual TOST p-values and Bayes factors vary as a function of the margin and the prior standard deviation. We think that these analyses provide readers with an even more transparent picture regarding the implications of the choice of these parameters than the “project-wise” sensitivity analyses in Appendix A.
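
      The kind of margin sensitivity analysis described above can be sketched as follows: for a single effect estimate, the TOST p-value is recomputed over a grid of equivalence margins. This is only an illustration under a normal approximation; the function names, the grid and the example estimate are hypothetical and are not taken from the paper or its appendices.

```python
from statistics import NormalDist


def tost_pvalue(estimate, se, margin):
    """TOST p-value for H0: |effect| >= margin (normal approximation)."""
    cdf = NormalDist().cdf
    return max(cdf((estimate - margin) / se),
               1.0 - cdf((estimate + margin) / se))


def margin_sensitivity(estimate, se, margins):
    """Return (margin, TOST p-value) pairs for each candidate margin."""
    return [(m, tost_pvalue(estimate, se, m)) for m in margins]


def smallest_passing_margin(estimate, se, margins, alpha=0.05):
    """Smallest margin in the grid at which equivalence would be declared."""
    for m in sorted(margins):
        if tost_pvalue(estimate, se, m) < alpha:
            return m
    return None  # no margin in the grid is wide enough


# Hypothetical study: SMD estimate 0.2 with standard error 0.25
grid = [0.2, 0.5, 0.74, 1.0, 2.0]
for margin, p in margin_sensitivity(0.2, 0.25, grid):
    print(f"margin = {margin:.2f}, TOST p = {p:.4f}")
```

      Because the TOST p-value can only decrease as the margin widens, such a sweep shows directly how wide a margin must be before a given study would count as evidence of equivalence, independently of that study's ordinary p-value.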

      4) Bayes factor suggestions: For the Bayes factor approach, it would be interesting to discuss examples where the BF differs slightly. This is likely to occur in scenarios where sample sizes differ significantly between the original study and replication. For example, in Paper #48, Exp #2 and Effect #4, the initial p is 0.06, but the BF is 8.1. In the replication, the BF dramatically drops to < 1/1000, as does the p-value. The initial evidence of 8.1 indicates some evidence for the absence of an effect, but not strong evidence ("strong evidence for H0"), whereas a p-value of 0.06 does not lead to such a conclusion; instead, it favors H1. It would be interesting if the authors discussed other similar cases in the paper. It's worth noting that in Paper #5, Exp #1, Effect #3, the replication p-value is 0.99, while the BF01 is 2.4, almost indicating "moderate" evidence for H0, even though the p-value is inconclusive.

      We agree that some of the examples nicely illustrate conceptual differences between p-values and Bayes factors, e.g., how they take into account sample size and effect size. As methodologists, we find these aspects interesting ourselves, but we think that emphasizing them is beyond the scope of the paper and would distract eLife readers from the main messages.

      Concerning the conceptual differences between Bayes factors and TOST p-values, we already discuss in more detail a case where there are qualitative differences (original paper #48). We added another discussion of this phenomenon in Appendix B, as it also occurs for the replication of Ranganath and Nosek (2008) that was part of the RPP.

      5) p-values, magnitude and precision: It's noteworthy to emphasize, if the authors decide to discuss this, that the p-value is influenced by both the effect's magnitude and its precision, so in Paper #9, Exp #2, Effect #6, BF01 = 4.1 has a higher p-value than a BF01 = 2.3 in its replication. However, there are cases where both p-values and BF agree. For example, in Paper #15, Exp #2, Effect #2, both the original and replication studies have similar sample sizes, and as the p-value decreases from p = 0.95 to p = 0.23, BF01 decreases from 5.1 ("moderate evidence for H0") to 1.3 (region of "Absence of evidence"), moving away from H0 in both cases. This also occurs in Paper #24, Exp #3, Effect #6.

      We appreciate the suggestions but, as explained before, think that the message of our paper is better understood without additional discussion of more general differences between p-values and Bayes factors.

      6) The grey zone: Given the above topic, it is important to highlight that in the "Absence of evidence grey zone" for the null hypothesis, for example, in Paper #5, Exp #1, Effect #3 with a p = 0.99 and a BF01 = 2.4 in the replication, BF and p-values reach similar conclusions. It's interesting to note, as the authors emphasize, that Dawson et al. (2011), Exp #2, Effect #2 is an interesting example, as the p-value decreases, favoring H1, likely due to the effect's magnitude, even with a small sample size (n = 3 in both original and replications). Bayes factors are very close to one due to the small sample sizes, as discussed by the authors.

      We appreciate the constructive comments. We think that the two examples from Dawson et al. (2011) and Goetz et al. (2011) already nicely illustrate absence of evidence and evidence of absence, respectively, and therefore decided not to discuss additional examples in detail, to avoid redundancy.

      7) Using meta-analytical results (?): For papers from RPCB, comparing the initial study with the meta-analytical results using Bayes factor and equivalence testing approaches (thus, increasing the sample size of the analysis, but creating dependency of results since the initial study would affect the meta-analytical one) could change the conclusions. This would be interesting to explore in initial studies that are replicated by much larger ones, such as: Paper #9, Exp #2, Effect #6; Goetz et al. (2011), Exp #1, Effect #1; Paper #28, Exp #3, Effect #3; Paper #41, Exp #2, Effect #1; and Paper #47, Exp #1, Effect #5).

      Thank you for the suggestion. We considered adding meta-analytic TOST p-values and Bayes factors before, but decided that Figure 3 and the results section are already quite technical, so adding more analyses may confuse more than help. Nevertheless, these meta-analytic approaches are discussed in the “Conclusions” section.

      8) Other samples of fields of science: It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Thank you for the excellent suggestion. We added an Appendix B where the null results from the RPP and EPRP are analyzed with our proposed approaches. The results are also discussed in the “Results” and “Conclusions” sections.

      9) Other approaches: I am curious about the potential impact of using an approach based on equivalence testing (as described in https://arxiv.org/abs/2308.09112). It would be valuable if the authors could run such analyses or reference the mentioned work.

      Thank you. We were unaware of this preprint. It seems related to the framework proposed by Stahel W. A. (2021) New relevance and significance measures to replace p-values. PLoS ONE 16(6): e0252991. https://doi.org/10.1371/journal.pone.0252991

      We now cite both papers in the discussion.

      10) Additional evidence: There is another study in which replications of initially p > 0.05 studies with p > 0.05 replications were also considered as replication successes. You can find it here: https://www.medrxiv.org/content/10.1101/2022.05.31.22275810v2. Although it involves a small sample of initially p > 0.05 studies with already large sample sizes, the work is currently under consideration for publication in PLOS ONE, and all data and materials can be accessed through OSF (links provided in the work).

      Thank you for sharing this interesting study with us. We feel that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and EPRP null results. However, we will keep this study in mind for future analysis, especially since all data are openly available.

      11) Additional evidence 02: Ongoing replication projects, such as the Brazilian Reproducibility Initiative (BRI) and The Sports Replication Centre (https://ssreplicationcentre.com/), continue to generate valuable data. BRI is nearing completion of its results, and it promises interesting data for analyzing replication success using p-values, equivalence regions, and Bayes factor approaches.

      We now cite these two initiatives as examples of ongoing replication projects in the introduction. As with your previous point, we think that it is beyond the scope of the paper to include further analyses, as there are already analyses of the RPCB, RPP, and EPRP null results.

      Reviewer #3 (Recommendations For The Authors):

      I have no specific recommendations for the authors.

      Thank you for the constructive review.

      Reviewing Editor (Recommendations For the Authors):

      I recognize that it was suggested to the authors by the previous Reviewing Editor to reduce the amount of statistical material to be made more suitable for a non-statistical audience, and so what I am about to say contradicts advice you were given before. But, with this revised version, I actually found it difficult to understand the particulars of the construction of the Bayes Factors and would have appreciated a few more sentences on the underlying models that fed into the calculations. In my opinion, the provided citations (e.g., Dienes Z. 2014. Using Bayes to get the most out of non-significant results) did not provide sufficient background to warrant a lack of more technical presentation here.

      Thank you for the feedback. We added a new “Appendix C: Technical details on Bayes factors” that provides technical details on the models, priors, and calculations underlying the Bayes factors.
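
      As a rough indication of what such a construction can look like, the sketch below computes a Bayes factor BF01 under one common set of assumptions: the effect estimate is normally distributed around the true effect with known standard error, H0 fixes the effect at zero, and H1 places a zero-centred normal prior on it. These modelling choices and the function name are assumptions made here for illustration; the appendix should be consulted for the models and priors actually used.

```python
import math


def bf01_normal(estimate, se, prior_sd):
    """Bayes factor for H0: effect = 0 versus H1: effect ~ N(0, prior_sd^2).

    Assumes estimate ~ N(effect, se^2). Values above 1 favour H0,
    values below 1 favour H1.
    """
    var_h0 = se ** 2                   # marginal variance of estimate under H0
    var_h1 = se ** 2 + prior_sd ** 2   # marginal variance of estimate under H1
    # Log ratio of the two zero-mean normal densities evaluated at the estimate
    log_bf = (0.5 * math.log(var_h1 / var_h0)
              - 0.5 * estimate ** 2 * (1.0 / var_h0 - 1.0 / var_h1))
    return math.exp(log_bf)
```

      Under this sketch an estimate of exactly zero gives BF01 = sqrt(1 + prior_sd^2 / se^2), so imprecise studies (large standard error) yield Bayes factors close to 1, which matches the "absence of evidence" cases discussed in the reviews.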

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Bendzunas, Byrne et al. explore two highly topical areas of protein kinase regulation in this manuscript. Firstly, the idea that Cys modification could regulate kinase activity. The senior authors have published some standout papers exploring this idea of late, and the current work adds to the picture of how active site Cys might have been favoured in evolution to serve critical regulatory functions. Second, BRSK1/2 are understudied kinases listed as part of the "dark kinome" so any knowledge of their underlying regulation is of critical importance to advancing the field.

      Strengths:

      In this study, the authors pinpoint highly-conserved, but BRSK-specific, Cys residues as key players in kinase regulation. There is a delicate balance between equating what happens in vitro with recombinant proteins relative to what the functional consequence of Cys mutation might be in cells or organisms, but the authors are very clear with the caveats relating to these connections in their descriptions and discussion. Accordingly, by extension, they present a very sound biochemical case for how Cys modification might influence kinase activity in cellular environs.

      Weaknesses:

      I have very few critiques for this study, and my major points are barely major.

      Major points

      (1) My sense is that the influence of Cys mutation on dimerization is going to be one of the first queries readers consider as they read the work. It would be, in my opinion, useful to bring forward the dimer section in the manuscript.

      We agree that the influence of Cys on BRSK dimerization is a topic of significant interest. Our primary focus was to explore oxidative regulation of the understudied BRSK kinases as they contain a conserved T-loop Cys, and we have previously demonstrated that equivalent residues at this position in related kinases were critical drivers of oxidative modulation of catalytic activity. We have demonstrated here that BRSK1 & 2 are similarly regulated by redox and this is due to oxidative modification of the T+2 Cys, in addition to Cys residues that are conserved amongst related ARKs as well as BRSK-specific Cys. Although we also provide evidence for limited redox-sensitive higher order BRSK species (dimers) in our in vitro analysis, these represent a small population of the total BRSK protein pool (this was validated by SEC-MALS analysis). As such, we do not have strong evidence to suggest that these limited dimers significantly contribute to the pronounced inhibition of BRSK1 & 2 in the presence of oxidizing agents, and instead believe that other biochemical mechanisms likely drive this response. This may result from oxidized Cys altering the conformation of the activation loop. Indeed, the formation of an intramolecular disulfide within the T-loop of BRSK1 & 2, which we detected by MS, is one such regulatory modification. It is noteworthy that intramolecular disulfide bonds within the T-loop of AKT and MELK have already been shown to induce an inactive state in the kinase, and we posit a similar mechanism for BRSKs.

      While we recognize the potential importance of dimerization in this context, our current data from in vitro and cell-based assays do not provide substantial evidence to assert dimerization as a primary regulatory mechanism. Hence, we maintained a more conservative stance in our manuscript, discussing dimerization in later sections where it naturally followed from the initial findings. That being said, we acknowledge the potential significance of dimerization in the regulation of the BRSK T-loop cysteine. We believe this aspect merits further investigation and could indeed be the focus of a follow-up study.

      (2) Relatedly, the effect of Cys mutation on the dimerization properties of preparations of recombinant protein is not very clear as it stands. Some SEC traces would be helpful; these could be included in the supplement.

      In order to determine whether our recombinant BRSK proteins (and T-loop mutants) existed as monomers or dimers, we performed SDS-PAGE under reducing and non-reducing conditions (Fig 7). This unambiguously revealed that a monomer was the prominent species, with little evidence of dimers under these experimental conditions (even in the presence of oxidizing agents). Although we cannot discount a regulatory role for BRSK dimers in other physiological contexts, we could not produce sufficient evidence to suggest that multimerization played a substantial role in modifying BRSK kinase activity in our assays. We note that our in vitro analysis was performed using truncated forms of the protein, and as such it is entirely possible that regions of the protein that flank the kinase domain may serve additional regulatory functions that may include higher order BRSK conformations. In this regard, although we have not included SEC traces of our recombinant proteins, we have included analytical SEC-MALS of the truncated proteins (Supplementary Figure 6) which we believe to be more informative. We have also now included additional SEC-MALS data for BRSK2 C176A and C183A (Supplementary Figure 6d and e), which supports our findings in Fig 7, demonstrating the presence of limited dimer species under non-reducing conditions.

      (3) Is there any knowledge of Cys mutants in disease for BRSK1/2?

      We have conducted an extensive search across several databases: COSMIC (Catalogue of Somatic Mutations in Cancer), ProKinO (Protein Kinase Ontology), and TCGA (The Cancer Genome Atlas). These databases are well-regarded for their comprehensive and detailed records of mutations related to cancer and protein kinases. Our analysis using the COSMIC and TCGA databases focused on identifying any reported instances of Cys mutations in BRSK1/2 that are implicated in cancer. Additionally, we utilized the ProKinO database to explore the broader landscape of protein kinase mutations, including any potential disease associations of Cys mutations in BRSK1/2. However, we found no evidence to indicate the presence of Cys mutations in BRSK1/2 that are associated with cancer or disease. This lack of association in the current literature and database records suggests that, as of our latest search, Cys mutations in BRSK1/2 have not been reported as significant contributors to pathogenesis.

      (4) In bar charts, I'd recommend plotting data points. Plus, it is crucial to report in the legend what error measure is shown, the number of replicates, and the statistical method used in any tests.

      We have added the data points to the bar charts and included statistical methods in figure legends.

      (5) In Figure 5b, the GAPDH loading control doesn't look quite right.

      The blot has been repeated and updated.

      (6) In Figure 7 there is no indication of what mode of detection was used for these gels.

      We have updated the figure legend to confirm that the detection method was western blot.

      (7) Recombinant proteins - more detail should be included on how they were prepared. Was there a reducing agent present during purification? Where did they elute off SEC... consistent with a monomer or higher order species?

      We have added ‘produced in the absence of reducing agents unless stated otherwise’ in the methods section to improve clarity. Although we have not added additional sentences to describe the elution profile of the BRSK proteins by SEC during purification, we believe that the inclusion of analytical SEC-MALS data is sufficient evidence that the proteins are largely monomeric under non-reducing conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this study by Bendzunas et al, the authors show that the formation of intra-molecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-loop Cys with a BRSK-specific Cys at an unusual CPE motif at the end of the activation segment function as repressive regulatory mechanisms in BRSK1 and 2. They observed that mutation of the CPE-Cys only, contrary to the double mutation of the pair, increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family. Understanding the molecular mechanisms underlying kinase regulation by redox-active Cys residues is fundamental as it appears to be widespread in signaling proteins and provides new opportunities to develop specific covalent compounds for the targeted modulation of protein kinases.

      The authors demonstrate that intramolecular cysteine disulfide bonding between conserved cysteines can function as a repressing mechanism as indicated by the effect of DTT and the consequent increase in activity by BRSK1 and 2 (WT). The cause-effect relationship of why mutation of the CPE-Cys only increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells is not clear to me. The explanation given by the authors based on molecular modeling and molecular dynamics simulations is that oxidation of the CPE-Cys (that will favor disulfide bonding) destabilizes a conserved salt bridge network critical for allosteric activation. However, no functional evidence of the impact of the salt-bridge network is provided. If you mutate the two main Cys-pairs (aE-CHRD and A-loop T+2-CPE) you lose the effect of DTT, as the disulfide pairs cannot be formed, hence no repression mechanisms take place. However, when looking at individual residues, I do not understand why mutating the CPE only results in the opposite effect unless it is independent of its connection with the T+2 residue on the A-loop.

      Strengths:

      This is an important and interesting study providing new knowledge in the protein kinase field with important therapeutic implications for the rational design and development of next-generation inhibitors.

      Weaknesses:

      There are several issues with the figures that this reviewer considers should be addressed.

      Reviewer #1 (Recommendations for The Authors):

      Major points

      Page 26 - the discussion could be more concise. There's an element of recapping the results, which should be avoided.

      Regarding the conciseness of the discussion section, we have thoroughly revised it to ensure a more succinct presentation, deliberately avoiding the recapitulation of results. The revised discussion now focuses on interpreting the findings and their implications, steering clear of redundancy with the results section.

      Figure 1b seems to be mislabeled/annotated. I recommend checking whether the figure legends match more broadly. Figure 1 appears to be incorrectly cited throughout the results.

      Thank you for pointing out the discrepancies in the labeling and citation of Figure 1b. We have carefully reviewed and corrected these issues to ensure that all figure labels, legends, and citations accurately reflect the corresponding data and illustrations. We appreciate your attention to detail and the opportunity to improve the clarity and accuracy of our presentation.

      Figure 6 - please include a color-coding key in the figure. Further support for these simulations could be provided by supplementary movies or plots of the interaction. Figure 4 colour palette should be adjusted for the spheres in the Richardson diagrams to have greater distinction.

      As suggested, we have amended the colour palette in Figure 4 to improve conformity throughout the figure.

      Minor points

      Figure 2 - it'd be helpful to know what the percentage coverage of peptides is.

      We have updated the figure legend to include peptide coverage for both proteins.

      Some typos - Supp 2 legend "Domians".

      Fixed

      Figure 6 legend - analyzed by needs a space;

      Fixed

      Fig 8 legend schematic misspelled.

      Fixed

      Broadly, if you Google T-loop you get a pot pourri of enzyme answers. Why not just use Activation loop?

      The choice of "T-loop" over "Activation loop" in our manuscript was made to maintain consistency with other literature in the field, and in particular our previous paper “Aurora A regulation by reversible cysteine oxidation reveals evolutionarily conserved redox control of Ser/Thr protein kinase activity” where we refer to the activation loop cysteine as T-loop + 2. We acknowledge the varied enzyme contexts in which "T-loop" is used and agree on the importance of clarity. To address this, we made an explicit note in the manuscript that the "T-loop" is also referred to as the "Activation loop", ensuring readers are aware of the interchangeable use of these terms. Additionally, this nomenclature facilitates a more straightforward designation of cysteine residues within the loop (T+2 Cysteine). We believe this approach balances adherence to established conventions with the need for clarity and precision in our descriptions.

      Methods - what is LR cloning. Requires some definition. Some manufacturer detail is missing in methods, and referring to prior work is not sufficient to empower readers to replicate.

      We agree, and have added the following to the methods section:

      “BRSK1 and 2 were sub-cloned into pDest vectors (to encode the expression of N-terminal Flag or HA tagged proteins) using the Gateway LR Clonase II system (Invitrogen) according to the manufacturer’s instructions. pENTR BRSK1/2 clones were obtained in the form of Gateway-compatible donor vectors from Dr Ben Major (Washington University in St. Louis). The Gateway LR Clonase II enzyme mix mediates recombination between the attL sites on the Entry clone and the attR sites on the destination vector. All cloned BRSK1/2 genes were fully sequenced prior to use.”

      Page 7 - optimal settings should be reported. How were pTau signals quantified and normalised?

      We have added the following to the methods section:

      “A two-color Western blot detection method employing infrared fluorescence was used to measure the ratio of Tau phospho serine 262 to total Tau. Total GFP Tau was detected using a mouse anti GFP antibody and visualized at 680 nm using goat anti mouse IRDye 680, while phospho-Tau was detected using a Tau phospho serine 262 specific antibody and visualized at 800 nm using goat anti rabbit IRDye 800. Imaging was performed using a LI-COR Odyssey CLx with scan control settings set to 169 μm, medium quality, and 0.0 mm distance. Quantification was performed using LI-COR Image Studio on the raw image files. The phospho-Tau to total Tau ratio was determined by measuring the ratio of the fluorescence intensities measured at 800 nm (pTau) to those at 680 nm (total Tau).”
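
      As an illustration of the ratio described in the quoted methods text, the calculation can be sketched as follows. This is a minimal sketch, not the authors' analysis pipeline: the function names and band intensities are hypothetical, and the normalisation of each ratio to the mean of a control condition is an added assumption that the response does not describe.

```python
from statistics import mean


def ptau_ratio(intensity_800, intensity_680):
    """Return the pTau (800 nm) to total Tau (680 nm) ratio for one lane."""
    if intensity_680 <= 0:
        raise ValueError("total Tau signal must be positive")
    return intensity_800 / intensity_680


def normalise_to_control(ratios, control_ratios):
    """Express each ratio relative to the mean control ratio.

    Assumption: normalising to a control condition is a common convention,
    but it is not stated in the response itself.
    """
    control_mean = mean(control_ratios)
    return [ratio / control_mean for ratio in ratios]


# Hypothetical band intensities (arbitrary fluorescence units).
control = [ptau_ratio(500.0, 1000.0), ptau_ratio(550.0, 1000.0)]
treated = [ptau_ratio(200.0, 1000.0), ptau_ratio(250.0, 1000.0)]
print(normalise_to_control(treated, control))
```

      Dividing by the 680 nm signal in each lane corrects for differences in total Tau expression between samples before any comparison across conditions is made.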

      In the Figure 6g-j legend, the salt bridge is incorrectly annotated as E185-R248 rather than 258.

      Fixed

      Lines 393-395 provides a repeat statement on BRSKs phosphorylating Tau (from 388-389).

      We have removed the repetition and reworded the opening lines of the results section to improve the overall flow of the manuscript.

      Supp. Figure 1 is difficult to view - would it be possible to increase the size of the phylogenetic analysis?

      We thank the reviewer for this observation. We have rotated (90°) and expanded the figure so that it can be more clearly viewed.

      Supp. Figure 2 - BRSK1/2 incorrectly spelled.

      Fixed

      Please check the alignment of labels in Supp. Figure 3e.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1, current panel b is not mentioned/described in the figure legend and as a consequence, the rest of the panels in the legends do not fit the content of the figure.

      Reviewer 1 also noted this error, and we have amended the manuscript accordingly.

      What is the rationale for using the HEK293T cells as the main experimental/cellular system? Are there cell lines that express both proteins endogenously so that the authors can recapitulate the results obtained from ectopic overexpression?

      The selection of HEK-293T cells was driven by their well-established utility in overexpression studies, which make them ideal for the investigation of protein interactions and redox regulation. This cell line's robust transfection efficiency and well-characterized biology provide a reliable platform for dissecting the molecular mechanisms underlying the redox regulation of proteins. Furthermore, the use of HEK-293T cells aligns with the broader scientific practice, serving as a common ground for comparability with existing literature in the field of BRSK1/2 signaling, protein regulation and interaction studies.

      The application of HEK-293T cells as a model system in our study serves as a foundational step towards eventually elucidating the functions of BRSK1/2 in neuronal cells, where these kinases are predominantly expressed and play critical roles. Given the fact that BRSKs are classed as ‘understudied’ kinases, the choice of a HEK-293T co-overexpression system allowed us to analyze the direct effects of BRSK kinase activity (using phosphorylation of Tau as a readout) in a cellular context and in a more controlled manner. This approach not only aids in the establishment of a baseline understanding of the redox regulation of BRSK1/2, but also sets the stage for subsequent investigations in more physiologically relevant neuronal models.

      In current panel d, could the authors recapitulate the same experimental conditions as in current panel c?

      Figure 1 panel c shows that both BRSK1 and 2 are reversibly inhibited by oxidizing agents such as H2O2, whilst panels d and e show the concentration dependent activation and inhibition of the BRSKs with increasing concentrations of DTT and H2O2 respectively. The experimental conditions were identical, other than changing amounts of reducing and oxidizing agents, and used the same peptide coupled assays. Data for all experiments were originally collected in ‘real time’ as depicted in Fig 1c (increase in substrate phosphorylation over time). However, to aid interpretation of the data, we elected to present the latter two panels as dose response curves by calculating the change in the rate of enzyme activity (shown as pmol phosphate incorporated into the peptide substrate per min) for each condition. To aid the reader, we now include an additional supplementary figure (new supplementary figure 2) depicting BRSK1 and 2 dependent phosphorylation of the peptide substrate in the presence of different concentrations of DTT and H2O2 in a real time (kinetic) assay. The new data shown is a subset of the unprocessed data that was used to calculate the rates of BRSK activity in Fig 1d & e.
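
      The conversion from real-time traces to a rate per condition, as described above, can be sketched as follows. This is a minimal sketch under stated assumptions: the response does not say how the rates were extracted, so a least-squares slope over the linear phase is assumed here, and the function name, time points and product values are all hypothetical.

```python
def initial_rate(times_min, product_pmol):
    """Least-squares slope of product formed (pmol) versus time (min).

    Assumption: the supplied points lie within the linear phase of the
    reaction, so the slope approximates the rate in pmol/min.
    """
    n = len(times_min)
    if n < 2 or n != len(product_pmol):
        raise ValueError("need at least two matched time/product points")
    t_mean = sum(times_min) / n
    p_mean = sum(product_pmol) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (p - p_mean)
              for t, p in zip(times_min, product_pmol))
    return sxy / sxx


# Hypothetical kinetic traces keyed by DTT concentration (mM).
times = [0.0, 1.0, 2.0, 3.0]
traces = {
    0.0: [0.0, 0.5, 1.0, 1.5],
    1.0: [0.0, 2.0, 4.0, 6.0],
}
for dtt_mm, product in traces.items():
    print(dtt_mm, initial_rate(times, product))
```

      Plotting the resulting rates against reagent concentration gives a dose-response curve of the kind described for panels d and e.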

      Why did the authors use full-length constructs in these experiments and did not in e.g. Figure 2 where they used KD constructs instead?

      In the initial experiments, illustrated in Figure 1, we employed full-length protein constructs to establish a proof of concept, demonstrating the overall behavior and interactions of the proteins in their full-length form. This confirmed that BRSK1 & 2, which both contain a conserved T+2 Cys residue that is frequently prognostic for redox sensitivity in related kinases, displayed a near-obligate requirement for reducing agents to promote kinase activity.

      Subsequently, in Figure 2, our focus shifted towards delineating the specific regions within the proteins that are critical for redox regulation. By using constructs that encompass only the kinase domain, we aimed to demonstrate that the redox-sensitive regulation of these proteins is predominantly mediated by specific cysteine residues located within the kinase domain itself. This strategic use of the kinase domain of the protein allowed for a more targeted investigation. Furthermore, in our hands these truncated forms of the protein were more stable at higher concentrations, enabling more detailed characterization of the proteins by DSF and SEC-MALS. We predict that the flanking disordered regions of the full-length protein (as predicted by AlphaFold) contribute to this effect.

      (2) In Figure 2, Did the authors try to do LC/MS-MS in the same experimental conditions as in Figure 1 (e.g. buffer minus/plus DTT, H2O2, H2O2 + DTT)?

      We would like to clarify that the mass spectrometry experiments were conducted exclusively on proteins purified under native (non-reducing) conditions. We did not extend the LC/MS-MS analyses to include proteins treated with various buffer conditions such as minus/plus DTT, H2O2, or H2O2 + DTT as used in the experiments depicted in Figure 1. Given that we could readily detect disulfides in the absence of oxidizing agents, we did not see the benefit of additional treatment conditions as peroxide treatment of protein samples can frequently complicate interpretation of MS data. However, it should be noted that prior to MS analysis, tryptic peptides were subjected to a 50:50 split, with one half alkylated in the presence of DTT (as described in the methods section) to eliminate disulfides and other transiently oxidized Cys forms. Comparative analysis between reduced and non-reduced tryptic peptides improved our confidence when assigning disulfide bonds (which were eliminated in identical peptides in the presence of DTT).

      On panel b, why did the authors show AlphaFold predictions and not empirical structural information (e.g. X-ray, EM,..)?

      The AlphaFold models were primarily utilized to map the general locations of redox-sensitive cysteine pairs within the proteins of interest. Although we have access to the crystal structure of mouse BRSK2, it does not fully capture the active conformation seen in the AlphaFold model of the human version. The use of AlphaFold models for human proteins in this study aids in consistently tracking residue numbering across the manuscript, offering a useful framework for understanding the spatial arrangement of these critical cysteine pairs in their potentially active-like states. This approach facilitates our analysis and discussion by providing a reference for the structural context of these residues in the human proteins.

      What was the rationale for using the KD construct and not the FL as in Figure 1?

      The rationale to use the kinase domain was primarily based on the significantly lower confidence in the structural predictions for regions outside the kinase domain (KD). Our experimental focus was to investigate the role of conserved cysteine residues within the kinase domain, which are critical for the protein's function and regulation. This targeted approach allowed us to concentrate our analyses on the most functionally relevant and structurally defined portion of the protein, thereby enhancing the precision and relevance of our findings. As is frequently the case, truncated forms of the protein, consisting only of the kinase domain, are much more stable than their full length counterparts and are therefore more amenable to in vitro biochemical analysis. In our hands this was true for both BRSK1 and 2, and as such much of the data collected here was generated using kinase-domain (KD) constructs. Simulations using the KD structures are therefore much more representative of our original experimental setup.

      The BRSK1 KD construct appears to be rather inactive and not responsive to DTT treatment. Could the authors comment on the differences observed with the FL construct of Figure 1?

      It is important to note that BRSK1, in general, exhibits lower intrinsic activity compared to BRSK2. This reduced activity could be attributed to a range of factors, including the need for activation by upstream kinases such as LKB1, as well as potential post-translational modifications (PTMs) that may be absent in the bacterially expressed KD construct. The full-length forms of the protein were purified from Sf21 cells, and as such may have additional modifications that are lacking in the bacterially derived KD counterparts. We also cannot discount additional regulatory roles of the regions that flank the KD, and these may contribute in part to the modest discrepancy observed between constructs.  Despite these differences, it is crucial to emphasize that both the KD and FL constructs of BRSK1 are regulated by DTT, indicating a conserved redox-dependent activation for both of the related BRSK proteins.  

      (3) In Figure 4, on panel A, wouldn't the authors expect that mutating one residue of a pair, e.g. C198A in BRSK1, would have the same effect as mutating the C191 from the T+2 site? Did they try mutating individual sites of the aE/CHRD pair? The same will apply to BRSK2.

      We appreciate the insightful comment. It's important to clarify that the redox regulation of these proteins is influenced not solely by the formation of disulfide bonds but also by the oxidation state of individual cysteine residues, particularly the T+2 Cys. This nuanced mechanism of regulation allows for a diverse range of functional outcomes based on the specific cysteine involved and its state of oxidation. This aspect forms a key finding of our paper, highlighting the complexity of redox regulation beyond mere disulfide bond formation. For example, AURA kinase activity is regulated by oxidation of a single T+2 Cys (Cys290, equivalent to Cys191 and Cys176 of BRSK1 and 2 respectively), but this regulation can be supplemented through artificial incorporation of a secondary Cys at the DFG+2 position (Byrne et al., 2020). This targeted genetic modification of AURA mirrors equivalent regulatory disulfide-forming Cys pairs that naturally occur in kinases such as AKT and MELK, and which provide an extra layer of regulatory fine tuning (and a possible protective role to prevent deleterious over-oxidation) to the T+2 Cys. We surmise that the CPE Cys is also an accessory regulatory element to the T+2 Cys in BRSK1 & 2, which is the dominant driver of BRSK redox sensitivity (as judged by the fact that CPE Cys mutants are still potently regulated by redox [Fig 4]), by locking it in an inactive disulfide configuration.

      In our preliminary analysis of BRSK1, we observed that mutations of individual sites within the aE/CHRD pair were similarly detrimental to kinase activity as a tandem mutation (see Author response image 1). As discussed in the manuscript, we think that these Cys may serve important structural regulatory functions and opted to focus on co-mutations of the aE/CHRD pair for the remainder of our investigation.

      Author response image 1.

      In vitro kinase assays showing rates of in vitro peptide phosphorylation by WT and Cys-to-Ala (aE/CHRD residues) variants of BRSK1 after activation by LKB1.

      In panels C and D, the same experimental conditions should have been measured as in A and B.

      Panels A and B were designed to demonstrate the enzymatic activity and the response to DTT treatment to establish the baseline redox regulation of the kinase and a panel of Cys-to-Ala mutant variants. In contrast, panels C and D were specifically focused on rescue experiments with mutants that showed a significant effect under the conditions tested in A and B. These panels were intended to further explore the role of redox regulation in modulating the activity of these mutants, particularly those that retained some level of activity or exhibited a notable response to redox changes.

      The rationale for this experimental design was to prioritize the investigation of mutants, such as those at the T+2 and CPE cysteine sites, which provided the most insight into the redox-dependent modulation of kinase activity. Other mutants, which resulted in inactivation, were deprioritized in this context as they offered limited additional information regarding the redox regulation mechanism. This focused approach allowed us to delve deeper into understanding how specific cysteine residues contribute to the redox-sensitive control of kinase function, aligning with the overall objective of elucidating the nuanced roles of redox regulation in kinase activity.

      (4) In Figure 5: Why did the authors use reduced Glutathione instead of DTT? The authors should have recapitulated the same experimental conditions as in Figure 4 and not focused only on the T+2 or the CPE single mutants but using the double and the aE/CHRD mutants as well, as internal controls and validation of the enzymatic assays using the modified peptide.

      Regarding the use of reduced glutathione (GSH) instead of DTT in Figure 5, we chose GSH for its well characterized biological relevance as an antioxidant in cellular responses to oxidative stress. Furthermore, while DTT has been widely used in experimental setups, it is also potentially cytotoxic at high concentrations.

      Addressing the point on experimental consistency with Figure 4, we appreciate the suggestion and indeed had already conducted such experiments (Previously Supp Fig 3, now changed to current Supp Fig 4). These experiments include analyses of BRSK mutant activity in a HEK-293T model. However, we chose not to focus on inactivating mutants (such as the aE/CHRD mutants which had depleted expression levels possibly as a consequence of compromised structural integrity) or pursue the generation of double mutant CMV plasmids, as these were deemed unlikely to add significant insights into the core narrative of our study. Our focus remained on the mutants that yielded the most informative results regarding the redox regulation mechanisms in the in vitro setting, ensuring a clear and impactful presentation of our findings.

      A time course evaluation of the reducing or oxidizing reagents should have been performed. Would we expect that in WT samples, and in the presence of GSH, and also in the case of the CPE mutant, an increment in the levels of Tau phosphorylation as a readout of BRSK1/2 activity?

      We acknowledge the importance of such analyses in understanding the dynamic nature of redox regulation on kinase activity and have included a time course (Supp Fig 2 e-g). These results confirm a depletion of Tau phosphorylation over time in response to peroxide generated by the enzyme glucose oxidase.

      (5) In Figure 6, did the authors look at the functional impact of the residues with which the T+2 and the CPE motifs interact, e.g. T174 and the E185-R258 tether?

      Our primary focus was on the salt bridges, as this is a key regulatory structural feature that is conserved across many kinases. Regarding the additional interactions mentioned, we have thoroughly evaluated their roles and dynamics through molecular dynamics (MD) simulations but did not find any results of significant relevance to warrant inclusion.

      (6) In Figure 7: Did the authors look at the oligomerization state of the BRSK1/2 multimers under non-reducing conditions? Were they also observed in the case of the FL constructs? What was the stoichiometry?

      Our current work indicates that the kinase domain of BRSK1/2 primarily exists in a monomeric state, with some evidence of dimerization or multimer formation under specific conditions. Our SEC-MALS (Supp Fig 6) and SDS-PAGE (Figure 7) analyses clearly demonstrate that monomers are overwhelmingly the dominant species under non-reducing conditions (>90%). We also conclude that these limited oligomeric species can be removed by inclusion of reducing agents such as DTT (Figure 7), which may suggest a role for a Cys residue(s). Notably, removal of the T+2 Cys was insufficient to prevent multimerization.

      We were unable to obtain reliable SEC-MALS data for the full-length forms of the protein, likely due to the presence of disordered regions that flank the kinase domain, which result in a highly heterodisperse and unstable preparation (at the concentrations required for SEC-MALS). Although we are therefore unable to comment on the stoichiometry of FL BRSK dimers, we can detect BRSK1 and 2 hetero- and homo-complexes in HEK-293T cells by IP, which supports the existence of limited BRSK1 & 2 dimers (Supp Fig 6a). However, we were unable to detect intermolecular disulfide bonds by MS, although this does not necessarily preclude their existence. The physiological role of BRSK multimerization (if any) and establishing specifically which Cys residues drive this phenomenon is of significant interest to our future investigations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Thank you for the careful reading and the positive evaluation of our manuscript. As you mentioned, the present study tried to address the question of how the lost genomic functions could be compensated by evolutionary adaptation, indicating the potential mechanism of "constructive" rather than "destructive" evolution. Thank you for the instructive comments that helped us to improve the manuscript. We sincerely hope the revised manuscript and the following point-by-point response address your concerns.

      • Line 80 "Growth Fitness" is this growth rate?

      Yes. The sentence was revised as follows.

      (L87-88) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper).”

      • Line 94 a more nuanced understanding of r/K selection theory, allows for trade-ups between R and K, as well as trade-offs. This may explain why you did not see a trade-off between growth and carrying capacity in this study. See this paper https://doi.org/10.1038/s41396-023-01543-5. Overall, your evos lineages evolved higher growth rates and lower carrying capacity (Figures 1B, C, E). If selection was driving the evolution of higher growth rates, it may have been that there was no selective pressure to maintain high carrying capacity. This means that the evolutionary change you observed in carrying capacity may have been neutral "drift" of the carrying capacity trait, during selection for growth rate, not because of a trade-off between R and K. This is especially likely since carrying capacity declined during evolution. Unless the authors have convincing evidence for a tradeoff, I suggest they remove this claim.

      • Line 96 the authors introduce a previous result where they use colony size to measure growth rate, this finding needs to be properly introduced and explained so that we can understand the context of the conclusion.

      • Line 97 This sentence "the collapse of the trade-off law likely resulted from genome reduction." I am not sure how the authors can draw this conclusion, what is the evidence supporting that the genome size reduction causes the breakdown of the tradeoff between R and K (if there was a tradeoff)?

      Thank you for the reference information and the thoughtful comments. The recommended paper was newly cited, and the description of the trade-off collapse was deleted. Accordingly, the corresponding paragraph was rewritten as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection30,31 was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32 33,34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      • Line 103 Genome mutations. The authors claim that there are no mutations in parallel but I see that there is a 1199 base pair deletion in eight of the nine evo strains (Table S3). I would like the author to mention this and I'm actually curious about why the authors don't consider this parallel evolution.

      Thank you for your careful reading. According to your comment, we added a brief description of the 1199-bp deletion detected in the Evos as follows.

      (L119-122) “The number of mutations largely varied among the nine Evos, from two to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      • Line 297 Please describe the media in full here - this is an important detail for the evolution experiment. Very frustrating to go to reference 13 and find another reference, but no details of the method. Looked online for the M63 growth media and the carbon source is not specified. This is critical for working out what selection pressures might have driven the genetic and transcriptional changes that you have measured. For example, the parallel genetic change in 8/9 populations is a deletion of insH and tdcD (according to Table S3). This is acetate kinase, essential for the final step in the overflow metabolism of glucose into acetate. If you have a very low glucose concentration, then it could be that there was selection to avoid fermentation and devote all the pyruvate that results from glycolysis into the TCA cycle (which is more efficient than fermentation in terms of ATP produced per pyruvate).

      Sorry for the missing information on the medium composition, which is now described in the Materials and Methods. The glucose concentration in M63 was 22 mM, which should be sufficient for bacterial growth. Thank you for the intriguing idea of linking the medium components to mutation-mediated metabolic changes. As the present study includes no experimental results on the biological functions of the mutated genes, please allow us to address this issue in our future work.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      • Line 115. I do not understand this argument "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Is this a significant enrichment compared to the expectation, i.e. the number of essential genes in the genome? This enrichment needs to be tested with a Hypergeometric test or something similar.

      • Also, "As the essential genes were known to be more conserved than nonessential ones, the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." I do not think that there is enough evidence to support this claim, and it should be removed.

      Sorry for the unclear description. Yes, the mutations were significantly enriched in the essential genes (11 out of 45 genes) compared to the essential genes in the whole genome (286 out of 3290 genes). The improper description linking the mutation in essential genes to the fitness increase was removed, and an additional explanation on the ratio of essential genes was newly supplied as follows.

      (L139-143) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable.”
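
      The reviewer suggested a hypergeometric test for this enrichment, while the revised text reports a Chi-square test. As an illustration only, the following minimal sketch computes the hypergeometric upper-tail probability from the counts quoted in this response (11 essential genes among 45 mutated genes; 286 essential genes among 3290 genes). The function name and its arguments are hypothetical, and this is not the authors' analysis, so it is not expected to reproduce the reported p value.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_success, n_total):
    """P(X >= k) when n_drawn genes are sampled without replacement
    from n_total genes, of which n_success are essential."""
    denom = comb(n_total, n_drawn)
    upper = min(n_drawn, n_success)
    numer = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


# Counts quoted in the response: 11 of 45 mutated genes are essential,
# against 286 essential genes among 3290 genes in the genome.
p_enrichment = hypergeom_upper_tail(11, 45, 286, 3290)
print(f"hypergeometric upper-tail p = {p_enrichment:.2e}")
```

      A one-sided test is used here because the question is whether essential genes are over-represented among the mutated genes.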

      • Line 124 Regarding the mutation simulations, I do not understand how the observed data were compared to the simulated data, and how conclusions were drawn. Can the authors please explain the motivation for carrying out this analysis, and clearly explain the conclusions?

      Random simulation was additionally explained in the Materials and Methods and the conclusion of the random simulation was revised in the Results, as follows.

      (L392-401) “The mutation simulation was performed with Python in the following steps. A total of 65 mutations were randomly generated on the reduced genome, and the distances from the mutated genomic locations to the nearest genomic scars caused by genome reduction were calculated. Subsequently, Welch's t-test was performed to evaluate whether the distances calculated from the random mutations were significantly longer or shorter than those calculated from the mutations that occurred in Evos. The random simulation, distance calculation, and statistic test were performed 1,000 times, which resulted in 1,000 p values. Finally, the mean of p values (μp) was calculated, and a 95% reliable region was applied. It was used to evaluate whether the 65 mutations in the Evos were significantly close to the genomic scars, i.e., the locational bias.”
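
      The steps described above can be sketched as follows. This is a hedged illustration rather than the authors' script: the genome length, scar positions, and mutation positions are placeholders supplied by the caller, the genome is assumed to be circular, and the p value of Welch's t statistic is approximated with a normal distribution to keep the example within the Python standard library (an exact Welch's test would use Student's t distribution with Welch-Satterthwaite degrees of freedom).

```python
import bisect
import random
from statistics import NormalDist, mean, variance


def nearest_scar_distance(pos, scars, genome_len):
    """Circular distance from pos to the nearest scar (scars must be sorted)."""
    i = bisect.bisect_left(scars, pos)
    neighbours = (scars[i - 1], scars[i % len(scars)])  # wraps around the origin
    return min(min(abs(pos - s), genome_len - abs(pos - s)) for s in neighbours)


def welch_p_value(a, b):
    """Two-sided p value of Welch's t statistic, using a normal approximation
    (an assumption that is reasonable only for fairly large samples)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    t = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def mean_p_value(observed_positions, scars, genome_len, n_rounds=1000, seed=0):
    """Mean p value over n_rounds comparisons of random vs. observed distances."""
    rng = random.Random(seed)
    scars = sorted(scars)
    observed = [nearest_scar_distance(p, scars, genome_len) for p in observed_positions]
    p_values = []
    for _ in range(n_rounds):
        # Draw as many random mutation sites as there are observed mutations.
        simulated = [
            nearest_scar_distance(rng.randrange(genome_len), scars, genome_len)
            for _ in observed
        ]
        p_values.append(welch_p_value(simulated, observed))
    return mean(p_values)
```

      Calling mean_p_value with the 65 observed mutation positions and the scar coordinates would return the mean p value (μp) referred to in the text.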

      (L148-157) “Random simulation was performed to verify whether there was any bias or hotspot in the genomic location for mutation accumulation due to the genome reduction. A total of 65 mutations were randomly generated on the reduced genome (Fig. 2B), and the genomic distances from the mutations to the nearest genome reduction-mediated scars were calculated. Welch's t-test was performed to evaluate whether the genomic distances calculated from random mutations significantly differed from those from the mutations accumulated in the Evos. As the mean of p values (1,000 times of random simulations) was insignificant (Fig. 2C, μp > 0.05), the mutations fixed on the reduced genome were either closer or farther to the genomic scars, indicating there was no locational bias for mutation accumulation caused by genome reduction.”

      • Line 140 The authors should give some background here - explain the idea underlying chromosomal periodicity of the transcriptome, to help the reader understand this analysis.

      • Line 142 Here and elsewhere, when referring to a method, do not just give the citation, but also refer to the methods section or relevant supplementary material.

      The analytical process (references and methods) was described in the Materials and Methods, and the reason we performed the chromosomal periodicity was added in the Results as follows.

      (L165-172) “As the E. coli chromosome was structured, whether the genome reduction caused the changes in its architecture, which led to the differentiated transcriptome reorganization in the Evos, was investigated. The chromosomal periodicity of gene expression was analyzed to determine the structural feature of genome-wide pattern, as previously described 28,38. The analytical results showed that the transcriptomes of all Evos presented a common six-period with statistical significance, equivalent to those of the wild-type and ancestral reduced genomes (Fig. 3A, Table S4).”

      • Line 151 "The expression levels of the mutated genes were higher than those of the remaining genes (Figure 3B)"- did this depend on the type of mutation? There were quite a few early stops in genes, were these also more likely to be expressed? And how about the transcriptional regulators, can you see evidence of their downstream impact?

      Sorry, we did not investigate the detailed regulatory mechanisms of the 49 mutated genes, which is beyond the scope of the present study. Fig. 3B shows the statistical comparison between 3225 and 49 genes. It does not mean that every mutated gene was expressed at a higher level than the others. The following sentences were added to address your concern.

      (L181-185) “As the regulatory mechanisms or the gene functions were supposed to be disturbed by the mutations, the expression levels of individual genes might have been either up- or down-regulated. Nevertheless, the overall expression levels of all mutated genes tended to be increased. One of the reasons was assumed to be the mutation essentiality, which remained to be experimentally verified.”

      • Line 199 onward. The authors used WGCNA to analyze the gene expression data of evolved organisms. They identified distinct gene modules in the reduced genome, and through further analysis, they found that specific modules were strongly associated with key biological traits like growth fitness, gene expression changes, and mutation rates. Did the authors expect that there was variation in mutation rate across their populations? Is variation from 3-16 mutations that they observed beyond the expectation for the wt mutation rate? The genetic causes of mutation rate variation are well understood, but I could not see any dinB, mutT,Y, rad, or pol genes among the discovered mutations. I would like the authors to justify the claim that there was mutation rate variation in the evolved populations.

      Thank you for the intriguing comment. We do not think the mutation rates varied significantly across the nine populations because, as you noticed, no mutation occurred in the MMR genes. Our previous study showed that the spontaneous mutation rate of the reduced genome was higher than that of the wild-type genome (Nishimura et al., 2017, mBio). As nonsynonymous mutations were not detected in all nine Evos, the spontaneous mutation rate couldn't be calculated (because it should be evaluated according to the ratio of nonsynonymous and synonymous single-nucleotide substitutions in molecular evolution). Therefore, the mutation rate could not be discussed in the present study. The following sentence was added for a better understanding of the gene modules.

      (L242-245) “These modules M2, M10 and M16 might be considered as the hotspots for the genes responsible for growth fitness, transcriptional reorganization, and mutation accumulation of the reduced genome in evolution, respectively.”

      • Line 254 I get the idea of all roads leading to Rome, which is very fitting. However, describing the various evolutionary strategies and homeostatic and variable consequence does not sound correct - although I am not sure exactly what is meant here. Looking at Figure 7, I will call strategy I "parallel evolution", that is following the same or similar genetic pathways to adaptation and strategy ii I would call divergent evolution. I am not sure what strategy iii is. I don't want the authors to use the terms parallel and divergent if that's not what they mean. My request here would be that the authors clearly describe these strategies, but then show how their results fit in with the results, and if possible, fit with the naming conventions, of evolutionary biology.

      Thank you for your kind consideration and excellent suggestion. It's our pleasure to adopt your idea in our study. The evolutionary strategies were renamed according to your recommendation. Both the main text and Fig. 7 were revised as follows.

      (L285-293) “Common mutations22,44 or identical genetic functions45 were reported in the experimental evolution with different reduced genomes, commonly known as parallel evolution (Fig. 7, i). In addition, as not all mutations contribute to the evolved fitness 22,45, another strategy for varied phenotypes was known as divergent evolution (Fig. 7, ii). The present study accentuated the variety of mutations fixed during evolution. Considering the high essentiality of the mutated genes (Table S3), most or all mutations were assumed to benefit the fitness increase, partially demonstrated previously 20. Nevertheless, the evolved transcriptomes presented a homeostatic architecture, revealing the divergent to convergent evolutionary strategy (Fig. 7, iii).”

      Author response image 1.

      • Line 327 Growth rates/fitness. I don't think this should be called growth fitness- a rate is being calculated. I would like the authors to explain how the times were chosen - do the three points have to be during the log phase? Can you also explain what you mean by choosing three ri that have the largest mean and minor variance?

      Sorry for the confusing term usage. The fitness assay was changed to the growth assay. Three ri with the largest mean and minor variance were chosen to avoid the occasional large values (blue circle), as shown in the following figure. In addition, the details of the growth analysis can be found at https://doi.org/10.3791/56197 (ref. 59), where the video of experimental manipulation, protocol, and data analysis is deposited. The following sentence was added accordingly.

      Author response image 2.

      (L369-371) “The growth rate was determined as the average of three consecutive ri, showing the largest mean and minor variance to avoid the unreliable calculation caused by the occasionally occurring values. The details of the experimental and analytical processes can be found at https://doi.org/10.3791/56197.”
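
      The selection of three consecutive ri can be sketched as below. Several points are assumptions made only for illustration: ri is taken to be the rate between two consecutive OD600 readings, ln(OD(i+1)/OD(i)) divided by the time interval, and "largest mean and minor variance" is implemented as the largest window mean among windows whose coefficient of variation stays under a hypothetical cutoff (max_cv). The exact rule is given in the cited protocol, not here.

```python
import math
from statistics import mean, pstdev


def interval_rates(times, ods):
    """Rate between consecutive readings: ln(OD[i+1] / OD[i]) / (t[i+1] - t[i])."""
    return [
        math.log(ods[i + 1] / ods[i]) / (times[i + 1] - times[i])
        for i in range(len(ods) - 1)
    ]


def growth_rate(times, ods, window=3, max_cv=0.1):
    """Largest mean over `window` consecutive rates, ignoring unsteady windows.

    max_cv is a hypothetical cutoff on the coefficient of variation; it stands
    in for the "minor variance" criterion and rejects windows with spikes.
    """
    rates = interval_rates(times, ods)
    windows = [rates[i:i + window] for i in range(len(rates) - window + 1)]
    steady = [w for w in windows if mean(w) > 0 and pstdev(w) / mean(w) <= max_cv]
    if not steady:
        raise ValueError("no window of consecutive rates is steady enough")
    return max(mean(w) for w in steady)
```

      With readings that follow a clean exponential curve, every window gives the same rate; a single aberrant reading inflates the variance of the windows that contain it, so those windows are skipped.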

      • Line 403 Chromosomal periodicity analysis. The windows chosen for smoothing (100kb) seem big. Large windows make sense for some things - for example looking at how transcription relates to DNA replication timing, which is a whole-genome scale trend. However, here the authors are looking for the differences after evolution, which will be local trends dependent on specific genes and transcription factors. 100kb of the genome would carry on the order of one hundred genes and might be too coarse-grained to see differences between evos lineages.

      Thank you for the advice. We agree that the present analysis focused on the global trend of gene expression and that varying the window size might lead to different patterns. Additional analysis was performed according to your comment. The results showed that changing the window size (1, 10, 50, 100, and 200 kb) did not alter the periodicity of the reduced genome, which agreed with the conserved periodicity previously reported for a different reduced genome, MDS42 (Ying et al., 2013, BMC Genomics). The following sentence was added to the Materials and Methods.

      (L460-461) “Note that altering the moving average did not change the max peak.”
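
      Why the smoothing window need not move the main peak can be illustrated with synthetic data. The sketch below is not the authors' pipeline: the signal (six cycles plus noise over 400 bins), the circular moving average, and the plain discrete Fourier transform are all assumptions chosen for a self-contained example. A moving average is a linear filter, so it changes the amplitude of each periodic component but not its frequency.

```python
import cmath
import math
import random


def synthetic_expression(n_bins=400, cycles=6, noise=0.5, seed=1):
    """Hypothetical expression profile: `cycles` periods around a circular genome."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * cycles * i / n_bins) + rng.gauss(0.0, noise)
        for i in range(n_bins)
    ]


def moving_average(x, w):
    """Circular moving average over w consecutive bins."""
    n = len(x)
    return [sum(x[(i + j) % n] for j in range(w)) / w for i in range(n)]


def dominant_cycles(x, max_k=20):
    """Number of cycles per genome with the largest Fourier amplitude."""
    n = len(x)
    m = sum(x) / n

    def amplitude(k):
        return abs(sum((v - m) * cmath.exp(-2j * math.pi * k * i / n)
                       for i, v in enumerate(x)))

    return max(range(1, max_k + 1), key=amplitude)


signal = synthetic_expression()
for w in (1, 5, 11, 21):
    print(w, dominant_cycles(moving_average(signal, w)))
```

      Very wide windows eventually suppress the periodic component altogether, so the invariance holds only while the window is clearly shorter than the period.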

      • Figures - the figures look great. Figure 7 needs a legend.

      Thank you. The following legend was added.

      (L774-777) “Three evolutionary strategies are proposed. Pink and blue arrowed lines indicate experimental evolution and genome reduction, respectively. The size of the open cycles represents the genome size. Black and grey indicate the ancestor and evolved genomes, respectively.”

      Response to Reviewer #2:

      Thank you for reviewing our manuscript and for your fruitful comments. We agree that our study leaned towards describing the observed findings rather than explaining the detailed biological mechanisms. We focused on genome-wide biological features rather than specific biological functions. The underlying mechanisms indeed remain unknown, leaving open the questions you raised. We did not perform the fitness assay on reconstituted (single and combinatorial) mutants because the purpose of the research was not to clarify the regulatory or metabolic mechanisms. This is why the RNA-Seq analysis provided findings on genome-wide patterns and a chromosomal view, which we consider biologically valuable. We understand your concern that the conclusions lack biological meaning, as ALE studies commonly tell the preferred story of identifying a specific gene regulation or an improved pathway, which was not the direction of the present study.

      For this reason, our revision may not address all these concerns. Considering your comments, we tried our best to revise the manuscript. The changes made were highlighted. We sincerely hope the revision and the following point-by-point response are acceptable.

      Major remarks:

      (1) The authors outlined the significance of ALE in genome-reduced organisms and important findings from published literature throughout the Introduction section. The description in L65-69, which I believe pertains to the motivation of this study, seems vague and insufficient to convey the novelty or necessity of this study i.e. it is difficult to grasp what aspects of genome-reduced biology that this manuscript intends to focus/find/address.

      Sorry for the unclear writing. The sentences were rewritten for clarity as follows.

      (L64-70) “Although the reduced growth rate caused by genome reduction could be recovered by experimental evolution, it remains unclear whether such an evolutionary improvement in growth fitness was a general feature of the reduced genome and how the genome-wide changes occurred to match the growth fitness increase. In the present study, we performed the experimental evolution with a reduced genome in multiple lineages and analyzed the evolutionary changes of the genome and transcriptome.”

      (2) What is the rationale behind the lineage selection described in Figure S1 legend "Only one of the four overnight cultures in the exponential growth phase (OD600 = 0.01~0.1) was chosen for the following serial transfer, highlighted in red."?

      The four wells (cultures of different initial cell concentrations) were measured every day, and only the well that showed OD600 = 0.01~0.1 (red) was transferred at four different dilutions (e.g., 10-, 100-, 1,000-, and 10,000-fold). This resulted in four wells of different initial cell concentrations. The multiple dilutions ensured that at least one of the wells would show an OD600 within the range of 0.01 to 0.1 after the overnight culture. The well within this range was then used for the next serial transfer. Fig. S1 provides the details of the experimental records. The experimental evolution was strictly controlled within the exponential phase, quite different from the commonly conducted ALE, which transfers a single culture at a fixed dilution rate. Serial transfer with multiple dilution rates was previously applied in our evolution experiments and is well described in Nishimura et al., 2017, mBio; Lu et al., 2022, Comm Biol; Kurokawa et al., 2022, Front Microbiol, etc. The following sentence was added to the Materials and Methods.

      (L344-345) “Multiple dilutions changing in order promised at least one of the wells within the exponential growth phase after the overnight culture.”
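
      The logic of the multiple dilutions can be illustrated with a simple calculation. The sketch below assumes pure exponential growth without lag or saturation, and the starting OD600, growth rate, and culture time are hypothetical values chosen for illustration; only the target range (0.01 to 0.1) and the tenfold dilution steps come from the description above.

```python
import math


def wells_in_range(od_start, rate_per_h, hours, dilutions, low=0.01, high=0.1):
    """Dilution factors whose predicted final OD600 falls within [low, high].

    Assumes OD_final = (od_start / dilution) * exp(rate_per_h * hours),
    i.e. uninterrupted exponential growth (an idealisation).
    """
    growth = math.exp(rate_per_h * hours)
    return [d for d in dilutions if low <= od_start / d * growth <= high]


# Hypothetical example: OD600 of 0.05 at transfer, 0.4 per hour, 20 hours.
print(wells_in_range(0.05, 0.4, 20, [10, 100, 1000, 10000]))
```

      Because the target range spans one tenfold interval and the dilutions are spaced tenfold apart, one dilution in the series typically lands within the range as long as the series covers the expected growth.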

      (3) The measured growth rate of the end-point 'F2 lineage' shown in Figure S2 seemed comparable to the rest of the lineages (A1 to H2), but the growth rate of 'F2' illustrated in Figure 1B indicates otherwise (L83-84). What is the reason for the incongruence between the two datasets?

      Sorry for the unclear description. The growth rates shown in Fig. S2 were obtained during the evolution experiment using the daily transfer's initial and final OD600 values. The growth rates shown in Fig. 1B were obtained from the final population (Evos) growth assay and calculated from the growth curves (biological replication, N=4). Fig. 1B shows the precisely evaluated growth rates, and Fig. S2 shows the evolutionary changes in growth rates. Accordingly, the following sentence was added to the Results.

      (L84-87) “As the growth increases were calculated according to the initial and final records, the exponential growth rates of the ancestor and evolved populations were obtained according to the growth curves for a precise evaluation of the evolutionary changes in growth.”

      (4) Are the differences in growth rate statistically significant in Figure 1B?

      Eight out of nine Evos were significant, except F2. The sentences were rewritten and associated with the revised Fig. 1B, indicating significance.

      (L87-90) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper). However, the magnitudes of growth improvement were considerably varied, and the evolutionary dynamics of the nine lineages were somehow divergent (Fig. S2).”

      (5) The evolved lineages showed a decrease in their maximal optical densities (OD600) compared to the ancestral strain (L85-86). ALE could accompany changes in cell size and morphologies, (doi: 10.1038/s41586-023-06288-x; 10.1128/AEM.01120-17), which may render OD600 relatively inaccurate for cell density comparison. I suggest using CFU/mL metrics for the sake of a fair comparison between Anc and Evo.

      The method used to evaluate the carrying capacity (i.e., cell density, population size, etc.) does not change the results. CFU counting is also biased, as it misses living cells that cannot form colonies, and it is no fairer if the cell size changes. Optical density (OD600) records the temporal changes of cell growth at 15-minute intervals, which allows a precise evaluation of the growth rate in the exponential phase. CFU is poor at recording temporal changes in the population, which tends to result in an inaccurate growth rate. Taken together, we believe that our method was reasonable and reliable. We hope you can accept this different approach.

      (6) Please provide evidence in support of the statement in L115-119. i.e. statistical analysis supporting that the observed ratio of essential genes in the mutant pool is not random.

      The statistical test was performed, and the following sentence was added.

      (L139-141) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008).”

      (7) The assumption that "mutation abundance would correlate to fitness improvement" described in L120-122: "The large variety in genome mutations and no correlation of mutation abundance to fitness improvement strongly suggested that no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome" is not easy to digest, in the sense that (i) the effect of multiple beneficial mutations are not necessarily summative, but are riddled with various epistatic interactions (doi: 10.1016/j.mec.2023.e00227); (ii) neutral hitchhikers are of common presence (you could easily find reference on this one); (iii) hypermutators that accumulate greater number of mutations in a given time are not always the eventual winners in competition games (doi: 10.1126/science.1056421). In this sense, the notion that "mutation abundance correlates to fitness improvement" in L120-122 seems flawed (for your perusal, doi: 10.1186/gb-2009-10-10-r118).

      Sorry for the improper description and confusing writing, and thank you for the fruitful knowledge on molecular evolution. The sentence was deleted, and the following one was added.

      (L145-146) “Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (8) Could it be possible that the large variation in genome mutations in independent lineages results from a highly rugged fitness landscape characterized by multiple fitness optima (doi: 10.1073/pnas.1507916112)? If this is the case, I disagree with the notion in L121-122 "that no mutations were specifically responsible or crucially essential" It does seem to me that, for example, the mutations in evo A2 are specifically responsible and essential for the fitness improvement of evo A2 in the evolutionary condition (M63 medium). Fitness assessment of individual (or combinatorial) mutants reconstituted in the Ancestral background would be a bonus.

      Thank you for the intriguing idea. The sentence was deleted. Please allow us to incorporate your comment into the manuscript as follows.

      (L143-145) “The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38.”

      (9) L121-122: "...no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome". Strictly speaking, the authors should provide a reference case of wild-type E. coli ALE in order to reach definitive conclusions that the observed mutation events are exclusive to the genome-reduced strain. It is strongly recommended that the authors perform comparative analysis with an ALEed non-genome-reduced control for a more definitive characterization of the evolutionary biology in a genome-reduced organism, as it was done for "JCVI-syn3.0B vs non-minimal M. mycoides" (doi: 10.1038/s41586-023-06288-x) and "E. coli eMS57 vs MG1655" (doi: 10.1038/s41467-019-08888-6).

      The improper description was deleted in response to comments 7 and 8. The mentioned references were cited in the manuscript (refs 21 and 23). Thank you for the experimental advice. We are sorry that the comparison of wild-type and reduced genomes was not in the scope of the present study and will probably be reported soon in our future work.

      (10) L146-148: "The homeostatic periodicity was consistent with our previous findings that the chromosomal periodicity of the transcriptome was independent of genomic or environmental variation" A Previous study also suggested that the amplitudes of the periodic transcriptomes were significantly correlated with the growth rates (doi: 10.1093/dnares/dsaa018). Growth rates of 8/9 Evos were higher compared to Anc, while that of Evo F2 remained similar. Please comment on the changes in amplitudes of the periodic transcriptomes between Anc and each Evo.

      Thank you for the suggestion. The correlation between the growth rates and the amplitudes of chromosomal periodicity was statistically insignificant (p>0.05). It might be a result of the limited data points. Compared with the only nine data points in the present study, the previous study analyzed hundreds of transcriptomes associated with the corresponding growth rates, which is sufficient for statistical evaluation. In addition, the changes in growth rates were larger in the previous study than in the present study, which might influence the significance. This is why we did not discuss the periodic amplitude.

      (11) Please elaborate on L159-161: "It strongly suggested the essentiality mutation for homeostatic transcriptome architecture happened in the reduced genome.".

      Sorry for the improper description. The sentence was rewritten as follows.

      (L191-193) “The essentiality of the mutations might have participated in maintaining the homeostatic transcriptome architecture of the reduced genome.”

      (12) Is FPKM a valid metric for between-sample comparison? The growing consensus in the community adopts Transcripts Per Kilobase Million (TPM) for comparing gene expression levels between different samples (Figure 3B; L372-379).

      Sorry for the unclear description. The FPKM indicated here was globally normalized, statistically equivalent to TPM. The following sentence was added to the Materials and Methods.

      (L421-422) “The resulting normalized FPKM values were statistically equivalent to TPM.”
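
      For reference, the standard relationship between FPKM and TPM within a single sample is a rescaling so that the values sum to one million. The sketch below shows only this conversion; the gene names are placeholders, and it does not reproduce the global normalization the authors applied.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to 1e6 (TPM)."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("FPKM values must sum to a positive number")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Placeholder genes and values for illustration.
tpm = fpkm_to_tpm({"geneA": 10.0, "geneB": 30.0, "geneC": 60.0})
print(tpm)
```

      The conversion preserves the ratios between genes within a sample, while the fixed total makes values from different samples share a common scale.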

      (13) Please provide % mapped frequency of mutations in Table S3.

      They were all 100%. The partially fixed mutations were excluded in the present study. The following sentence was added to the caption of Table S3.

      (Supplementary file, p 9) “Note that the entire population held the mutations, i.e., 100% frequency in DNA sequencing.”

      (14) To my knowledge, M63 medium contains glucose and glycerol as carbon sources. The manuscript would benefit from discussing the elements that impose selection pressure in the M63 culture condition.

      Sorry for the missing information on M63, which contains 22 mM glucose as the only carbon source. The medium composition was added in the Materials and Methods, as follows.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      (15) The RNA-Seq datasets for Evo strains seemed equally heterogenous, just as their mutation profiles. However, the missing element in their analysis is the directionality of gene expression changes. I wonder what sort of biological significance can be derived from grouping expression changes based solely on DEGs, without considering the magnitude and the direction (up- and down-regulation) of changes? RNA-seq analysis in its current form seems superficial to derive biologically meaningful interpretations.

      We agree that most studies often discuss the direction of transcriptional changes. The present study aimed to capture a global view of the magnitude of transcriptome reorganization. Thus, the analyses focused on the overall features, such as the abundance of DEGs, instead of the details of the changes, e.g., the up- and down-regulation of DEGs. The biological meaning of the DEGs' overview was how significantly the genome-wide gene expression fluctuated, which might be short of an in-depth view of individual gene expression. The following sentence was added to indicate the limitation of the present analysis.

      (L199-202) “Instead of an in-depth survey on the directional changes of the DEGs, the abundance and functional enrichment of DEGs were investigated to achieve an overview of how significant the genome-wide fluctuation in gene expression, which ignored the details of individual genes.”

      Minor remarks

      (1) L41: brackets italicized "(E. coli)".

      It was fixed as follows.

      (L40) “… Escherichia coli (E. coli) cells …”

      (2) Figure S1. It is suggested that the x-axis of ALE monitor be set to 'generations' or 'cumulative generations', rather than 'days'.

      Thank you for the suggestion. Fig. S1 describes the experimental procedure, so "day" was used. Fig. S2 presents the evolutionary process, so "generation" was used, as you recommended here.

      (3) I found it difficult to digest through L61-64. Although it is not within the job scope of reviewers to comment on the language style, I must point out that the manuscript would benefit from professional language editing services.

      Sorry for the unclear writing. The sentences were revised as follows.

      (L60-64) “Previous studies have identified conserved features in transcriptome reorganization, despite significant disruption to gene expression patterns resulting from either genome reduction or experimental evolution 27-29. The findings indicated that experimental evolution might reinstate growth rates that have been disrupted by genome reduction to maintain homeostasis in growing cells.”

      (4) Duplicate references (No. 21, 42).

      Sorry for the mistake. It was fixed (leaving ref. 21).

      (5) Inconsistency in L105-106: "from two to 13".

      "From two to 13" was adopted from the language editing. It was changed as follows.

      (L119) “… from 2 to 13, …”

      Response to Reviewer #3:

      Thank you for reviewing our manuscript and for the helpful comments, which improved the strength of the manuscript. The recommended statistical analyses that directly supported the statements in the manuscript were performed, whereas those that would constitute new results within the scope of further studies were not conducted. The changes made in the revision were highlighted. We sincerely hope the revised manuscript and the following point-by-point response address your concerns. You will find all your suggested statistical tests in our future work, which will report an extensive study on the experimental evolution of an assortment of reduced genomes.

      (1) Line 106 - "As 36 out of 45 SNPs were nonsynonymous, the mutated genes might benefit the fitness increase." This argument can be strengthened. For example, the null expectation of nonsynonymous SNPs should be discussed. Is the number of observed nonsynonymous SNPs significantly higher than the expected one?

      (2) Line 107 - "In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase." Instead of just listing examples, a regression analysis can be added.

      Yes, it's significant. By a rough estimate, random mutations would yield ~33% nonsynonymous SNPs. Additionally, a regression would be unreliable because there was no statistically significant relationship between the number of mutations and the magnitude of fitness increase. Accordingly, the corresponding sentences were revised with additional statistical tests.

      (L123-129) “As 36 out of 45 SNPs were nonsynonymous, which was highly significant compared to random mutations (p < 0.01), the mutated genes might benefit fitness increase. In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase. There was no significant correlation between the number of mutations and the growth rate in a statistical view (p > 0.1). Even from an individual close-up viewpoint, the abundance of mutations poorly explained the fitness increase.”
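      The comparison against random mutations can be illustrated with a one-sided binomial test. The following is a minimal sketch, not the authors' actual procedure: it assumes the null fraction of nonsynonymous SNPs is the ~33% quoted in the response, and the function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Observed counts quoted in the response: 36 of 45 SNPs were nonsynonymous.
observed_nonsyn = 36
total_snps = 45

# Assumption: the null fraction is the authors' rough estimate (~33%).
null_fraction = 0.33

p_value = binomial_upper_tail(observed_nonsyn, total_snps, null_fraction)
print(f"one-sided binomial p-value: {p_value:.3g}")
```

      Under this assumed null, the p-value falls well below the 0.01 threshold quoted in the revised text; a different null fraction would change the number, so the choice of null should be stated alongside the result.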

      (3) Line 114 - "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Here, the information mentioned in line 153 ("the ratio of essential to all genes (302 out of 3,290) in the reduced genome.") can be used. Then a statistical test for a contingency table can be used.

      (4) Line 117 - "the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." What is the expected number of fixed mutations in essential genes vs non-essential genes? Is the observed number statistically significantly higher?

      Sorry for the improper and insufficient information on the essential genes. Yes, it's significant. The statistical test was additionally performed. The corresponding part was revised as follows.

      (L134-146) “They seemed highly related to essentiality 7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome. The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth 35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable. The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38. Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”
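      A contingency-table test of this kind can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the authors' analysis: the response does not specify the table layout or whether a continuity correction was applied, so the 2x2 table below (mutated vs. non-mutated genes, essential vs. non-essential, with all 49 mutated genes counted among the 3,290) and the Yates correction are assumptions, and the resulting p-value need not match the reported one. The function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value); the p-value uses the chi-square
    distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Yates continuity correction, floored at zero.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff**2 / expected
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Counts quoted in the response.
mutated_essential = 11
mutated_total = 49
essential_total = 286
genes_total = 3290

# Assumed table layout: rows = mutated / not mutated, columns = essential / non-essential.
a = mutated_essential
b = mutated_total - mutated_essential
c = essential_total - mutated_essential
d = (genes_total - essential_total) - b

chi2, p_value = chi_square_2x2(a, b, c, d)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

      With cell counts this unbalanced (the expected number of mutated essential genes is below five), Fisher's exact test would be a reasonable alternative.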

      (5) The authors mentioned no overlapping in the single mutation level. Is that statistically significant? The authors can bring up what the no-overlap probability is given that there are in total x number of fixed mutations observed (either theory or simulation is good).

      Sorry, we feel confused about this comment. It's unclear to us why it needs to be statistically simulated. Firstly, the mutations were experimentally observed. The result that no overlapped mutated genes were detected was an Experimental Fact but not a Computational Prediction. We feel sorry that you may over-interpret our finding as an evolutionary rule, which always requires testing its reliability statistically. We didn't conclude that the evolution had no overlapped mutations. Secondly, considering 65 times random mutations happened to a ~3.9 Mb sequence, the statistical test was meaningful only if the experimental results found the overlapped mutations. It is interesting how often the random mutations cause the overlapped mutations in parallel evolutionary lineages while increasing the evolutionary lineages, which seems to be out of the scope of the present study. We are happy to include the analysis in our ongoing study on the experimental evolution of reduced genomes.

      (6) The authors mentioned no overlapping in the single mutation level. How about at the genetic level? Some fixed mutations occur in the same coding gene. Is there any gene with a significantly enriched number of mutations?

      No mutations were fixed in the same gene of biological function, as shown in Table S3. If we say the coding region, the only exception is the IS sequences, well known as the transposable sequences without genetic function. The following description was added.

      (L119-122) “The number of mutations largely varied among the nine Evos, from 2 to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      (7) Line 151-156- It seems like the authors argue that the expression level differences can be just explained by the percentage of essential genes that get fixed mutations. One further step for the argument could be to compare the expression level of essential genes with vs without fixed mutations. Also, the authors can compare the expression level of non-essential genes with vs without fixed mutations. And the authors can report whether the differences in expression level became insignificant after the control of the essentiality.

      It's our pleasure that the essentiality intrigued you. Thank you for the analytical suggestion, which is exciting and valuable for our studies. As only 11 essential genes were detected here and "Mutation in essentiality" was an indication but not the conclusion of the present study, we would like to apply the recommended analysis to the datasets of our ongoing study to demonstrate this statement. Thank you again for your fruitful analytical advice.

      (8) Line 169- "The number of DEGs partially overlapped among the Evos declined significantly along with the increased lineages of Evos (Figure 4B). " There is a lack of statistical significance here while the word "significantly" is used. One statistical test that can be done is to use re-sampling/simulation to generate a null expectation of the overlapping numbers given the DEGs for each Evo line and the total number of genes in the genome. The observed number can then be compared to the distribution of the simulated numbers.

      Sorry for the inappropriate usage of the term. Whether it's statistically significant didn't matter here. The word "significant" was deleted as follows.

      (L205-206) “The number of DEGs partially overlapped among the Evos declined along with the increased lineages of Evos (Fig. 4B).”

      (9) Line 177-179- "In comparison,1,226 DEGs were induced by genome reduction. The common DEGs 177 of genome reduction and evolution varied from 168 to 540, fewer than half of the DEGs 178 responsible for genome reduction in all Evos" Is the overlapping number significantly lower than the expectation? The hypergeometric test can be used for testing the overlap between two gene sets.

      There's no expectation for how many DEGs were reasonable. Not all numbers experimentally obtained are required to be statistically meaningful, which is commonly essential in computational and data science.
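      For reference, the hypergeometric test suggested by the reviewer can be sketched as below. The 1,226 reduction-induced DEGs, the 3,290 genes, and the lower bound of 168 common DEGs come from the text; the number of DEGs in a single Evo (`evo_degs`) is not given here and is a purely hypothetical placeholder, so the output is illustrative only. The function name `hypergeom_lower_tail` is also hypothetical.

```python
from math import comb


def hypergeom_lower_tail(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X <= k) for the overlap X between a fixed gene set of size
    `successes` and a random gene set of size `draws`, both taken from
    `population` genes."""
    lowest = max(0, draws - (population - successes))
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lowest, k + 1)
    )
    return numerator / comb(population, draws)


genes_total = 3290     # genes in the reduced genome (from the text)
reduction_degs = 1226  # DEGs induced by genome reduction (from the text)
evo_degs = 1000        # HYPOTHETICAL number of DEGs in one Evo; not reported here
observed_overlap = 168  # lower end of the reported range of common DEGs

expected_overlap = evo_degs * reduction_degs / genes_total
p_lower = hypergeom_lower_tail(observed_overlap, genes_total, reduction_degs, evo_degs)

print(f"expected overlap under independence: {expected_overlap:.1f}")
print(f"P(overlap <= {observed_overlap}) = {p_lower:.3g}")
```

      The lower tail is used because the reviewer's question was whether the overlap is smaller than expected; the upper tail would be the appropriate choice when testing for enrichment.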

      (10) The authors should give more information about the ancestral line used at the beginning of experimental evolution. I guess it is one of the KHK collection lines, but I can not find more details. There are many genome-reduced lines. Why is this certain one picked?

      Sorry for the insufficient information on the reduced genome used for the experimental evolution. The following descriptions were added in the Results and the Materials and Methods, respectively.

      (L75-79) “The E. coli strain carrying a reduced genome, derived from the wild-type genome W3110, showed a significant decline in its growth rate in the minimal medium compared to the wild-type strain 13. To improve the genome reduction-mediated decreased growth rate, the serial transfer of the genome-reduced strain was performed with multiple dilution rates to keep the bacterial growth within the exponential phase (Fig. S1), as described 17,20.”

      (L331-334) “The reduced genome has been constructed by multiple deletions of large genomic fragments 58, which led to an approximately 21% smaller size than its parent wild-type genome W3110.”

      (11) How was the saturated density in Figure 1 actually determined? In particular, the fitness assay of growth curves is 48h. But it seems like the experimental evolution is done for ~24 h cycles. If the Evos never experienced a situation like a stationary phase between 24-48h, and if the author reported the saturated density 48 h in Figure 1, the explanation of the lower saturated density can be just relaxation from selection and may have nothing to do with the increase of growth rate.

      Sorry for the unclear description. Yes, you are right. The evolution was performed within the exponential growth phase (keeping cell division constant), which means the Evos never experienced the stationary phase (saturation). The final evolved populations were subjected to the growth assay to obtain the entire growth curves for calculating the growth rate and the saturated density. Whether the decreased saturated density and the increased growth rate were in a trade-off relationship remained unclear. The corresponding paragraph was revised as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection 30,31, was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32-34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      (12) What annotation of essentiality was used in this paper? In particular, the essentiality can be different in the reduced genome background compared to the WT background.

      Sorry for the unclear definition of the essential genes. They are strictly limited to the 302 essential genes experimentally determined in the wild-type E. coli strain. Detailed information can be found at the following website: https://shigen.nig.ac.jp/ecoli/pec/genes.jsp. We agree that the essentiality could differ between the WT and reduced genomes. Identifying the essential genes in the reduced genome would be an exhaustingly large undertaking. The information on the essential genes defined in the present study was added as follows.

      (L134-139) “They seemed highly related to essentiality 7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome.”

      (13) The fixed mutations in essential genes are probably not rarely observed in experimental evolution. For example, fixed mutations related to RNA polymerase can be frequently seen when evolving to stressful environments. I think the author can discuss this more and elaborate more on whether they think these mutations in essential genes are important in adaptation or not.

      Thank you for your careful reading and the suggestion. As you mentioned, we noticed that the mutations in RNA polymerases (rpoA, rpoB, and rpoD) were identified in three Evos. As they were not shared across all Evos, we didn't discuss the contribution of these mutations to evolution. Instead of the individual functions of the mutated essential genes, we focused on the enriched gene functions related to the transcriptome reorganization because they were the common feature observed across all Evos and linked to the whole metabolic or regulatory pathways, which are supposed to be more biologically reasonable and interpretable. The following sentence was added to clarify our thinking.

      (L268-273) “In particular, mutations in the essential genes, such as RNA polymerases (rpoA, rpoB, rpoD) identified in three Evos (Table S3), were supposed to participate in the global regulation for improved growth. Nevertheless, the considerable variation in the fixed mutations without overlaps among the nine Evos (Table 1) implied no common mutagenetic strategy for the evolutionary improvement of growth fitness.”

      (14) In experimental evolution to new environments, several previous studies also show that long-term experimental evolution in transcriptome is not consistent with or even reverts the short-term response; short-term responses were just rather considered as an emergency plan. They seem to echo what the authors found in this manuscript. I think the author can refer to some of those studies more and make a more thorough discussion on short-term vs long-term responses in evolution.

      Thank you for the advice. It's unclear to us what the short-term and long-term responses mentioned in this comment refer to. The "Response" usually refers to the phenotypic or transcriptional changes within a few hours after environmental fluctuation, which are generally non-genetic (no mutation). In comparison, long-term or short-term experimental "Evolution" is associated with genetic changes (mutations). Concerning the Evolution (not the Response), the long-term experimental evolution (>10,000 generations) was performed only with the wild-type genome, and the short-term experimental evolution (500~2,000 generations) was more often conducted with both wild-type and reduced genomes, to our knowledge. Previous landmark studies have intensively discussed comparing the wild-type and reduced genomes. Our study was restricted to the reduced genome, which was constructed differently from those reduced genomes used in the reported studies. The experimental evolution of the reduced genomes has been performed in the presence of additional additives, e.g., antibiotics, alternative carbon sources, etc. That is, neither the genomic backgrounds nor the evolutionary conditions were comparable. Comparing studies with nothing in common seems unproductive. We sincerely hope the recommended topics can be applied in our future work.

      Some minor suggestions

      • Figures S3 & Table S2 need an explanation of the abbreviations of gene categories.

      Sorry for the missing information. Figure S3 and Table S3 were revised to include the names of gene categories. The figure is pasted below for quick reference.

      Author response image 3.

      • I hope the authors can re-consider the title; "Diversity for commonality" does not make much sense to me. For example, it can be simply just "Diversity and commonality."

      Thank you for the suggestion. The title was simplified as follows.

      (L1) “Experimental evolution for the recovery of growth loss due to genome reduction.”

      • It is not easy for me to locate and distinguish the RNA-seq vs DNA-seq files in DRA013662 at DDBJ. Could you make some notes on what RNA-seq actually are, vs what DNA-seq files actually are?

      Sorry for the mistakes in the DRA number of DNA-seq. DNA-seq and RNA-seq were deposited separately with the accession IDs of DRA013661 and DRA013662, respectively. The following correction was made in the revision.

      (L382-383) “The raw datasets of DNA-seq were deposited in the DDBJ Sequence Read Archive under the accession number DRA013661.”

    1. Author response:

      eLife assessment

      In this valuable study, Kumar et al., provide evidence suggesting that the p130Cas drives the formation of condensates that sprout from focal adhesions to cytoplasm and suppress translation. Pending further substantiation, this study was found to be likely to provide previously unappreciated insights into the mechanisms linking focal adhesions to the regulation of protein synthesis and was thus considered to be of broad general interest. However, the evidence supporting the proposed model was incomplete; additional evidence is warranted to substantiate the relationship between p130Cas condensates and mRNA translation and establish corresponding functional consequences.

      We thank the eLife editorial team for their positive assessment of the broad significance of our manuscript. We fully agree that the functional consequences need to be explored in more detail. We feel that many of the criticisms are valid points that are not easily addressed via available tools and thus should be considered limitations of present approaches. We hope that readers appreciate that identification of a new class of liquid-liquid phase separations calls for much more work to fully explore their characteristics, regulation and function, which will likely advance many areas of cell biology and perhaps even medicine.

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrated the phenomenon of p130Cas, a protein primarily localized at focal adhesions, and its formation of condensates. They identified the constituents within the condensates, which include other focal adhesion proteins, paxillin, and RNAs. Furthermore, they proposed a link between p130Cas condensates and translation.

      Strengths:

      Adhesion components undergo rapid exchange with the cytoplasm for some unclear biological functions. Given that p130Cas is recognized as a prominent mechanical focal adhesion component, investigating its role in condensate formation, particularly its impact on the translation process, is intriguing and significant.

      We thank the reviewer for recognizing the functional significance of the work.

      Weaknesses:

      The authors identified the disordered region of p130Cas and investigated the formation of p130Cas condensate. They attempted to demonstrate that p130Cas condensates inhibit translation, but the results did not fully support this assertion. There are several comments below:

      (1) Despite isolating p130Cas-GFP protein using GFP-trap beads, the authors cannot conclusively eliminate the possibility of isolating p130Cas from focal adhesions. While the characterization of the GFP-tagged pulls can reveal the proteins and RNAs associated with p130Cas, they need to clarify their intramolecular mechanism of localization within p130Cas droplets. Whether the protein condensates retain their liquid phase or these GFP-p130Cas pulls represent protein aggregate remains uncertain.

      We agree, the isolation from cell lysates does not distinguish between focal adhesions and cytoplasmic LLPS. We note that p130Cas in focal adhesions also appears to be in LLPS. But there are no methods available to isolate them separately. We acknowledge this is a limitation of the study.

      (2) The authors utilized hexanediol and ammonium acetate to highlight the phenomenon of p130Cas condensates. Although hexanediol is an inhibitor for hydrophobic interactions and ammonium acetate is a salt, a more thorough explanation of the intramolecular mechanisms underlying p130Cas protein-protein interaction is required. Additionally, given that the size of p130Cas condensates can exceed 100 µm², classification is needed to differentiate between p130Cas condensates and protein aggregation.

      Ammonium acetate, which works by promoting hydrophobic interactions and weak van der Waals forces, has been widely used in phase separation studies to change ionic strength without altering intracellular pH. Conversely, hexanediol weakens the hydrophobic/van der Waals interactions that commonly mediate phase separation of IDRs. In the case of p130Cas, the multiple tyrosines within the scaffolding domain are obvious targets. If the reviewer is asking us to resolve the detailed hydrophobic interactions within the scaffolding domain, this is far beyond the scope of the current paper.

      Protein aggregates are defined by their characteristics (e.g., irreversibility, departure from spherical shape), not by size. Older, larger droplets remain circular and show slower but still measurable rates of exchange. Moreover, droplets are essentially absent after trypsinizing and replating cells. All these results argue against aggregates.

      (3) The connection between p130Cas condensates and translation inhibition appears tenuous. The data only suggests a correlation between p130Cas expression and translation inhibition. Further evidence is required to bolster this hypothesis.

      The optogenetic experiment shows that triggering LLPS by dimerizing p130Cas results in inhibition of translation. This is a causal not a correlative experiment. The reviewer may be thinking that dimerizing p130Cas could stimulate focal adhesion signaling, activating FAK or a src family kinase or other signals. However, none of these signals has been linked to inhibition of cell growth or migration. Thus, we agree that this is a limitation but consider it a low probability mechanism.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Kumar et al., report on a previously unappreciated mechanism of translational regulation whereby p130Cas induces LLPS condensates that then traffic out from focal adhesion into the cytoplasm to modulate mRNA translation. Specifically, the authors employed EGFP-tagged p130Cas constructs, endogenous p130Cas, and p130Cas knockouts and mutants in cell-based systems. These experiments in conjunction with various imaging techniques revealed that p130Cas drives assembly of LLPS condensates in a manner that is largely independent of tyrosine phosphorylation. This was followed by in vitro EGFP-tagged p130Cas-dependent induction of LLPS condensates and determination of their composition by mass spectrometry, which revealed enrichment of proteins involved in RNA metabolism in the condensates. The authors excluded the plausibility that p130Cas-containing condensates co-localize with stress granules or p-bodies. Next, the authors determined mRNA compendium of p130Cas-containing condensates which revealed that they are enriched in transcripts encoding proteins implicated in cell cycle progression, survival, and cell-cell communication. These findings were followed by the authors demonstrating that p130Cas-containing condensates may be implicated in the suppression of protein synthesis using puromycylation assay. Altogether, it was found that this study significantly advances the knowledge pertinent to the understanding of molecular underpinnings of the role of p130Cas and more broadly focal adhesions on cellular function, and to this end, it is likely that this report will be of interest to a broad range of scientists from a wide spectrum of biomedical disciplines including cell, molecular, developmental and cancer biologists.

      Strengths:

      Altogether, this study was found to be of potentially broad interest inasmuch as it delineates a hitherto unappreciated link between p130Cas, LLPS, and regulation of mRNA translation. More broadly, this report provides unique molecular insights into the previously unappreciated mechanisms of the role of focal adhesions in regulating protein synthesis. Overall, it was thought that the provided data sufficiently supported most of the authors' conclusions. It was also thought that this study incorporates an appropriate balance of imaging, cell and molecular biology, and biochemical techniques, whereby the methodology was found to be largely appropriate.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Two major weaknesses of the study were noted. The first issue is related to the experiments establishing the role of p130Cas-driven condensates in translational suppression, whereby it remained unclear whether these effects are affecting global mRNA translation or are specific to the mRNAs contained in the condensates. Moreover, some of the results in this section (e.g., experiments using cycloheximide) may be open to alternative interpretation. The second issue is the apparent lack of functional studies, and although the authors speculate that the described mechanism is likely to mediate the effects of focal adhesions on e.g., quiescence, experimental testing of this tenet was lacking.

      We appreciate the reviewer’s insights. Assessing translational inhibition for specific genes rather than global measurement of translation is an important direction for future work.

      Regarding the cycloheximide experiments, we are unsure what the reviewer means. We used it as a control for puromycin labeling, which is a very standard approach. It seems more likely that the question concerns Fig 5G, where we used it to sequester mRNAs on ribosomes to deplete them from other pools. In this case, p130Cas condensates decrease after 2 minutes. The reviewer may be suggesting that this effect could be due to blocked translation per se and loss of short-lived proteins. We acknowledge that this is possible but, given the very rapid effect (2 min), we think it unlikely.

      Lastly, we agree with the reviewer that further functional studies in quiescence or senescence are warranted; however, these are extensive, open-ended studies and we will not be able to include them as part of the current paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing important molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis, and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. Defining aspects of the biology more precisely will require another study with a different design and methods.

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      We conducted a sensitivity analysis including blood neutrophil count as a potential predictor in the multivariate Cox elastic-net regression model for important predictor selection (Table S14). In this analysis, all six important predictors (genes and clinical risk factors) identified in the original analysis (Table S13) were again selected, together with blood neutrophil count. Additionally, we evaluated the predictive value of blood neutrophil count alone, which demonstrated poor performance, with an optimism-corrected AUC of 0.63 for all TBM, 0.67 for HIV-negative TBM, and 0.70 for HIV-positive TBM. Even when combined with the identified gene signatures, blood neutrophil count did not improve the overall performance of the predictive model (optimism-corrected AUC of 0.79 for all TBM, 0.76 for HIV-negative TBM, and 0.80 for HIV-positive TBM). These results indicate that the identified hub genes have better predictive value than blood neutrophil count alone, and that adding neutrophil count to them does not improve prediction. These findings have been incorporated into the results section of our manuscript.

      To test whether pathway-representative genes have better predictive values than hub genes, we included all these genes in the analysis for important predictor selection. Pathway-representative genes comprised ANXA3 and CXCR2, representing neutrophil activation, and IL1b, representing acute inflammatory response. We observed that all hub genes (MCEMP1, NELL2, ZNF354C, and CD4) consistently emerged as the most important genes, with the highest selection in the models compared to the rest, in both the HIV-negative TBM and HIV-positive TBM cohorts. Additionally, these identified hub genes were still selected when tested together with other hub genes representing relevant biological pathways associated with TBM mortality, such as CYSTM1 involved in neutrophil activation, TRAF5 involved in the NF-kappa B signaling pathway, and CD28 and TESPA1 involved in T cell receptor signaling. These results show that genes selected on the basis of known biologically relevant pathways did not give better predictive values than the identified hub genes in the significant modules.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis cannot be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis cannot be used to make causal inferences’: that would require a different study design and approach. The proposed gene signature has higher AUC values than the existing clinical model, whether used alone or in combination with clinical risk factors (Table 4). We agree that independent validation of the gene signature will be a crucial next step for future utility. We have performed qPCR in another sample set and have added these results in the revision (Table 4 and supplementary figure S8).

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional comments most of which are relatively minor:

      (1) Can the authors please clarify if all the PTB cases are also HIV-negative?

      This has been added to the methods section.

      (2) For Table 1, can the authors please list the total number of patients with microbiologically confirmed TB regardless of the methods used? And for the two TBM groups, was the positive microbiology based on CSF findings?

      The total number of patients with microbiologically confirmed TB is presented in Table 2 as the definite TBM group, defined as TB that was microbiologically confirmed using microscopy, culture, and Xpert testing of cerebrospinal fluid (CSF) samples. We have updated the note in Table 2 to clarify this definition.

      (3) How was the discovery and validation set selected? Was it based on randomisation?

      We randomly split the TBM data into two datasets, a discovery cohort (n=142) and a validation cohort (n=139), to ensure the reproducibility of the data analysis. We have described this in the methods section.
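
      As an illustration only, a random split of this kind could be sketched as follows. The sample IDs, the function name `split_cohort`, and the seed are hypothetical and are not taken from the manuscript; only the cohort sizes (142 and 139) come from the response above.

```python
import random

def split_cohort(sample_ids, n_discovery, seed=2024):
    """Randomly partition sample IDs into discovery and validation sets."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) makes the split reproducible
    rng.shuffle(ids)
    return ids[:n_discovery], ids[n_discovery:]

# Hypothetical IDs standing in for the 281 TBM participants (142 + 139).
tbm_ids = [f"TBM{i:03d}" for i in range(1, 282)]
discovery, validation = split_cohort(tbm_ids, n_discovery=142)
print(len(discovery), len(validation))  # prints "142 139"
```

      Fixing the seed means the same partition can be regenerated later, which matters when module discovery and validation are run at different times.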

      (4) Line 107 can be better clarified by stating that the overall 3-month mortality rate is 21.7% for TBM regardless of HIV status.

      Thank you, we have restated this sentence in the results section.

      (5) The authors stated that samples were collected at enrolment when patients would have received less than 6 days of anti-tubercular treatment. Is there information on the median and IQR on the number of days that the patients would have received Rx, especially between the groups? Did the authors control for this variable when analysing for DEGs?

      One of the criteria for enrolling participants in the LAST-ACT and ACT-HIV trials is that they must have received fewer than 6 consecutive days of two or more drugs active against M. tuberculosis. However, the number of days of Rx that patients had received was not recorded, so we could not control for this variable when performing the differential expression analysis for DEGs. This has been clarified further in the methods section: ‘The samples were taken at enrollment, when patients could not have received more than 6 consecutive days of two or more drugs active against M. tuberculosis.’

      (6) I am a little bit concerned with the reads mapping accuracy (57%) to the human genome, which is fairly low. Did the authors investigate the reasons behind this low accuracy?

      Thank you. It was indeed a typo. We have corrected it in the results section.

      (7) On Tables S2-S4, can the authors please clarify what the last column (labelled as "B") shows?

      Tables S2-S4 now have been changed to S3-S5. We have updated the legend of these tables to provide clarification regarding the meaning of the last column.

      Reviewer #2 (Recommendations For The Authors):

      If the authors wish to revise their manuscript, I suggest the following amendments:

      (1) Provide a consort diagram for the selection of samples included in the present analysis (from parent study cohorts), allocation to test and validation splits for bioinformatics analysis, and outcomes.

      We have provided our consort diagram in supplementary Figure S10.

      (2) Provide details of inclusion criteria for pulmonary TB cohort, and how samples from this cohort were selected for inclusion in the present analysis. Please clarify whether this cohort excluded HIV-positive participants by design or by chance.

      The inclusion criteria for the pulmonary TB cohort were described in the methods section. Due to the very low prevalence of HIV in this prospective observational study, HIV-positive participants were excluded. We have clarified in the amended manuscript that the pulmonary TB cohort only included HIV-negative participants.

      (3) Baseline characteristics of HIV-positive participants (Table 1) should include CD4 count, HIV viral load, and whether anti-retroviral therapy was naïve or experienced.

      We have included pre-treatment CD4 cell count, information on anti-retroviral therapy, and HIV viral load data in Table 1, and have described this information in the results section.

      (4) I note that the TBM samples were derived from RCTs of adjunctive steroid therapy, but not stratified in the present analysis by treatment arm allocation. Clearly, this may affect the survival/mortality outcomes that are the central focus of this manuscript. Therefore, they should be included in the models for differential gene expression analysis and prognostic signature discovery. To do so, the authors may need to wait until they are able to unblind the trial metadata.

      With permission from the trial investigators, we were able to adjust the analyses for treatment with corticosteroids. The investigators remained blind to the allocation and we have not reported any direct effects of corticosteroids on outcome – such an analysis could only be done once the LAST-ACT trial has been reported (which won’t be until the end of 2024). Treatment outcome and effect were blinded by extracting only the fold change difference between survival and death in the linear regression model, in which gene expression was outcome and survival and treatment were covariates.
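
      One way to write the adjusted model described above is sketched below. The notation is an assumption introduced here for illustration: $y_{g,i}$ is the (log-scale) expression of gene $g$ in participant $i$, $D_i$ indicates death versus survival, and $T_i$ is the corticosteroid allocation.

```latex
y_{g,i} = \beta_{0,g} + \beta_{1,g}\, D_i + \beta_{2,g}\, T_i + \varepsilon_{g,i}
```

      Under this reading, only the estimate $\hat{\beta}_{1,g}$ (the adjusted difference between death and survival) is extracted for each gene, while $\hat{\beta}_{2,g}$ is never reported, which is what keeps the treatment effect blinded.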

      (5) I understood from the methods (lines 460-461) that batch correction of the RNAseq data was necessary. However, it is not clear how the samples were batched. PCA of the transcriptomes before and after batch correction with batch and study group labels should be provided. I would also advocate for a sensitivity analysis to check the robustness of the main findings without batch correction. I assume Fig2A represents batch-corrected data, but this is not clear.

      We have now added information to the methods section about the RNA sequencing batches and the batch correction approach, and have stated there that the analyses and data visualizations used batch-corrected data. We have also updated the results related to batch correction in Fig. 2A and Supplementary Figure S9.

      (6) I would encourage the authors to include a differential gene expression analysis to directly compare the transcriptome of TBM to that of pulmonary TB. I think it would add additional value to their focus on describing the transcriptome in TBM.

      We thank the reviewer for this suggestion. Conducting differential gene expression analysis to compare the transcriptome of TBM with that of PTB is beyond the scope of this manuscript, and we will examine this question separately.

      (7) I don't really understand the purpose of splitting their data set into test and validation for the purposes of showing that WGCNA analysis is mostly reproduced in the two halves of the data. I would advocate that they scrap this approach to maximise the statistical power of their analysis in the descriptive work.

      As mentioned in response to reviewer #1 in question #3, the purpose of splitting the data is to ensure the reproducibility of the data analysis, as suggested by Langfelder et al. (PMID: 21283776). This approach served two purposes: (i) to affirm the existence of functional modules in an independent cohort and (ii) to validate the association of the modules of interest, or their hub genes, with survival outcomes.

      (8) The authors should soften the confidence in their interpretation of the GO/KEGG annotations of WGCNA modules. At least, they should include a paragraph that explicitly details the limitations of their analyses, including (i) the accuracy GO/KEGG annotations are not validated in this context (if at all), (ii) that none of the data can be used to make causal inferences and (iii) that peripheral blood assessments that are obviously impacted by changes in cellular composition of peripheral blood do not necessarily reflect immunopathogenesis at the site of disease - in fact if circulating cells are being recruited to the site of disease or other immune compartments, then quite the opposite interpretations may be true.

      We appreciate the reviewer's comment. (i) In our analysis, we initially confirmed the existence of Weighted Gene Co-expression Network Analysis (WGCNA) modules in the discovery cohort and validated the association of these modules with mortality outcomes in the validation cohort. We then applied GO/KEGG annotations to define the biological functions involved in the WGCNA modules. Finally, we performed Qusage analysis to directly test the association of the top-hit pathways of each WGCNA module with mortality outcomes (see supplementary S6). This approach helped to identify and validate modules and biological pathways associated with TBM mortality in this context, avoiding potential false positives in the GO/KEGG annotations of WGCNA modules. (ii) We agree with the assessment that 'This analysis cannot be used to make causal inferences,' as that would require a different study design and approach. (iii) The focus of this study is to investigate the pathogenesis of TBM in the systemic immune system. We have highlighted this focus in the title and the aim of the manuscript.

      (9) For the prognostic signature discovery and validation, I strongly recommend the authors include more robust validation. For example, to undertake an 80:20 split for sequential discovery (for feature selection and derivation of a prognostic model), followed by validation of a 'locked' model in data that made no contribution to discovery. In two separate sensitivity analyses. I also suggest they split their dataset (i) by treatment allocation in the RCT and (ii) by HIV status. In addition, their method for feature selection has to be clearer- precisely how they select hub genes from their WGCNA analysis as candidate predictors is not explained. Since this is such a prominent output of their manuscript, the results of this analysis should really be included in the main manuscript, and all performance metrics for discrimination should include confidence intervals.

      Employing an 80:20 split for training and testing models is a good approach to internal validation. However, we addressed the issue of overestimating the performance of a prognostic model by using the bootstrap sampling approach proposed by Steyerberg et al. (PMID: 11470385). This approach has been shown to provide stable estimates with low bias. The overall model performance for discrimination reported in our manuscript was corrected for “optimism” to ensure internal validity. This adjustment was achieved through 1000 bootstrap resamples, which accounted for estimation uncertainty. As such, we consider that there is no need to present confidence intervals for these metrics.
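
      The bootstrap optimism correction referred to above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a binary outcome and a hypothetical stand-in model (`fit`, a mean-difference linear score) rather than the Cox elastic-net model of the manuscript, simulated data, and 200 rather than 1000 resamples to keep it fast.

```python
import random

def auc(scores, labels):
    """AUC: probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(X, y):
    """Hypothetical stand-in model: weights are case-minus-control feature means."""
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == 0]
    return [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
            for j in range(len(X[0]))]

def predict(w, X):
    """Linear score for each row of X."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Return (apparent AUC, apparent AUC minus mean bootstrap optimism)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # redraw if the resample contains a single class
        wb = fit(Xb, yb)  # refit the model on the bootstrap sample
        # optimism = performance on the bootstrap sample minus performance on the original data
        optimism.append(auc(predict(wb, Xb), yb) - auc(predict(wb, X), y))
    return apparent, apparent - sum(optimism) / len(optimism)

# Simulated data (an assumption): one informative feature, two noise features.
data_rng = random.Random(1)
y = [1 if i < 30 else 0 for i in range(120)]
X = [[data_rng.gauss(0.8 * yi, 1.0), data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for yi in y]
apparent, corrected = optimism_corrected_auc(X, y)
print(round(apparent, 3), round(corrected, 3))
```

      The per-resample optimism values collected inside the loop also give a sense of how variable the estimate is, which is one possible route to the interval estimates the reviewer requested.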

      Moreover, in our revision, to confirm prognostic signatures independently, we have evaluated the predictive value of identified gene signatures using qPCR in another set of samples. The results have been added in Table 4, supplementary Figure S8 and the results section.

      For the reasons given above (comment 4), we are unable to split our dataset by treatment allocation in this analysis. But as described, we have adjusted the analysis for corticosteroid treatment. Once the primary results of the LAST-ACT trial have been published, we will examine the impact of corticosteroids on TBM pathophysiology and outcomes, seeking to better understand the mechanisms by which steroids have their therapeutic effects.

      Given the difference in pathogenesis and immune response by HIV co-infection, we stratified our analysis by HIV status. As the reviewer suggested, we have provided additional details in the methods section regarding the selection of hub genes from associated WGCNA modules and the feature selection process for predictive modeling.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Below is our point-by-point reply to the reviewers' comments.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      PNKP is one of critical end-processing enzymes for DNA damage repair, mainly base excision & single strand break repair, and double strand break repair to a certain extent. This protein has dual enzyme function: 3' phosphatase and 5' kinase to make DNA ends proper for ligation. It has been demonstrated that PTM of PNKP (e.g., S114, S126), particularly phosphorylation by either ATM or DNAPK, is important for PNKP function in DNA damage repair. The authors found a new phosphorylation site, T118, of PNKP which might be modified by CDK1 or 2 during S phase. This modification of phosphorylation is involved in maintenance and stability of the lagging strand, particularly Okazaki fragments. Loss of this phosphorylation could result in increased single strand gaps, accelerated speed of fork progression, and eventually genomic instability. And for this process, PNKP enzyme activity is not that important. And the authors concluded that PNKP T118 phosphorylation is important for lagging strand stability and DNA damage repair.

      Major comments

      In general, enzymes have protein interactions with their substrates. If PNKP is phosphorylated by either or both of CDK1/2, a protein interaction between them would be expected. However, the authors did not provide any evidence of protein interactions between PNKP and CDKs.

      Thank you for your suggestion. We will perform GFP-pulldown assays using cell extracts from HEK293 cells expressing GFP-WT-PNKP or GFP-T118A-PNKP, and then blot with CDK1 and CDK2 antibodies to confirm the interaction of PNKP with CDK1/2.

      It is not clear how T118 phosphorylation is involved in DNA damage repair itself as the authors suggested. The data presenting the involvement of T118 phosphorylation in this mechanism are limited. This claim opens more questions than answers. CDK1/2 still phosphorylates T118 in this DNA damage repair process? What would happen to DNA damage repair in which PNKP involves outside of S phase in terms of T118 phosphorylation?

      Thank you for your comment. We have investigated the importance of T118 phosphorylation in DNA damage repair in several experiments. In figure S8, we tested the SSB and DSB repair abilities of PNKP KO cells expressing the PNKP T118A mutant and found that PNKP T118 phosphorylation has critical roles in both SSB and DSB repair pathways. Interestingly, the result of the SSB repair assay (figure S8A & B) may indirectly indicate that T118 phosphorylation is important for SSB repair throughout the cell cycle, as these SSBs are induced instantly by IR exposure and cells were allowed to recover for only 30 min, which is presumably not enough time for them to progress through the cell cycle. Along with the repair abilities, we also analyzed the recruitment kinetics of the PNKP T118A and T118D mutants to DNA damage using a laser micro-irradiation assay in figure S9. This result indicates that phosphorylation of PNKP at T118 controls its recruitment, at least to laser-induced DNA damage sites. Moreover, in figure 4H we used a biochemical assay with cell extracts from PNKP KO cells expressing the PNKP T118A and T118D mutants to analyze recruitment of PNKP to a single-strand DNA gap structure, which mimics intermediates of some DNA repair pathways and incomplete Okazaki fragment maturation. This assay is much cleaner and shows that loss of T118 phosphorylation impairs PNKP recruitment to the ssDNA gap structure. We believe that these data sufficiently support our model that phosphorylation of T118 on PNKP is involved in DNA repair in general. However, we agree that we have not yet directly tested the DNA repair ability of PNKP T118A outside of S phase. Therefore, in addition to these data, we will perform H2O2-induced SSB and IR-induced DSB repair assays with EdU (S phase) pulse labelling in PNKP KO cells expressing the PNKP T118A mutant, and we will measure the ADP-ribose intensity and pH2AX foci in EdU-negative cells (outside of S phase, as the reviewer suggested).

      Along the same lines as comments #1/2, the recruitment of PNKP to damage sites is XRCC1 dependent. It is not clear whether PNKP recruitment to gaps on the lagging strand is XRCC1 independent or dependent. It might be interesting to examine this (OPTIONAL).

      Thank you for an important suggestion. XRCC1 acts as a scaffold for PNKP and is required for recruitment of PNKP in canonical SSB repair, although we propose that PNKP is involved in two pathways in DNA replication: the PARP1-XRCC1-dependent ssDNA gap-filling pathway and the Okazaki fragment maturation pathway working with FEN1. It is still important to address whether XRCC1 is required for PNKP recruitment to single-strand gaps on nascent DNA. Therefore, we will perform iPOND analysis in XRCC1-knockdown HEK293 cells expressing GFP-WT-PNKP.

      Minor comments

      In results: 'Generation of PNKP knock out U2OS cell line' - In figure S2A; There are no data regarding diminishing the phosphorylation of γ-H2AX.

      Thank you for your suggestion. We will add pH2AX blot data in fig S2A (all reviewers requested).

      • By showing data in figure S2B/C/D/E, the authors describe 'PNKP KO cells impaired the SSBs repair activity'. However, as the authors mentioned in this manuscript, PNKP could bind to either XRCC1 or XRCC4. Also for this experiment, IR had been applied, which induces DNA double strand breaks. Therefore, it is not certain that the authors' description is fully supported by these data presented. Perhaps, SSB inducing reagents should be used instead of IR.

      In figure S2B/C/D/E, we used gamma rays as the IR source. Gamma rays are classified as low linear energy transfer (LET) radiation and act mainly through indirect effects on DNA. It is estimated that gamma rays induce DNA damage consisting of 60-80% SSBs and 20-40% DSBs. We therefore believe our results are reasonable. In addition to these results, we will perform a poly-ADP-ribose assay with H2O2 treatment to assess SSB repair activity more specifically.

      • Is there any FACS analysis data to support the description of the last sentence 'especially the phosphorylation of PNKP T118, is required for S phase progression and proper cell proliferation'?

      Thank you for your suggestion. We will add FACS analysis data of cell cycle profiles in PNKP KO cells expressing GFP, GFP-PNKP WT, or T118A.

      In results: 'CDKs phosphorylate T118 of PNKP ~~~ replication forks'

      • In figure 3A, Is there any change in total PNKP (both GFP-tagged & endogenous) level?

      Thank you for your suggestion. We agree with your comment. We will add an analysis of PNKP expression in different cell cycle populations of asynchronous and synchronized cells (G1, S, G2/M samples).

      In results: 'Phosphorylation of PNKP at T118 ~~~ between Okazaki fragments'

      • In figure 4D, What happens in the ADP-ribose level, when T118D PNKP is expressed?

      Thank you for your suggestion. This is an interesting question. We will perform the ADP-ribosylation assay in PNKP KO cells and in PNKP KO cells expressing PNKP WT or T118D, and add data on ADP-ribose levels in those cells.

      In results: 'PNKP is involved in postreplicative single-strand DNA gap-filling pathway'

      • The description regarding data presented in figure 6 is not clear enough. These data might suggest that wildtype U2OS does not have SSB which is a substrate for S1 nuclease (except under FEN1i and PARPi treatment), whereas PNKP KO has SSB during both IdU and CldU incorporation, so that S1 nuclease treatment dramatically reduces the speed of fork formation in PNKP KO cells. Also, in figure 6B/C/D, adding an experimental group of PNKP KO with S1 nuclease + PARPi might help to understand the role of PNKP during replication better. These additional data could also support the description in the discussion 'Furthermore, PNKP is required for the PARP1-dependent single-strand gap-filling pathway ~~~ DNA gap structure'.

      We agree with the reviewer's comment and suggestion. Since this point was also raised by reviewer 3, we will add the rationale for the experiment and a more detailed description of the results, which should substantially improve the manuscript. We will also revise the text in line with the comment. In addition to revising the text, we will add experimental groups of PNKP KO with S1 nuclease with and without PARPi, as the reviewer suggested.

      In results: 'Phosphorylation of PNKP at T118 is essential for genome stability'

      • In figure S8C, did you measure γ-H2AX foci disappearance at a later time point, such as 24 hrs after DNA damage? It is not clear whether non-phosphorylated PNKP at T118 inhibits DNA damage repair or makes it slower. How does T114A-PNKP behave in this experimental condition? T114 is a well-known target of ATM/DNAPK for DDR & DSB repair.

      Thank you for your suggestion. We agree with your point. It is very important to analyze whether the T118A mutant shows delayed DSB repair or a total loss of DSB repair ability. We will add measurements of pH2AX foci at 24 hrs after IR in PNKP KO cells expressing GFP, WT-PNKP, or T118A-PNKP. Although the analysis of pS114 PNKP has been reported previously (Segal-Raz et al., EMBO reports, 2011 and Zolner et al., Nucleic Acids Research, 2011), we will also perform the pH2AX assay in PNKP KO cells expressing S114A-PNKP as a control.

      The result shown in figure S9 should be described in the result section, not in the discussion section.

      Thank you for your suggestion. This is a point also raised by Reviewer 3. Since we are going to re-consider the layout of the manuscript upon the planned revision (as reviewer 3 suggested), we will move these points to the appropriate result section from the discussion.

      **Referees cross-commenting**

      I could see a similarly positive attitude toward the manuscript among the reviewers. I agree with the comments and the suggestions for additional experiments made by reviewers 2 and 3. Those suggestions will improve the impact of the manuscript in the DNA damage repair field.

      Reviewer #1 (Significance (Required)):

      Significance

      The authors discovered a new phosphorylation site (T118) of PNKP, which is an important DNA repair protein. This modification seems to play a role in maintaining lagging strand stability in S phase. This discovery is a positive contribution to the DNA repair field, expanding the canonical and non-canonical functions of DNA repair factors.

      The data presented to support PNKP functions and T118 phosphorylation in S phase seem solid in general, yet it is unclear how critical PNKP is in the Okazaki fragment maturation process, in which several end-processing enzymes (such as FEN1, EXO1, and DNA2, which leave clean DNA ends) are known to be involved.

      These findings might attract attention from researchers broadly interested in cell cycle, DNA damage repair, replication, and possibly new tumor treatment.

      My field and research interest: DNA damage response (including cell cycle arrest and programmed cell death), DNA damage repair (including BER, SSBR, DSBR)

      Thank you very much for your positive comment. As you mentioned, several other end-processing enzymes seem to be involved in Okazaki fragment maturation; however, none of those enzymes has been reported to be involved in the gap-filling pathway as well. The role(s) of PNKP in DNA replication therefore stand out, as PNKP could be involved in two separate pathways: Okazaki fragment maturation and a back-up gap-filling repair process. As you suggested, we will add several experiments, such as iPOND experiments using XRCC1-depleted cells, analysis of the DNA repair ability of the PNKP T118A mutant throughout the cell cycle, and S1 nuclease DNA fiber assays in PNKP KO cells with and without PARP inhibitor treatment, to reveal how critical PNKP is in Okazaki fragment maturation. We believe that performing these experiments will make the conclusions of this manuscript more solid and convincing.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Polynucleotide kinase phosphatase (PNKP) participates in multiple DNA repair processes, where it acts on DNA breaks to generate 5'-phosphate and 3'-OH ends, facilitating the downstream activities of DNA ligases or polymerases.

      This manuscript identifies a CDK-dependent phosphorylation site on threonine 118 in PNKP's linker region. The authors provide some convincing evidence that this modification is important to direct the activity of PNKP towards ssDNA gaps between Okazaki fragments during DNA replication. The authors monitored protein expression levels, enzymatic activity, the growth rate and replication fork speed, as well as the presence of ssDNA damage, to give a comprehensive overview of the features of PNKP necessary for its function.

      Overall, the conclusions are sufficiently supported by the results and this manuscript is relevant and of general interest to the DNA repair and genome stability fields. Some level of revision to the experimental data and text would help strengthen its message and conclusions.

      Major points:

      In an iPOND experiment the authors detect the wt PNKP and the T118 phosphorylated form at the forks and conclude that this phosphorylation promotes interaction with nascent DNA (Figure 3E). An informative sample to include here would have been the T118A mutant. Based on the model proposed, the prediction would be that it would not be associated with the forks, or at least, associated at reduced levels compared to the wt.

      Thank you for your suggestion. We agree with your comment. We will add iPOND analysis in PNKP KO cells expressing the T118A mutant to confirm that pT118 is important for recruitment of PNKP to nascent DNA.

      The quality of the gels showing the phosphatase and kinase assays in Figure 5 could be improved to facilitate quantification of the results. The gel showing the phosphatase activity has a deformed band corresponding to K378A mutant. The gel showing the kinase activity seems to be hitting the detection limits, and the overall high background might influence the quantification of D171A mutant in the area of interest. The authors should provide a better quality of these gels, focusing on better separation (running them longer, eventually with a slightly increased electric current) and higher signal of the analyzed bands (longer incubation phosphatase/kinase prior to quenching or loading higher amount of DNA).

      We agree with your suggestion. The phosphatase and kinase assays could be improved. We will perform these assays again following the reviewer's suggestions.

      The authors sometimes make statements like: "a slight increase, slightly increased, relatively high" without an evaluation of the statistical significance for the presented data. An example of such a statement is: "T118A mutant-expressing cells exhibited a marked delay in cell growth, which was not observed for S114A, although T122A, S126A, and S143A were slightly delayed," based on the figure 2E. A similar comment applies also to figures 4A, 5A, 5E. Whenever possible, the authors should include also an evaluation of the statistical significance in the statement.

      Thank you for your suggestion. We will check the manuscript and revise the wording as the reviewer suggests.

      Minor revisions:

      I could not find a gH2AX blot for figure S2A.

      Thank you for your suggestion. We will add pH2AX blot data in fig S2A.

      The authors established two PNKP-/- clones and supported it with sequencing and several functional observations. However, the C-terminal antibody appears to detect lower-intensity bands (Figure 1A). Can the authors comment on those bands?

      Thank you for your comment. One possibility is that these bands are recognized artifactually by the antibody. To address this, we will run the electrophoresis for a longer time to better separate these bands.

      Why do the S1 nuclease data on DNA fibers not show the same level of epistasis with the FEN1i as those on ADP-ribosylation?

      Because FEN1-dependent Okazaki fragment maturation and the PARP1-XRCC1-dependent gap-filling pathway are distinct pathways, FEN1i and PARPi treatment resulted in an additive effect in the S1 nuclease data in PNKP WT cells. To facilitate better understanding, we will add a graphical scheme to figure 6 (a similar problem was raised by Reviewer 3 below) and revise the description of the result.

      **Referees cross-commenting**

      I agree with all the comments from reviewers 1 and 3.

      Reviewer #2 (Significance (Required)):

      Significance:

      The manuscript identifies a CDK phosphorylation site in a relevant DNA repair protein. The experiments on this part are elegant and convincing. It seems that this phosphorylation is important during DNA replication and there is some supporting evidence in this point, although not as robust, meaning that it is not clear whether this phosphorylation is controlling specifically the recruitment to Okazaki fragments, or a general role in DNA repair. Maybe if they see a reduced recruitment of the T118A mutant to the forks (iPOND experiment) this would further increase the impact.

      This work will be relevant to the basic research, especially in the fields of DNA repair and DNA replication.

      My expertise: DNA replication, genome stability, telomere biology.

      Thank you very much for your positive comment. As you suggested, we will perform an iPOND assay using the PNKP T118A mutant. In addition to the T118A iPOND assay, we will also analyze the DNA repair function of the PNKP T118A mutant throughout the cell cycle, as reviewer 1 suggested. We believe that the results of these experiments will pin down whether the phosphorylation of PNKP on T118 controls its recruitment to Okazaki fragments specifically or to single-strand DNA gaps in general, and will solidify the conclusion of the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Tsukada and colleagues studied the role of PNKP phosphorylation in processing single-strand DNA gaps and its link to fork progression and processing of Okazaki fragments.

      They generated two PNKP KO human clonal cell lines and described defects in cell growth, accumulation in S-phase, and faster fork progression. With some elegant experiments, they complement the KO cell lines with deletion and point mutants for PNKP, identifying a critical phosphorylation site (T118) in the linker regions, which is important for cell growth and DNA replication.

      They show that phosphorylation of PNKP peaks in the mid-S phase. CDK1 and CDK2, each with Cyclin A2, are the two main CDK complexes responsible for this modification. With the iPOND experiment, the authors show that PNKP is recruited to nascent DNA during replication.

      They described increased PARylation activity in PNKP KO cells, and by using HU and emetine, they concluded that this increased activity depends on replication and synthesis of Okazaki fragments.

      Interfering with Okazaki fragment maturation by FEN1 inhibition is epistatic with PNKP KO (and T118A) in influencing PARylation activity in the S phase and fork progression. The authors try to understand by mutant complementation which of the two functions (Phosphatase vs Kinase) is important in processing OF, and they propose a primary role for the phosphatase activity of PNKP. They also show that T118 is important in controlling genome stability following different genotoxic stress. Finally, by coupling the measurement of fork progression with PARP/FEN1 inhibitors and S1 treatment, they propose a role of PNKP in the post-replicative repair of single-strand gaps due to unligated OF.

      Here are my major points:

      The authors use a poly ADP ribose deposition measurement to estimate SSB nick/gap formation. Even if PARP activity is strictly linked to SSB repair, ADP ribosylation does not directly estimate SSB/nick gap formation. In addition, in Figs S2A, B, and C, the authors use IR and PARG inhibition to measure poly-ADP ribosylation in WT and PNKP KO cells. IR produces both SSB and DSB. A better and cleaner experiment would be to directly measure SSB formation (with alkaline comet assay, for example) in combination with treatments that are known to mainly cause SSB (H2O2, or low doses of bleomycin).

      Thank you for your suggestion. The main purpose of this manuscript is to clarify the potential role of PNKP in DNA replication. We therefore generated PNKP KO human cells, and figure S2 confirms the established role of PNKP in SSB and DSB repair. In addition, in our previous report published in EMBO Journal (Shimada et al., 2015), we showed SSB and DSB repair defects in PNKP KO MEFs using comet assays (both alkaline and neutral) after IR and H2O2 treatment. In addition to those observations, we will also perform a BrdU incorporation assay in PNKP WT and KO cells treated with H2O2. BrdU staining under non-denaturing conditions is now commonly used and is a more direct method to detect ssDNA nick/gap formation. We believe that the importance of PNKP in SSB repair is sufficiently supported by all of these data: the previous comet assays in PNKP KO MEF cells and two SSB repair assays in human cells using ADP-ribose staining or BrdU incorporation, which will be provided in the revised manuscript.

      The manuscript would benefit from substantially restructuring the figures' order and panels. Before starting the T118 part, the authors could create several figures to explain the main consequences of the loss of PNKP. A figure could be focused on DSB-driven genome instability (fig1 + fig S8 and S9). Then, a figure for the single-strand break and link to the S-phase. For example, by using data from Figure 6 and showing only WT vs PNKP KO +- Nuclease S1 (without FEN1 or PARP inhibitors), the authors could easily convince the readers that loss of PNKP leads to the accumulation of single-strand gaps. Only in the second part of the manuscript could they introduce all the T118 parts.

      Thank you for your suggestion. We understand that the layout of this manuscript is confusing for reviewers. After performing all planned experiments, we will carefully reconsider the overall layout of the revised manuscript.

      I understand the use of a FEN1 inhibitor to link the PNKP KO phenotype to OF processing, but this drug neither rescues nor exacerbates any of the phenotypes described by the authors. It seems to have just an epistatic effect everywhere. So, what other conclusion can we have if not that PNKP has a similar effect to FEN1? I think that the presence of this inhibitor in many plots complicates the digestion of several figures a little bit. Maybe clustering the data in a different way (DMSO on one side, FEN1i on the other) would help.

      Thank you for your suggestion. We agree that this data set is complicated. To facilitate better understanding, we will reorganize the data according to your suggestion and add a graphical scheme to figure 6.

      As for what other conclusion can be drawn from those experiments, it is that PNKP might play two important roles in DNA replication: Okazaki fragment maturation, in which it appears epistatic with FEN1, and the PARP1-XRCC1-dependent single-strand gap-filling pathway, which is required for repairing single-strand gaps between Okazaki fragments when the Okazaki fragment maturation pathway does not work properly (e.g., upon loss of FEN1 or PNKP). In figure 6D, we show that double treatment with FEN1i and PARPi in PNKP WT cells, combined with S1 nuclease treatment, yields an extensive amount of digested DNA fibers, whereas single treatment with either FEN1i or PARPi in PNKP WT cells with S1 nuclease treatment leads to only a limited amount of digested DNA fibers. This indicates that the two pathways regulated by FEN1 and PARP are coordinately required to prevent the emergence of ssDNA gaps during DNA replication. On the other hand, PNKP KO cells with S1 nuclease treatment show an extensive amount of digested DNA fibers even without FEN1i and PARPi treatment, and this is not further increased by FEN1i and PARPi treatment. These results indicate that PNKP itself is involved in both pathways mentioned above. Therefore, loss of PNKP has a phenotype similar to loss of FEN1 in terms of Okazaki fragment maturation, but it also has an additional effect on the repair of ssDNA nicks/gaps, which are created upon FEN1 loss but do not appear to be handled by FEN1.

      Fig S9 should be removed from the discussion. Additionally, the authors should consider whether they want to keep that piece of data in a manuscript that is already pretty dense. Why should we focus on additional linker residues and microirradiation data at the end of this manuscript?

      Thank you for your suggestion. This point was also raised by Reviewer 1. Since we are going to reconsider the layout of the manuscript in the planned revision, we will move these points from the discussion to the appropriate results section.

      I suggest using a free AI writing assistant. I think this manuscript would substantially benefit from one. As a non-native English speaker, I personally use one of them and find it extremely useful.

      Thank you for your suggestion. Our manuscript was revised by a native speaker from an English correction company. However, for the revised manuscript, we will consult native speakers and also use a free AI writing assistant to improve the quality of the manuscript.

      Minor points:

      In Figure S1A, the authors refer to P-H2AX, but I do not see this marker in the western blot.

      Thank you for your suggestion. We will add pH2AX blot data in fig S2A.

      **Referees cross-commenting**

      I agree with all comments from reviewers 1 and 2.

      Reviewer #3 (Significance (Required)):

      This is an interesting paper with generally solid data and proper statistical analysis. The figures are pretty straightforward. Unfortunately, the manuscript is dry, and the reader needs help to follow the logical order and the rationale of the experiments proposed. This is also complicated by the enormous amount of data the authors have generated. The authors should improve their narrative, explaining better why they are performing the experiment and not simply referring to a previous citation. Reordering panels and figures would help in this regard. Overall, with some new experiments, tone-downs over strong claims and a better explanation of the rationale behind experiments the authors could create a fascinating paper.

      Thank you very much for your positive comment about the data/analysis and the logic behind the experiments provided in the manuscript. We agree that the style and structure of the manuscript could be improved by reordering figures, cutting some redundant experiments, better explaining the rationale behind the experiments, and toning down some claims. By rewriting the manuscript as stated above and performing several additional experiments suggested by the reviewers, we believe that the revised manuscript will be more convincing and fascinating.

      1. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Minor comments

      • Is there any difference (except for PARGi exposure time?!) between figure S2B/C and S2D/E? Both data show increased ADP ribose after IR. This seems redundant. Also, it is hard to imagine that there is absolutely no sign of ADP ribose after IR w/o PARGi treatment (figure S2D).

      Figure S2B/C shows spontaneous single-strand DNA breaks (SSBs) in PNKP KO cells, whereas figure S2D/E shows ectopic SSBs induced by IR exposure in PNKP KO cells. We believe these data help readers understand the effects of endogenous and exogenous damage in PNKP KO cells. Poly-ADP-ribosylation is removed from SSB sites immediately after repair, as demonstrated previously (Tsukada, et al., PLoS One 2019, Kalasova et al., Nucleic Acids Research, 2020); although the level is not zero, it is very difficult to detect without PARGi treatment.

      Legend for figure S3 - typo!

      Thank you for pointing out the typo. The legend for figure S3 has been corrected to "Protein expression of PNKP mutants in U2OS cells".

      • In figure S3A/B, it is quite interesting that the PNKP antibody used for this analysis can detect all truncated and alanine-substituted PNKP proteins. It might be helpful for other researchers to indicate which antibody was used (Novus; epitope - 57aa to 189 aa or Abcam; epitope not revealed).

      In S3A/B, Novus PNKP antibody was used for all blots. We indicated this in the figure legend as "PNKP antibody (Novus: NBP1-87257) was used for comparing expression levels of endogenous and exogenous PNKP".

      In results: 'PNKP phosphorylation, especially of T118 ~~~ proliferation'

      • In the fork progression experiment (figure 2C), is there any statistical difference between D2 and D3/4 expressing cells?

      Thank you for your suggestion. We performed statistical analysis as the reviewer suggested. Statistical analysis shows that there are no significant differences between D2 and D3/D4. Meanwhile, there are significant differences between WT and D3 (P…

      • What is the basis of the description 'Since the linker region of PNKP is considered to be involved in fork progression'? Any reference?

      This sentence was based on the preceding sentence: "Furthermore, D2 mutant-expressing cells also showed an increased speed of the replication fork compared to WT and D1 mutant-expressing cells, although D3 and D4 showed mildly high-speed fork progression.". The D2 mutant lacks the whole linker region and shows increased DNA fiber speed in figure 2C. Therefore, we originally wrote the sentence above. We have revised the sentence to "Since these results may indicate the linker region of PNKP is involved in proper fork progression".

      • In figure 3B: pS114-PNKP (also pS15-p53) is DNA damage inducible. In this experiment, was DNA damage introduced? Roscovitine could hinder the DNA repair process, but does not induce DNA damage itself.

      Thank you for your suggestion. DNA damage induction was not applied in this experiment. We agree that this panel is confusing. We think that endogenous S114-PNKP (also S15-p53) might be phosphorylated slightly but not significantly, although this is outside the scope of this manuscript. The result showing that phosphorylated T118 is reduced by Roscovitine treatment may be redundant, as we also have the result of an in vitro phosphorylation assay using several combinations of CDKs and Cyclin proteins, which is a cleaner experiment to establish which CDK/Cyclin complex directly controls T118 phosphorylation. Since the manuscript already contains sufficient data to support the conclusion (as reviewer 3 also stated), we removed those blots from the panel to avoid complicating the conclusion.

      In results: 'Phosphatase activity of PNKP is ~~~ of Okazaki fragments'

      • In figure 5C, has any statistical analysis between WT-PNKP KO vs D171A-PNKP KO or K378A-PNKP KO been done?

      Thank you for your comment. Statistical analysis shows P…

      In discussion, 'In contrast, the T118A mutants showed the absence of both SSBs and DSBs repair (Fig. S7)': figure S7 does not indicate what the authors describe.

      Thank you for pointing this out. This should refer to figure S8 instead of figure S7. We have corrected this error.

      In addition, the same sentence in discussion: No evidence demonstrates 'the absence of both SSBs and DSBs repair', and the following sentence is not clear.

      This is the same point as above. We have corrected this mis-referencing and revised the sentence to "In contrast, the T118A mutants showed the impaired abilities of both SSBs and DSBs repair (Fig. S8).". We also revised the following sentence to "However, residual SSBs due to impaired SSB repair ability (e.g., in PARPi-treated cells and T118A cells) sometimes cause DNA replication-coupled DSBs formation in S phase, and the phenotype in DSB repair assay of the T118A mutant may be caused by an accumulated formation of DNA replication-coupled DSBs. Future works will be needed to distinguish whether the T118 phosphorylation directly regulate PNKP recruitment to DSBs as well as SSBs." for better explanation of the result.

      In discussion, 'Because both CDK1/cyclin A2 and CDK2/cyclin A2 are involved in PNKP phosphorylation, cyclin A2 is likely important for these activities': It is not clear what this description intends. In what sense is 'cyclin A2' important?

      This description comes from the observation in Fig. 3C. Since both CDK1 and CDK2 activities are cyclin A2 dependent, we speculated that cyclin A2 is important for CDK1/CDK2-dependent PNKP T118 phosphorylation. We revised the description to "Since both CDK1/Cyclin A2 and CDK2/Cyclin A2 phosphorylate T118 of PNKP, we speculated that PNKP T118 is phosphorylated in S phase to G2 phase in CDK1/Cyclin A2- and CDK2/Cyclin A2-dependent manner (Fig. 3B and C)".

      In discussion, 'This may be explained by the fact that mutations in the phosphorylated residue in the linker region are embryonic lethal': any reference to support this embryonic lethality?

      Thank you for your suggestion. We agree that this sentence is overstated. We have revised the sentence to "This observation may indicate that mutations in the phosphorylated residue (T118) in the linker region are potentially embryonic lethal due to the importance of T118 in DNA replication, which is revealed in the present study.".

      Reviewer #2:

      Minor comments

      Sometimes there are incorrect references to the figures in the discussion (e.g. FigS9A, B, and C, are called out instead of E, F and G), a similar issue is found 4 lines below in the same page.

      Thank you for pointing out these errors. We checked the references in the discussion and corrected them.

      Based on the data in Figure 3A the authors suggest that pT118-PNKP follows Cyclin A2 levels, but this does not appear very clearly in the gel, especially for the last point. Even though the results are convincing, the authors should rephrase the conclusions of Figure 3A to reflect better the results.

      Thank you for your suggestion. We agree that this phrase is overstated. We revised the conclusion to "pT118-PNKP was detected in asynchronized cells but increased particularly in the S phase, similar to Cyclin A2 expression levels, although the reduction of pT118, possibly dephosphorylation of T118, seems not as robust as the reduction of the Cyclin A2 expression level at the 12 hours time point. However, this effect was very weak during mitosis, suggesting that T118 phosphorylation plays a specific role in the S phase.".

      I did not find a reference to what seems to be a relevant work in this topic: PMID: 22171004

      Thank you for your suggestion. We have added the reference (Coquelle et al., PNAS, 2011) to the Introduction section.


      Reviewer #3:

      Major comments

      The authors should consider and discuss the potential role of PNKP KO outside of the S-phase. In Figure 4C, while it is clear that poly ADP ribosylation is higher in S-phase, the effects of PNKP KO and complementation by WT or T118A are equally present. This would be more immediate if comparison, fold change, and statistical significance calculation were done within the same cell cycle phase instead of between cell stages. This is also clear by IF in Figure 4B. How do the authors explain this?

      Thank you for your suggestion. We agree with the reviewer. We compared ADP-ribose intensities between cell lines within the same cell cycle phase, rather than between different cell cycle phases within the same cell line, and added the respective statistics to figure 4C. We also agree that poly ADP-ribose intensity differs outside of S phase between PNKP KO cells expressing WT and T118A PNKP. As shown in figure S8, PNKP pT118 is also involved in DNA repair. These results might reflect PNKP function outside of S phase. We have added the sentence "Of note, PNKP-/- cells and PNKP T118A cells showed markedly higher ADP-ribose intensity in outside the S phase as well, which indicate that PNKP and T118 may have an endogenous role to prevent SSBs formation in outside the S phase. Since FEN1 has been reported to function in R-loop processing, PNKP could also be involved in this process. Future studies of a role of PNKP in different cell cycle will be able to address this question." to discuss the function of PNKP outside the S phase. We have also added the references (Cristini et al., Cell Reports, 2019, and Laverde et al., Genes, 2022).

      In connection with the previous point, can the author provide the same quantification in Figure 4E also for G2/M and not only the S phase? This should give an estimate of the activity of FEN1 outside the S-phase. This is important because FEN1 has other functions apart from OF maturation, such as R loop processing (Cristini 2019; Laverde 2023).

      Thank you for your suggestion. Attached are the data on ADP-ribose intensity in cells outside the S phase, as you suggested. FEN1i treatment still increases ADP-ribose intensity outside the S phase, although the difference with and without FEN1i treatment is much smaller than in S phase, indicating that FEN1 has other functions outside the S phase. This finding is very interesting. However, the function of FEN1 outside the S phase is beyond the scope of this manuscript. Therefore, we would prefer not to include these data in the manuscript, to avoid complicating the conclusion (as reviewer 3 also suggested).

      Why does FEN1 inhibition induce a faster fork progression in Fig4 but not in Fig5 and Fig6?

      It does in both figure 4 and figure 5. In PNKP WT cells, FEN1i-treated fibers (CldU) show an increased fork speed compared to non-treated fibers (IdU). However, loss of PNKP or of T118 phosphorylation itself causes faster fork progression even without FEN1i treatment; therefore, the difference in fork speed before and after FEN1i treatment disappears in PNKP KO and T118A cells, as both fibers grow faster than intact fibers in normal cells. Regarding figure 6, as you mentioned in a later comment, the vertical axis title of the graph showing CldU length should not be replication fork speed, as those DNA fibers are potentially digested by S1 nuclease; this has been modified in the revised manuscript. Even so, DNA fibers from FEN1i-treated cells (CldU) with S1 nuclease show a similar length to fibers from untreated cells with S1 nuclease, whereas FEN1 inhibitor treatment generally accelerates fork speed (figure 4 and figure 5, assays without S1 nuclease), indicating that FEN1i treatment leaves some ssDNA nicks/gaps that are substrates of S1 nuclease.

      How do the authors explain the impaired DNA gap binding activity of the phospho-mimetic T118D?

      Thank you for your suggestion. We think that the appropriate timing of PNKP T118 phosphorylation is important, whereas the phospho-mimetic mutant T118D mimics a constitutively phosphorylated state, which may result in incomplete complementation of PNKP function.

      I would like to see a representative fiber image from Fig 6. Additionally, in Figure 6, the author should not label the y-axis as CldU-fork speed. Nuclease S1 treatment destroys single-strand gaps (in vitro) and does not affect the fork speed (in vivo).

      Thank you for your suggestion. We have added a representative fiber image. We also agree that CldU fork speed is not the right label for the y-axis, as CldU fibers are potentially digested by S1 nuclease. We changed the y-axis label to "CldU tract length [kb/min]" in figure 6.

      Figure 5E: both mutants (kinase vs phosphatase) increase polyADP ribose intensity, while the title of this figure only emphasizes the phosphatase activity.

      We agree with your comment. We have changed this subtitle to "Enzymatic activities of PNKP is important for the end-processing of Okazaki fragments".

      Minor comments

      The authors refer to Hoch Nature 2017 when referring to polyADP ribose IF + PARG inhibition. Should they not refer to Hanzlikova Mol Cell 2018?

      Thank you for your suggestion. We have added the reference (Hanzlikova et al., Mol Cell 2018).

      Statistical analysis should be performed on the cell cycle profile in Figure 1B.

      We performed statistical analysis to check whether there are significant differences in the S phase population between WT and PNKP KO cells. There were significant differences between WT vs PNKP KO C1 (P…

      The authors should not refer to fork degradation or protection as a given fact without assessing it in these conditions.

      Thank you for your suggestion. We assume that this comment refers to the result sections of figure 1 and figure 4. We have added the sentence "although future studies will be needed to investigate whether PNKP-/- cells has the fork protection phenotype" to the result section of figure 1. We have changed the wording in the result section of figure 4 according to the reviewer's suggestion.

    1. Reviewer #1 (Public Review):

      Summary:

      The authors want to explore how much two known minibinder protein domains against the Spike protein of SARS-CoV-2 can function as a binding domain of 2 sets of synthetic receptors (SNIPR and CAR); the authors also want to know how some modifications of the linkers of these new receptors affect their activation profile.

      Major strengths and weaknesses of the methods and results:

      - Strengths include: analysis of synthetic receptor function for 2 classes of synthetic receptors, with robust and appropriate assays for both kinds of receptors. The modifications of the linkers are also interesting and the types of modifications that are often used in the field.

      - Weaknesses include: none of the data analysis provides statistical interpretation of the results (that I could find). One dataset is confusing: Figures 5A and C are said to be the same assay with the same constructs, but the results are 30% in A and 70% in C.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      Given the open-ended nature of the goal (implicit in it being an exploration), it is hard to say if the authors have reached their aims; they have done an exploration for sure; is it big enough an exploration? This reviewer is not sure.

      The results are extremely clearly presented, both in the figures and in the text, both for the methods and the results. The claims put forward (with limited exceptions; see below) are very solidly supported by the presented data.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      The work may stimulate others to consider minibinders as potential binding domains for synthetic receptors. The modifications that are presented, although not novel, do provide a starting point for larger-scale analysis.

      It is not clear how much this is generalizable to other binders (the authors don't make such claims though). The claims are very focused on the tested modifications, and the 2 receptors and minibinder used, a scope that I would define as narrow; the take-home message if one wants to try it with other minibinders or other receptors seems to be: test a few things, and your results may surprise you.

      Any additional context you think would help readers interpret or understand the significance of the work:

      We are at the infancy stage of synthetic receptor optimization and next-generation derivation; there is a dearth of systematic studies, as most focus is on developing a few that work. This work is an interesting attempt to catalyze more research with these new minibinders. Will it be picked up based on this? Not sure.

    1. Social Media platforms use the data they collect on users and infer about users to increase their power and increase their profits.

      I agree with this statement and feel deeply concerned. When we register an account on most social media platforms, they usually ask us to select our content preferences so that they can push content that users are more interested in. This is actually a way to obtain information and find ways to attract the user's attention. Moreover, as we use the software, the platforms also use different algorithms to infer how our interests have changed. At the same time, they may also cooperate with shopping software to directly push the items we just mentioned in a video or forum on social media, so that we can purchase them. I think it’s a bit scary how much big data knows about us.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Huang and colleagues explored the role of iron in bacterial therapy for cancer. Using proteomics, they revealed the upregulation of bacterial genes that take up iron, and reasoned that such regulation is an adaptation to the iron-deficient tumor microenvironment. Logically, they engineered E. coli strains with enhanced iron-uptake efficiency, and showed that these strains, together with iron scavengers, suppress tumor growth in a mouse model. Lastly, they reported that the tumor suppression by IroA-E. coli provides immunological memory via CD8+ T cells. In general, I find the findings in the manuscript novel and the evidence convincing.

      (1) Although the genetic and proteomic data are convincing, would it be possible to directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment? This will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the reviewer’s comment regarding the precise quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing a Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. To circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the IroA-E. coli’s iron-scavenging capability against nutritional immunity.

      (2) Related to 1, the experiment to study the synergistic effect of CDG and VLX600 (lines 139-175) is very nice and promising, but one flaw here is a lack of the measurement of iron concentration. Therefore, a possible explanation could be that CDG acts in another manner, unrelated to iron uptake, that synergizes with VLX600's function to deplete iron from cancer cells. Here, a direct measurement of iron concentration will show the effect of CDG on iron uptake, thus complementing the missing link.

      We appreciate the reviewer’s comment and would like to point the reviewer to our results in Figure S3, which shows that the expression of CDG enhances bacterial survival in the presence of LCN2 proteins, reflecting the competitive relationship between CDG and enterobactin for LCN2 proteins as previously shown by Li et al. [Nat Commun 6:8330, 2015]. We regret to inform the reviewer that direct measurement of iron concentration was attempted to no avail due to the limited sensitivity of iron-detecting assays. We do acknowledge that CDG may exert different effects in addition to enhancing iron uptake, particularly the potentiation of the STING pathway. We pointed out such an effect in Fig 2c, which shows enhanced macrophage stimulation by the CDG-expressing bacteria. We would like to accentuate, however, that a primary objective of the experiment is to show that the manipulation of nutritional immunity for promoting anticancer bacterial therapy can be achieved by combining bacteria with the iron chelator VLX600. The multifaceted effects of CDG prompted us to focus on IroA-E. coli in subsequent experiments to examine the role of nutritional immunity in bacterial therapy. We have updated the associated text to better convey our experimental design principle.

      Lines 250-268: Although statistically significant, I would recommend the authors characterize the CD8+ T cells a little more, as the mechanism now seems quite elusive. What signals or memories do CD8+ T cells acquire after IroA-E. coli treatment to confer their long-term immunogenicity?

      We apologize for the overinterpretation of the immune memory response in our previous manuscript and appreciate the reviewer’s recommendation to further characterize CD8+ T cells post-IroA-E. coli treatment. Our findings, which show robust tumor inhibition in rechallenge studies, indicate establishment of anticancer adaptive immune responses. As the scope of the present work is aimed at demonstrating the value of engineered bacteria for overcoming nutritional immunity, expounding on the memory phenotypes of the resulting cellular immunity is beyond the scope of the study. We do acknowledge that our initial writing overextended our claims and have revised the manuscript accordingly. The revised manuscript highlights induction of anticancer adaptive immunity, attributable to CD8+ T cells, following the bacterial therapy.

      (3) Perhaps this goes beyond the scope of the current manuscript, but how broadly applicable is the observed iron-transport phenomenon in other tumor models? I would recommend the authors to either experimentally test it in another model or at least discuss this question.

      We highly appreciate the reviewer’s suggestion regarding the generalizability of the iron-transport phenomenon in diverse tumor models. To address this, we extended our investigations beyond the initial model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate the superiority of IroA-E. coli over WT bacteria in tumor inhibition. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide strong evidence that bacteria, such as E. coli, compete with tumor cells for iron resources and consequently reduce tumor growth. When sequestration between LCN2 and bacterobactin is blocked by upregulating CDG (DGC-E. coli) or salmochelin (IroA-E. coli), E. coli increase iron uptake from the tumor microenvironment (TME) and restrict iron availability for tumor cells. Long-term remission in IroA-E. coli treated mice is associated with enhanced CD8+ T cell activity. Additionally, systemic delivery of IroA-E. coli shows a synergistic effect with chemotherapy reagent oxaliplatin to reduce tumor growth.

      Strengths:

      It is important to identify the iron-related crosstalk between E. coli and TME. Blocking lcn2-bacterobactin sequestration by different strategies consistently reduces tumor growth.

      Weaknesses:

      As engineered E. coli upregulate their function to take up iron, they may increase the likelihood of escaping from nutritional immunity (LCN2 becomes unable to sequester iron from the bacteria). Would this raise the chance of developing sepsis? Do the authors think that it is safe to administer these engineered bacteria in mice or humans?

      We appreciate the reviewer’s comment on the safety evaluation of the iron-scavenging bacteria. To address the concern, we assessed the potential risk of sepsis development by measuring the bacterial burden and performing whole blood cell analyses following intravenous injection of the engineered bacteria. As illustrated in Figures 3k and 3l, our findings indicate that the administration of these engineered bacteria does not elevate the risk of sepsis. The blood cell analysis suggests that mice treated with the bacteria eventually return to baseline levels comparable to untreated mice, supporting the safety of this approach in our experimental models.

      Reviewer #3 (Public Review):

      Summary:

      Based on their observation that tumor has an iron-deficient microenvironment, and the assumption that nutritional immunity is important in bacteria-mediated tumor modulation, the authors postulate that manipulation of iron homeostasis can affect tumor growth. They show that iron chelation and engineered DGC-E. coli have synergistic effects on tumor growth suppression. Using engineered IroA-E. coli that presumably have more resistance to LCN2, they show improved tumor suppression and survival rate. They also conclude that the IroA-E. coli treated mice develop immunological memory, as they are resistant to repeat tumor injections, and these effects are mediated by CD8+ T cells. Finally, they show synergistic effects of IroA-E. coli and oxaliplatin in tumor suppression, which may have important clinical implications.

      Strengths:

      This paper uses straightforward in vitro and in vivo techniques to examine a specific and important question of nutritional immunity in bacteria-mediated tumor therapy. They are successful in showing that manipulation of iron regulation during nutritional immunity does affect the virulence of the bacteria, and in turn the tumor. These findings open future avenues of investigation, including the use of different bacteria, different delivery systems for therapeutics, and different tumor types.

      Weaknesses:

      • There is no discussion of the cancer type and why this cancer type was chosen. Colon cancer is not one of the more prominently studied cancer types for LCN2 activity. While this is a proof-of-concept paper, there should be some recognition of the potential different effects on different tumor types. For example, this model is dependent on significant LCN production, and different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type? For example, breast cancer aggressiveness has been shown to be influenced by FPN levels and labile iron pools.

      We highly appreciate the reviewer’s insightful comment on the varying LCN2 activities across different tumor types. In light of the reviewer’s suggestion, we extended our investigations beyond the initial colon cancer model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate that IroA-E. coli consistently outperforms WT bacteria in tumor inhibition. We acknowledge the reviewer’s comment regarding LCN2 being more prominently examined in breast cancer and have highlighted this aspect in the revised manuscript. For colon and melanoma cancers, several reports have pointed out the correlation of LCN2 expression and the aggressiveness of these cancers [Int J Cancer. 2021 Oct 1;149(7):1495-1511][Nat Cancer. 2023 Mar;4(3):401-418], albeit to a lesser extent. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments. The manuscript has been revised to reflect the reviewer’s insightful comment.

      • Are the effects on tumor suppression assumed to be from E. coli virulence, i.e. Does the higher number of bacteria result in increased immune-mediated tumor suppression? Or are the effects partially from iron status in the tumor cells and the TME?

      We appreciate the reviewer’s question regarding the therapeutic mechanism of IroA-E. coli. Bacterial therapy exerts its anticancer action through several different mechanisms, including bacterial virulence, nutrient and ecological competition, and immune stimulation. Decoupling one mechanism from another would be technically challenging and beyond the scope of the present work. With the objective of demonstrating that iron-scavenging bacteria can elevate anticancer activity by circumventing nutritional immunity, we highlight our data in Fig. S6, which shows that IroA-E. coli administration resulted in higher bacterial colonization within solid tumors compared to WT-E. coli on Day 15. This increased bacterial presence supports our iron-scavenging bacteria design, and we highlight a few anticancer mechanisms mediated by the engineered bacteria. Firstly, as shown in Fig. 4d, IroA-E. coli induces an elevated iron stress response in tumor cells, as the treated tumor cells show increased expression of transferrin receptors. Secondly, our experiments involving CD8+ T cell depletion indicate that IroA-E. coli establishes a more robust anticancer CD8+ T cell response than WT bacteria. Both immune-mediated responses and alterations in iron status within the tumor microenvironment are demonstrated to contribute to the enhanced anticancer activity of IroA-E. coli in the present study.

      • If the effects are iron-related, could the authors provide some quantification of iron status in tumor cells and/or the TME? Could the proteomic data be queried for this data?

      We appreciate the reviewer’s query regarding the quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing an Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. Consequently, to circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the iron-scavenging capability of IroA-E. coli against nutritional immunity.

      Reviewing Editor:

      The authors provide compelling technically sound evidence that bacteria, such as E. coli, can be engineered to sequester iron to potentially compete with tumor cells for iron resources and consequently reduce tumor growth. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity and a synergistic effect with chemotherapy reagent oxaliplatin is observed to reduce tumor growth. The following additional assessments are needed to fully evaluate the current work for completeness; please see individual reviews for further details.

      We appreciate the editor’s positive comment.

      (1) The premise is one of translation, yet the authors have not demonstrated that manipulating bacteria to sequester iron does not create a potential for sepsis, nor provided other evidence that this does not increase the competitiveness of bacteria relative to the host. Only tumor volume was provided rather than animal survival and cause of death, but bacterial virulence is enhanced, including the possibility of septic demise. Alternatively, the authors postulate that tumor volume is decreased due to iron sequestration, but they do not directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) the tumor microenvironment. These important endpoints would show the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the editor’s comment and have added substantial data to support the translational potential of the iron-scavenging bacteria. In particular, we added evidence that the iron-scavenging bacteria do not increase the risk of sepsis (Fig. 3k, l), evidence of increased bacterial competitiveness and survival in tumors (Fig. S6), and evidence of the iron-scavenging bacteria’s superior anticancer ability and survival benefit across 3 different tumor models (Fig. 3e-j; Fig. S5). While direct measurement of iron concentration in the tumor environment is technically difficult due to the challenge of differentiating Fe2+ and Fe3+ with available techniques, we added a colorimetric CAS assay to demonstrate that the iron-scavenging bacteria can utilize Fe more effectively than WT bacteria in the presence of LCN2 (Fig. 3b). These results substantiate the translational relevance of the engineered bacteria.

      (2) There is no discussion of the cancer type and why this cancer type was chosen. If the current tumor modulation system is dependent on LCN2 activity, there would need to be some recognition that different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type?

      We appreciate the comment and added relevant text and citations describing clinical relevance of LCN2 expression associated with the tumor types used in the study (breast cancer, melanoma, and colon cancer). Elevated LCN2 has been associated with higher aggressiveness for all three cancer types.

      (3) To demonstrate that long-term anti-cancer memory was established through enhancement of CD8+ T cell activity (Fig 5c), the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IroA mice, since CD8+ T cells may play a role in tumor suppression regardless of whether or not iron regulation is being manipulated. It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We acknowledge that our prior writing may have overstated our claim on immunological memory. Our intention is to show that upon treatment and tumor eradication by iron-scavenging bacteria, adaptive immunity mediated by CD8 T cells can be elicited. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression. We have modified our text to reflect our intended message.

      Reviewer #1 (Recommendations For The Authors):

      All the figures seem to be in low resolution and pixelated. Please upload high-resolution ones.

      We have updated figures to high-resolution ones.

      Reviewer #2 (Recommendations For The Authors):

      Some specific comments towards experiments:

      (1) For Fig 2 f/ Fig 3f/ Fig 5d/Fig6c, the survival rate is based on the tumor volume (the mouse was considered dead when the tumor volume exceeded 1,500 mm3). Did the mice die from the experiment (how many from each group)? If it only reflects the tumor size, do these figures deliver the same information as the tumor growth figure?

      We appreciate the reviewer’s comment. The survival rate is indeed based on tumor volume, and we used a cutoff of 1500 mm3. No death event was observed prior to the tumors reaching 1500 mm3. Although the survival figures overlap with some of the information conveyed by the tumor volume tracking, they offer additional temporal resolution of tumor progression. Presenting both tumor volume and survival tracking is commonly adopted to depict tumor progression. We have added the protocol regarding survival monitoring to the materials and methods section.

      (2) Fig 3a, not sure if entE is a good negative control for this experiment. Neg. Ctrl should maintain its CFU/ml at a certain level regardless of Lcn2 conc. However, entE conc. is at 100 CFU/ml throughout the experiment, suggesting either that there is no entE in the media or that it is so sensitive to Lcn2 that the bacteria die at the dose of 0.1 nM?

      We appreciate the reviewer’s comment. The △entE-E. coli was indeed observed to be highly sensitive to LCN2. We included the control to highlight the competitive relationship between entE and LCN2 for iron chelation, which is previously reported in literature [Biometals 32, 453–467 (2019)].

      (3) Fig 4, the authors harvested bacteria from the tumor by centrifuging homogenized samples at different speeds. Internal controls confirming sample purity (positive for bacteria and negative for cells for panels a,b,c; or vice versa for panel d) may be necessary. This comment may also apply to samples from Fig 1.

      We acknowledge the reviewer’s concern and would like to point out that the proteomic analysis was performed using a highly cited protocol that provides reference and normalization standards for E. coli proteins [Mol Cell Proteomics. 2014 Sep; 13(9): 2513–2526]. The reference is cited in the Materials and Method section associated with the proteomic analysis.

      (4) To demonstrate that long-term anti-cancer memory was established through enhancement of CD8+ T cell activity, the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IroA mice.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We apologize for overstating our claim in the previous manuscript draft.

      Minor suggestions:

      (1) Please include the tumor re-challenge experiment in the method section.

      The re-challenge experiment has been added to the method section as instructed.

      (2) Please cite others' and your previous work. E.g. line 281, 282, line 306-307.

      We have added the citations as instructed.

      (3) Line 448, BL21 is bacteria, not cells.

      We have made the correction accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • The authors postulate that IroA-E. coli is more potent than DGC-E. coli in resisting LCN2 activity, and that this potency is the cause of the increased tumor suppression of this engineered strain. If so, Fig 3a should include DGC-E. coli for direct comparison.

      We thank the reviewer for the comment and would like to clarify that we intended to construct IroA-E. coli as a more specific iron-scavenging strategy, which can aid the discussion of nutritional immunity and minimize confounding factors from the immune-stimulatory effect of CDG. We have modified our text to clarify our stance.

      • The data refers to the effects of WT bacteria-mediated tumor suppression, e.g. Figure 3e shows that even WT bacteria have a significant suppressive effect on tumor growth. Could the authors provide background on what is known about the mechanism of this tumor suppression, outside of tumor targeting and engineerability? They only reference "immune system stimulation."

      We appreciate the reviewer’s comment and would like to refer the reviewer to our recently published article [Lim et al., EMBO Molecular Medicine 2024; DOI: 10.1038/s44321-023-00022-w], which shows that in addition to immune system stimulation, WT bacteria can also be perceived as an invading species in the tumor that can exert differential selective pressure against cancer cells. Competition for nutrients is highlighted as a major contributor to containing tumor growth. In fact, the nutrient competition that we observed in the prior article inspired the design of the iron-scavenging bacteria towards overcoming nutritional immunity. We have cited this recently published article in the revised manuscript to enrich the background.

      • The authors claim that there is immunologic memory because of tumor resistance in re-challenged mice after IroA-E. coli treatment (Fig 5c). It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to highlight that the adaptive immunity stemmed from IroA-E. coli only, and we intend to build upon current literature that has reported CD8+ T cell elicitation by bacterial therapy. The IroA-E.coli is shown to enhance adaptive immunity. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression.

      • The authors claim that CD8+ T cells are mechanistically important in the effects of iron status manipulation in E. coli-mediated tumor suppression (Fig 5). In order to show this, it seems that Fig 5c should include WT-E. coli and WT-E. coli+CD8 ab groups, as it may be that CD8+ T cells play a role in tumor suppression regardless of whether or not iron regulation is being manipulated.

      We apologize for the confusion from our prior writing. We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to convey that CD8+ T cells are mechanistically important in the effects of iron status manipulation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.
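      As a hedged illustration of the "unique effects" referred to above, the standard commonality-analysis decomposition can be written in terms of the variance explained, R², by nested regressions on fluid cognition. The symbols A (chronological age), B (a Brain Age index) and C (Brain Cognition) are shorthand introduced here for illustration and are not the manuscript's notation.

```latex
% R^2_{S}: variance in fluid cognition explained by a regression on predictor set S.
% Unique effect of Brain Cognition over and above age and Brain Age:
\[
  U_{C} \;=\; R^{2}_{\{A,B,C\}} \;-\; R^{2}_{\{A,B\}}
\]
% Unique effect of Brain Age over and above age and Brain Cognition:
\[
  U_{B} \;=\; R^{2}_{\{A,B,C\}} \;-\; R^{2}_{\{A,C\}}
\]
% Two-predictor case: the common (shared) effect of A and B
\[
  C_{AB} \;=\; R^{2}_{\{A\}} + R^{2}_{\{B\}} - R^{2}_{\{A,B\}}
\]
```

      Under this reading, U_C is the share of the brain MRI and fluid cognition co-variation that neither chronological age nor Brain Age accounts for.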

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study into just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
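      The three steps in the Methods excerpt above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses scikit-learn, synthetic data, three feature sets instead of 18, a single stacked model instead of eight, and a small hypothetical hyperparameter grid. The names `tune_elastic_net`, `nested_cv_stacking` and `r2_sum_of_squares` are invented for this sketch. The point it shows is that the outer-fold test set is touched only in step 3, after both the base models and the stacked model have been tuned and fitted on the outer-fold training set.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical grid; the actual grid used in the study is not given in this excerpt.
GRID = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}


def r2_sum_of_squares(y_true, y_pred):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def tune_elastic_net(X, y, seed):
    """Five-fold inner CV grid search; the best model is refit on all of X, y."""
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10_000), GRID, cv=inner_cv, scoring="r2")
    return search.fit(X, y).best_estimator_


def nested_cv_stacking(feature_sets, y, n_outer=5, seed=0):
    """Return out-of-sample stacked predictions for every observation."""
    outer_cv = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    stacked_pred = np.empty(len(y), dtype=float)
    for train_idx, test_idx in outer_cv.split(feature_sets[0]):
        y_train = y[train_idx]
        # Step 1: tune one model per feature set on the outer-fold training set only.
        base_models = [tune_elastic_net(X[train_idx], y_train, seed) for X in feature_sets]
        # Step 2: predict the same training set with the tuned base models, then tune
        # the stacked model on those predictions using newly drawn inner folds.
        train_preds = np.column_stack(
            [m.predict(X[train_idx]) for m, X in zip(base_models, feature_sets)]
        )
        stacked_model = tune_elastic_net(train_preds, y_train, seed + 1)
        # Step 3: only now apply the already tuned models to the outer-fold test set.
        test_preds = np.column_stack(
            [m.predict(X[test_idx]) for m, X in zip(base_models, feature_sets)]
        )
        stacked_pred[test_idx] = stacked_model.predict(test_preds)
    return stacked_pred


# Synthetic example: three noisy "modalities" that share one latent signal.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
feature_sets = [signal[:, None] + rng.normal(scale=s, size=(n, 10)) for s in (0.5, 1.0, 2.0)]
y = 2.0 * signal + rng.normal(scale=0.5, size=n)

stacked_pred = nested_cv_stacking(feature_sets, y)
print("Pearson r:", np.corrcoef(y, stacked_pred)[0, 1])
print("R2 (sum of squares):", r2_sum_of_squares(y, stacked_pred))
print("MAE:", np.mean(np.abs(y - stacked_pred)))
```

      In this sketch the stacked model is trained on in-sample predictions of the base models for the outer-fold training set, mirroring the description above; the alternative of splitting the training set into two layers is a separate design choice.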

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by Reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.
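      The rank-stability check described above (five outer folds, hence 10 pairwise Spearman’s ρ values per prediction model) can be illustrated with a short sketch. It assumes SciPy and uses simulated coefficient vectors; `rank_stability` is a hypothetical helper name and the 19-feature example only echoes the subcortical-volume case mentioned in the text.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr


def rank_stability(coefs_per_fold):
    """Spearman's rho between coefficient vectors for every pair of outer folds."""
    rhos = []
    for a, b in combinations(range(len(coefs_per_fold)), 2):
        rho, _ = spearmanr(coefs_per_fold[a], coefs_per_fold[b])
        rhos.append(rho)
    return np.array(rhos)


# Simulated Elastic Net coefficients: one vector of 19 features per outer fold.
rng = np.random.default_rng(1)
true_coefs = rng.normal(size=19)
coefs_per_fold = [true_coefs + rng.normal(scale=0.3, size=19) for _ in range(5)]

rhos = rank_stability(coefs_per_fold)
print("number of pairs:", len(rhos))  # 5 choose 2 = 10
print("range:", rhos.min(), rhos.max(), "mean:", rhos.mean())
```

      With real Lasso-type fits many coefficients may be exactly zero; `spearmanr` assigns average ranks to ties, but a fold whose coefficients are all identical would yield an undefined correlation.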

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
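      As a rough check on the “one-third” figure, using only the two approximate values quoted in the passage above (unique effects of Brain Cognition of up to around 11%, against around 32% explained by Brain Age and chronological age together):

```latex
\frac{\Delta R^2_{\text{Brain Cognition}}}{R^2_{\text{chronological age, Brain Age index}}}
\approx \frac{0.11}{0.32} \approx 0.34 \approx \tfrac{1}{3}
```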

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain unique and common effects of the regressors, we need to look at all of the possible changes in R2 when all possible subsets of regressors were excluded or included; see equations 12 and 13 below.

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$\text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i, \tag{12}$$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$
      \begin{aligned}
      \text{Unique Effect}_{\text{chronological age}} &= \Delta R^2_{\text{chronological age}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Age index}} &= \Delta R^2_{\text{Brain Age index}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Cognition}} &= \Delta R^2_{\text{Brain Cognition}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index}} &= R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} + R^2_{\text{chronological age, Brain Age index, Brain Cognition}}
      \end{aligned} \tag{13}
      $$”
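      The three-regressor commonality decomposition can be sketched in code by fitting every subset of regressors and combining the resulting R2 values. This is a hedged illustration, not the authors' pipeline: it assumes in-sample ordinary least squares R2, simulated data, and hypothetical helper names (`r2_ols`, `commonality_three`). The three-way common effect is written with the full-model R2 added, which is the form under which the seven components sum to the full-model R2.

```python
import numpy as np


def r2_ols(X, y):
    """In-sample R2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def commonality_three(a, b, c, y):
    """Unique and common effects for three regressors a, b, c explaining y."""
    def r2(*cols):
        return r2_ols(np.column_stack(cols), y)

    A, B, C = r2(a), r2(b), r2(c)
    AB, AC, BC = r2(a, b), r2(a, c), r2(b, c)
    ABC = r2(a, b, c)
    return {
        "unique_a": ABC - BC,
        "unique_b": ABC - AC,
        "unique_c": ABC - AB,
        "common_ab": AC + BC - C - ABC,
        "common_ac": AB + BC - B - ABC,
        "common_bc": AB + AC - A - ABC,
        "common_abc": A + B + C - AB - AC - BC + ABC,
        "total": ABC,
    }


# Simulated stand-ins for chronological age, a Brain Age index and Brain Cognition.
rng = np.random.default_rng(2)
n = 500
age = rng.normal(size=n)
brain_age = age + rng.normal(scale=0.5, size=n)
brain_cog = -0.5 * age + rng.normal(size=n)
fluid = -0.6 * age + 0.4 * brain_cog + rng.normal(size=n)

effects = commonality_three(age, brain_age, brain_cog, fluid)
for name, value in effects.items():
    print(f"{name}: {value:.4f}")
```

      Common effects can be negative (for example under suppression), whereas unique effects from nested least squares fits cannot be.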

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we included statements in both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, making the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition, to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured mostly a similar variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you Reviewer 3 for making an argument against the use of Brain Age. We largely agree with you. However, our work only focuses on one phenotype, fluid cognition, and on the normative situation (i.e., not having a case vs control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases, not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We added the conclusion statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggest that they are biased toward aversive processing. The use of innovative microprism based two-photon calcium imaging to study single axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It's important to fully discuss the fact that the measurements were carried out only on superficial layers (30-100um), while major dopamine projections target deep layers of the mPFC as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in FigS1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine neurons in different layers. This potential across-layer heterogeneity could also be the cause of discrepancy among past recording studies with different measurement modalities. Also, mentioning technical limitations would be informative. For example: how deep the authors can perform 2p-imaging through the prism? was the "30-100um" maximum depth the authors could get?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al., 2009; Aransay et al., 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 µm, we were unable to visualize deeper axons in a live view mode. Our imaging system has already been optimized to detect weak signals (e.g., we have employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). It is possible that future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, sparse axons in the supragranular layers are advantageous in detecting weak signals; dense labeling of axons would increase the background fluorescence relative to signals. We now reference this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. Firstly, the latency of shock responses in the representative axons (right panels of G, H) is consistently very long - nearly 500ms. It raises a query whether this is a biological phenomenon or if it stems from a potential technical artifact, possibly arising from an issue in synchronization between the 2-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information concerning the synchronization of the experimental system in the method section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
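
      To illustrate why a k-means boundary need not coincide with the unity line, the following is a minimal sketch of two-cluster k-means on toy (reward response, aversive response) pairs. It is not the analysis code used in the study: the function name, the deterministic initialisation, and all data values are assumptions made for illustration.

```python
import math

def kmeans_2d(points, k=2, max_iter=100):
    """Plain Lloyd's k-means on 2-D points with a deterministic start."""
    # Assumed initialisation: seed centroids from evenly spaced entries
    # of the sorted point list, so the result is reproducible.
    ordered = sorted(points)
    step = (len(ordered) - 1) // max(k - 1, 1)
    centroids = [ordered[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

# Toy data: (reward response, aversive response) per axon, arbitrary units.
points = [(5, 40), (8, 50), (6, 45), (10, 55),   # large aversive responses
          (10, 4), (12, 6), (8, 3),              # small reward-dominated responses
          (9, 10)]                               # just above the unity line
labels, centroids = kmeans_2d(points)
above_unity = [aversive > reward for reward, aversive in points]
```

      In this toy example the last point lies above the unity line yet is grouped with the reward-dominated cluster, because cluster membership depends on distance to the centroids rather than on which side of the 45-degree line a point falls.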

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could potentially reveal a subset of axons that display responses to both aversive and appetitive stimuli, aligning more accurately with the true underlying dynamics. Moreover, the characterization of Figure 2J as a bimodal distribution while disregarding the presence of axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular logic. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we now refer to Azcorra et al., 2023, which was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?


      "Our results remained similar for different correlation threshold values (Line 556)" (data not shown) is obsolete.

      We have conducted additional analyses using correlation thresholds of 0.5 and 0.3, which resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to Fig. 2J, K.

      Author response image 1.
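
      As a rough sketch of how a correlation cut-off can group ROIs into putative axons, and why a lower cut-off yields fewer groups, the example below merges ROIs whose pairwise Pearson correlation exceeds a threshold. The transitive (single-linkage) merging, the function names, and the toy traces are assumptions for illustration; they are not taken from the manuscript's pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length traces (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def group_rois(traces, threshold):
    """Merge ROIs whose pairwise correlation exceeds `threshold`."""
    parent = list(range(len(traces)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy fluorescence traces for four ROIs (arbitrary units).
traces = [
    [0, 1, 0, 2, 0, 1],   # ROI 0
    [0, 2, 0, 4, 0, 2],   # ROI 1: scaled copy of ROI 0 (r = 1)
    [1, 1, 0, 2, 1, 0],   # ROI 2: moderately correlated with ROI 0 (r ~ 0.54)
    [2, 0, 2, 0, 2, 0],   # ROI 3: anti-correlated with ROI 0
]
axons_strict = group_rois(traces, 0.65)
axons_loose = group_rois(traces, 0.5)
```

      With these toy traces the 0.65 cut-off keeps ROI 2 separate, whereas the 0.5 cut-off merges it with ROIs 0 and 1, so the lower threshold produces fewer putative axons.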

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing differences in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors chose to present mice with both a reward and an aversive stimulus during different trials each day. The authors used high spatial resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the response of these axons to determine the selectivity of responses in unique axons. They also paired the reward (water) and an aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference to respond to aversive stimuli consistent with expectations from prior studies that used other recording methods. The authors find that both of their two auditory stimuli initially drive responses in axons, but that with training axons develop more selective responses for the shock-associated tone, indicating that associative learning led to changes in these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to determine stimulus discrimination and relate dopamine axon signals with this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in PFC respond to appetitive or aversive stimuli. They conclude that there is a strong bias to respond to the aversive tail shock in most axons and a weaker, sparser representation of water reward.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli versus the specific identity of the stimuli. Further, the reward presentations are more numerous than the aversive trials making it unclear how much novelty and habituation account for results. Moreover, the training seems somewhat limited by the low number of trials and did not result in strong associative conditioning. The lack of omission responses reported may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence for if and what might be the consequence of these axonal responses on PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioning association, behavioral results indicated that the learning plateaued – anticipatory behaviors did not increase during the last two phases when the conditioned span was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tailshock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how this evolves across learning of cue-shock and cue-water contingencies. Thus, these experiments are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence that is seen to shock is considerably higher than water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular prelimbic cortex, which is where the axon images are mostly taken) generally center around resolution of conflict in what to respond to, learn about, and attend to. That is, mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC. That is, dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded towards. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increases in responses during receipt of unexpected outcomes, but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, and not learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, some mPFC), which advocate for a role of this region in allocating attention and cognitive resources to the most relevant stimuli, and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.
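
      The size of the gap between the two tones can be checked with simple arithmetic. The sketch below assumes that the 7% figure refers to the frequency difference relative to the lower tone; how the cited study defines the separation is not restated here, so this definition and the function names are assumptions.

```python
import math

def relative_separation(f_low_hz, f_high_hz):
    """Frequency difference as a fraction of the lower tone."""
    return (f_high_hz - f_low_hz) / f_low_hz

def octave_separation(f_low_hz, f_high_hz):
    """Frequency difference expressed in octaves."""
    return math.log2(f_high_hz / f_low_hz)

# The two conditioned-stimulus tones used in the study.
sep = relative_separation(9_000, 12_000)    # one third, i.e. about 33%
octaves = octave_separation(9_000, 12_000)  # about 0.415 octave
exceeds_7_percent = sep > 0.07
```

      Under this definition the 9 kHz and 12 kHz tones differ by about 33%, several times larger than a 7% separation.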

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural brain research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual review of psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. Neuroimage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct to Addgene (https://www.addgene.org/216533/). We have also cited Janelia’s report on jGCaMP8 (Zhang et al., 2023).

      (2) The authors should elaborate on the approach taken for experimental synchronization. Specifically, how was the alignment achieved between 2-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the details of the software and hardware components employed for generating TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation of the timing control. We use a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from a stimulator, and sound presentation. We also continuously recorded, at 30 kHz via a separate National Instruments board (PCIe-6363), the frame timing of two-photon imaging, the frame timing of the behavior camera, copies of the command waves (sent to the syringe pump, the stimulator, and the speaker), and signals from the treadmill corresponding to running speed.
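
      A minimal sketch of how such a recording can be used for offline alignment is given below: rising edges are detected in the recorded frame clock and in a copy of a command wave, and the stimulus onset is then assigned to the imaging frame during which it occurred. Only the 30 kHz sampling rate comes from the description above; the 5 V pulse level, the 2.5 V threshold, the roughly 30 Hz frame clock, the synthetic traces, and all function names are assumptions for illustration, and this is Python rather than the MATLAB program used in the study.

```python
from bisect import bisect_right

SAMPLE_RATE_HZ = 30_000  # recording rate stated in the response

def rising_edges(trace, threshold=2.5):
    """Sample indices where the trace crosses `threshold` upward."""
    return [
        i for i in range(1, len(trace))
        if trace[i - 1] < threshold <= trace[i]
    ]

def frame_of_event(frame_onsets, event_sample):
    """Index of the frame that started at or before `event_sample`, else None."""
    idx = bisect_right(frame_onsets, event_sample) - 1
    return idx if idx >= 0 else None

def samples_to_ms(n_samples):
    """Convert a number of samples to milliseconds."""
    return 1000.0 * n_samples / SAMPLE_RATE_HZ

# Synthetic recordings (assumed): a frame pulse every 1000 samples
# starting at sample 500, and one stimulus command pulse at sample 2700.
n = 5000
frame_clock = [5.0 if i >= 500 and (i - 500) % 1000 < 100 else 0.0 for i in range(n)]
stim_copy = [5.0 if 2700 <= i < 2800 else 0.0 for i in range(n)]

frame_onsets = rising_edges(frame_clock)
stim_onset = rising_edges(stim_copy)[0]
frame_idx = frame_of_event(frame_onsets, stim_onset)
delay_ms = samples_to_ms(stim_onset - frame_onsets[frame_idx])
```

      At 30 kHz each sample spans about 0.033 ms, so event times recovered this way are resolved well below one millisecond.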

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.
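
      The sampling argument can be made concrete with a short sketch. It assumes that the pose-tracking output has already been reduced to one boolean per video frame indicating whether the tongue is visible; that reduction, the function names, and the toy sequence are hypothetical and do not describe the actual DeepLabCut post-processing.

```python
FRAME_RATE_HZ = 60.0  # camera frame rate stated above
LICK_RATE_HZ = 6.5    # approximate licking frequency cited above

def frames_per_lick_cycle(frame_rate_hz=FRAME_RATE_HZ, lick_rate_hz=LICK_RATE_HZ):
    """Average number of video frames spanning one lick cycle."""
    return frame_rate_hz / lick_rate_hz

def lick_onsets(tongue_visible):
    """Frame indices where the tongue goes from not visible to visible."""
    onsets = []
    previous = False
    for i, visible in enumerate(tongue_visible):
        if visible and not previous:
            onsets.append(i)
        previous = visible
    return onsets

# Toy sequence (assumed): three lick cycles of nine frames each,
# with the tongue visible for the first four frames of every cycle.
tongue_visible = ([True] * 4 + [False] * 5) * 3
onsets = lick_onsets(tongue_visible)
onset_times_s = [i / FRAME_RATE_HZ for i in onsets]
```

      At 60 Hz each lick cycle spans roughly nine frames, and the Nyquist limit of 30 Hz lies well above the licking frequency, so individual licks remain resolvable in the video.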

      Other Minor Points:

      (5) Ensure consistency in the citation format; both Vander Weele et al. 2018 and Weele et al. 2019, share the same first author.

      Thank you for pointing this out. EndNote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) Authors don't show which mouse each axon data comes from, making it hard to know if differences arise from inter-mouse differences vs differences in axons. The best way to address this point is to show similar plots as Figure 2J & K but broken down by mouse to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population level bias to aversive stimuli was shown previously using photometry so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of “for the first time” is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that there are 40% of dopamine axons that respond in this way. Then, a few paragraphs later they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support an idea that dopamine axons in mPFC preferentially encode aversive outcomes. If the authors wanted to examine a role for mPFC in preferential encoding of aversive stimuli, you would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, a prediction of a more general process that I detail above would predict that you could give mice two rewards that differ in magnitude (e.g., lots of food vs. small water) and you would see the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond to situations like this, I don't think any conclusion around mPFC in favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretive statements regarding what dopamine axons encode and emphasizing their functional diversity. In particular, we removed the word ‘encode’ when describing the response of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that would be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trial within a session to the shock and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it, along with improving the presentation of the figure. We have also prepared a new figure (Figure 3– figure supplement 4) to compare CSaversive and CSreward signals for the first and rest of the trials within daily sessions, revealing further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement with an additional analysis, as another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to maintain it in this way.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.

      (3) I realize the viral approach used in the current studies may not allow for an idea of where in VTA dopamine neurons are that project to mPFC- is there data in the literature that speak to this? Particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, in mouse anterograde tracing, it is not possible to spatially confine the injection of conventional viruses/tracers. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author response:

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we will shortly prepare a revised version of this paper. Intended changes to the revised manuscript are marked up in bold font in the detailed responses below, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”. We acknowledge that of course it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”. The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it will become computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we will amend our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve and thus it may be the case that for now only a semi-quantitative conclusion is all that can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”. We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such we will add a cautionary note to our paper. We will also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we will promote this validation which was in the supplementary figures into the main text in the revised version). We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”. We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD and metadynamics for path generation, and find this improvement again for PepT2 in this study. We will address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”. In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We will make our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.

      As for the additional suggested changes in presentation, we will provide the requested details on the CpHMD analysis. Furthermore, we will use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we will opt not to do so in figures S10 and S15, which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures.) We are also changing the colour schemes of these plots in our revision to improve accessibility.

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as a higher-dimensional free energy landscape obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regards to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”

      Given that even sampling the proton transfer step itself in an essentially static protein conformation would be pushing the boundaries of what has been achieved in the field, we believe that, considering the current state of the art, a fully coupled investigation of large-scale conformational changes and proton-transfer reactions is not yet feasible in a realistic/practical time frame. We also note this limitation already when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we will expand on our discussion of the reasoning behind employing a non-reactive approach and the limitations that imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we will make this clear in the appropriate figure captions in our revision.
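      To make the arbitrariness of this superposition concrete, the following is a minimal sketch of aligning 1D PMF profiles on a common reference basin. The function name, the basin window and all numerical values are hypothetical and are not taken from the manuscript; the shift only sets a plotting offset and carries no information about relative free energies between protonation states.

```python
def align_pmf(cv_values, free_energies, basin_lo, basin_hi):
    """Shift a 1D PMF so that its minimum inside [basin_lo, basin_hi] is zero.

    The offset is a plotting convention only: it does not estimate the
    relative free energy between profiles from different conditions.
    """
    in_basin = [g for x, g in zip(cv_values, free_energies)
                if basin_lo <= x <= basin_hi]
    if not in_basin:
        raise ValueError("no PMF points fall inside the reference basin")
    offset = min(in_basin)
    return [g - offset for g in free_energies]


# Hypothetical profiles (kJ/mol) on a shared gate-opening CV (nm).
cv = [0.8, 1.0, 1.2, 1.4, 1.6]
pmf_a = [12.0, 20.0, 15.0, 6.0, 5.0]
pmf_b = [3.0, 9.0, 14.0, 11.0, 10.0]

# Both profiles are zeroed in the same (assumed) end-state basin, 1.4-1.6 nm.
aligned_a = align_pmf(cv, pmf_a, 1.4, 1.6)
aligned_b = align_pmf(cv, pmf_b, 1.4, 1.6)
```

      After alignment the two curves coincide at the chosen basin by construction, so only differences in shape (barrier heights and slopes relative to that basin) can be read from the plot.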

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the current version indicate explicitly that this may involve the substrate. We will make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We will make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”

      We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is additionally “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We will discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way:

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This is currently figure S20, though in the revised version we will move this piece of validation into a new panel of figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
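      The link between a difference in binding free energy and a pKa shift follows from a standard thermodynamic cycle and can be sketched as below. This is an illustration only: the function name, the units (kcal/mol), the temperature and the example free energies are assumptions, not values from the manuscript.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # kcal/(mol*K)


def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=300.0):
    """pKa(holo) - pKa(apo) of a residue from two binding free energies.

    Closing the cycle (apo/holo x deprotonated/protonated) gives
        delta_pKa = (dG_bind_deprot - dG_bind_prot) / (R * T * ln 10),
    so tighter binding to the protonated protein raises the pKa.
    """
    rt_ln10 = GAS_CONSTANT_KCAL * temperature * math.log(10.0)
    return (dg_bind_deprot - dg_bind_prot) / rt_ln10


# Hypothetical binding free energies in kcal/mol (not from the paper).
shift = pka_shift_from_binding(dg_bind_deprot=-5.0, dg_bind_prot=-7.0)
```

      At 300 K, one pKa unit corresponds to roughly 1.37 kcal/mol, so the hypothetical 2 kcal/mol difference above would correspond to a shift of about 1.5 pKa units.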

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in protonation state–local conformation coupled minima (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state, and Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which, as noted in our response to the comment above, we will acknowledge explicitly in revision. This potential limitation for the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protein protonatable residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes 10s to 100s of ns in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We will discuss such considerations in the revision.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths:

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses:

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this and denote it with question marks in the mechanistic overview we give in Figure 8, and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and will add details to the latter sentence in revision to help clarify better the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct as lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state on both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 of the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we will add more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.
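      The two analyses described above can be illustrated with a minimal sketch: fitting a pKa from deprotonated fractions across pH windows, and testing whether two sets of replicate estimates differ by more than their combined error bars. The linearised Hill fit, the function names and all numbers are assumptions for illustration; the manuscript's actual fitting procedure may differ.

```python
import math
import statistics


def fit_pka(ph_values, deprot_fractions):
    """Fit (pKa, Hill coefficient) from a linearised Hill plot.

    Uses log10(f / (1 - f)) = n * (pH - pKa), where f is the deprotonated
    fraction in each pH window. Windows with f = 0 or f = 1 are skipped
    because the logarithm is undefined there.
    """
    points = [(ph, math.log10(f / (1.0 - f)))
              for ph, f in zip(ph_values, deprot_fractions) if 0.0 < f < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two pH windows with 0 < f < 1")
    mean_x = statistics.fmean([x for x, _ in points])
    mean_y = statistics.fmean([y for _, y in points])
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    hill = sxy / sxx                 # least-squares slope
    pka = mean_x - mean_y / hill     # pH at which the fitted line crosses zero
    return pka, hill


def beyond_combined_error(values_a, values_b):
    """True if the means differ by more than the sum of the standard deviations."""
    gap = abs(statistics.fmean(values_a) - statistics.fmean(values_b))
    return gap > statistics.stdev(values_a) + statistics.stdev(values_b)


# Synthetic titration generated with pKa = 6.5 and n = 1 (hypothetical).
ph_windows = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
fractions = [1.0 / (1.0 + 10.0 ** (6.5 - ph)) for ph in ph_windows]
pka_fit, hill_fit = fit_pka(ph_windows, fractions)

# Hypothetical replicate pKa estimates for apo and holo states.
distinct = beyond_combined_error([6.4, 6.5, 6.6], [7.9, 8.0, 8.1])
overlapping = beyond_combined_error([6.0, 6.5, 7.0], [6.6, 7.0, 7.4])
```

      Applying the same fit to each short chunk of sampling and pooling the resulting pKa values would give the kind of histogram described for Figure S19, under the assumptions stated above.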

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.

      Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3 as they stand do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, does indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we will include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We will also remake the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
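      The distinction between pooled occupancy and a per-replicate time series can be illustrated with a minimal sketch. The traces below are synthetic and the 0.4 nm cutoff is an assumed, illustrative value; neither is taken from the simulations discussed here. Two traces with identical 70% occupancy differ sharply once occupancy is computed in consecutive time blocks.

```python
# Hypothetical illustration: distances are synthetic, not data from the study.
CUTOFF_NM = 0.4  # assumed salt-bridge distance cutoff (illustrative only)


def occupancy(distances, cutoff=CUTOFF_NM):
    """Fraction of frames in which the distance is below the cutoff."""
    formed = [d < cutoff for d in distances]
    return sum(formed) / len(formed)


def block_occupancy(distances, n_blocks, cutoff=CUTOFF_NM):
    """Occupancy computed separately in consecutive, equally sized time blocks."""
    size = len(distances) // n_blocks
    return [
        occupancy(distances[i * size:(i + 1) * size], cutoff)
        for i in range(n_blocks)
    ]


# Two synthetic 100-frame traces, each with the bridge formed in 70 frames.
late_forming = [0.8] * 30 + [0.3] * 70      # broken first, then stable
flickering = ([0.3] * 7 + [0.8] * 3) * 10   # forms and breaks repeatedly

for name, trace in [("late_forming", late_forming), ("flickering", flickering)]:
    print(name, occupancy(trace), block_occupancy(trace, n_blocks=5))
```

      A pooled histogram or violin plot would show the same overall occupancy for both traces, whereas the block-wise values separate a late-forming but stable interaction from a transient one.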

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We will revise the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we will also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. 
      In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.
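      The point that 0 of 3 versus 1 of 3 spontaneous openings cannot support a statistical claim can be made concrete with a Fisher exact test. The following is a minimal sketch using only the replicate counts quoted above; the function name is hypothetical and the test is offered as an illustration, not as an analysis performed in the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x events in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        table_prob(x) for x in range(lo, hi + 1)
        if table_prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: condition; columns: (gate opened, gate stayed closed) over 3 replicates.
# "WT H87-prot": 0 of 3 opened; "D342A H87-prot": 1 of 3 opened.
p_value = fisher_exact_two_sided(0, 3, 1, 2)
print(p_value)
```

      With these counts the two possible tables are equally likely, so the two-sided p-value is 1, consistent with treating the unbiased simulations as hypothesis-generating only.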

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D-umbrella sampling (Figure 3). This, we think, is where the reviewer’s point applies. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regards to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we will replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

    2. Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      (1) Given the low trial numbers, and the point of sequential vs clustered reactivation mentioned in the public review, it would be reassuring to see an additional sanity check demonstrating that future items that are currently not on-screen can be decoded with confidence, and if so, when in time the peak reactivation occurs. For example, the authors could show separately the decoding accuracy for near and far items in Fig. 5A, instead of plotting only the difference between them.

      We have now added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have also chosen to replace Figure 5B with the new figure, as we think it provides more information than the previous Figure 5B, which we have moved to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      (2) The non-sequential reactivation analyses often use a time window of peak decodability, and it was not entirely clear to me what data this time window is determined on, e.g., was it determined based on all future reactivations irrespective of graph distance? This should be clarified in the methods.

      Thank you for raising this. We now clarify this in the relevant section to read: “First, we calculated a time point of interest by computing the peak probability estimate of decoders across all trials, i.e., the average probability for each timepoint of all trials (except previous onscreen items) of all distances, which is equivalent to the peak of the differential reactivation analysis”
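      The time-point selection described above (average the decoded probability over trials at each time point, then take the peak) can be sketched as follows. This is a simplified illustration on synthetic numbers; the function name and the input layout (one probability time course per trial) are assumptions rather than the authors' implementation.

```python
def peak_timepoint(trials):
    """Index of the time point with the highest trial-averaged probability.

    `trials` is a list of equal-length probability time courses, one per trial,
    each assumed to be already averaged over the off-screen items of interest.
    """
    n_time = len(trials[0])
    mean_course = [
        sum(trial[t] for trial in trials) / len(trials) for t in range(n_time)
    ]
    peak_idx = max(range(n_time), key=mean_course.__getitem__)
    return peak_idx, mean_course


# Synthetic example: three trials, five time points.
example_trials = [
    [0.10, 0.12, 0.20, 0.15, 0.10],
    [0.10, 0.11, 0.18, 0.19, 0.10],
    [0.09, 0.10, 0.22, 0.14, 0.11],
]
peak_idx, mean_course = peak_timepoint(example_trials)
print(peak_idx)
```

      Because the peak is chosen from all trials irrespective of graph distance, the selected window does not by construction favour near over distant items.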

      (3) Fig 4 shows evidence for forward and backward sequential reactivation, suggesting that both forward and backward replay peak at a lag of 40-50msec. It would be helpful if this counterintuitive finding could be picked up in the discussion, explaining how plausible it is, physiologically, to find forward and backward replay at the same lag, and whether this could be an artifact of the TDLM method.

      This is an important point and we agree that it appears counterintuitive. However, we would highlight that this exact time range has been reported in previous studies, though never for both forward and backward replay. We now include a discussion of this finding. The section now reads:

      “[…] Even though we primarily focused on the mean sequenceness scores across time lags, there appears to be a (non-significant) peak at 40-60 milliseconds. While simultaneous forward and backward replay is theoretically possible, we acknowledge that it is somewhat surprising and, given our paradigm, could relate to other factors such as autocorrelations (Liu, Dolan, et al., 2021).”

      (4) It is reported that participants with below 30% decoding accuracy are excluded from the main analyses. It would be helpful if the manuscript included very specific information about this exclusion, e.g., was the criterion established based on the localizer cross-validated data, the temporal generalisation to the cued item (Fig. 2), or only based on peak decodability of the future sequence items? If the latter, is it applied based on near or far reactivations, or both?

      We now clarify this point to include more specific information, which reads:

      “[…] Therefore, we decided a priori that participants with a peak decoding accuracy of below 30% would be excluded from the analysis (nine participants in all) as obtained from the cross-validation of localizer trials”

      (5) Regarding the low amount of data for the reactivation analysis, the manuscript should be explicit about the number of trials available for each participant. For example, Supplemental Fig. 1 could provide this information directly, rather than the proportion of excluded trials.

      We have adapted the plot in the supplement to show the absolute number of rejected epochs per participant, in addition to the ratio.

      (6) More generally, the supplements could include more detailed information in the legends.

      We agree and have added more extensive explanation of the plots in the supplement legends.

      (7) The choice of comparing the 2 nearest with all other future items in the clustered reactivation analysis should be better motivated, e.g., was this based on the Wimmer et al. (2020) study?

      We have added our motivation for taking the two nearest items and contrasting them with the items further away. The paragraph reads:

      “[…] We chose to combine the following two items for two reasons: First, this doubled the number of included trials; secondly, using this approach the number of trials for each category (“near” and “distant”) was more balanced. […]”

      Reviewer 2

      (1) Focus exclusively on retrieval data (and here just on the current image trials).

      If I understand correctly, you focus all your analyses (behavioural as well as MEG analyses) on retrieval data only and here just on the current image trials. I am surprised by that since I see some shortcomings due to that. These shortcomings can likely be addressed by including the learning data (and predecessor image trials) in your analyses.

      a) Number of trials: During each block, you presented each of the twelve edges once. During retrieval, participants then did one "single testing session block". Does that mean that all your results are based on max. 12 trials? Given that participants remembered, on average, 80% this means even fewer trials, i.e., 9-10 trials?

      This is correct and a limitation of the paper. However, while we used only correct trials for the reactivation analysis, the sequential analysis was conducted using all trials, regardless of response behaviour. To retain comparability with previous studies we mainly focused on data from after a consolidation phase. Nevertheless, despite the trial limitation we consider the results robust and worth reporting. Additionally, based on the suggestion of the referee, we now include results from learning blocks (see below).

      b) Extend the behavioural and replay/reactivation analysis to predecessor images.

      Why do you restrict your analyses to the current image trials? Especially given that you have such a low trial number for your analyses, I was wondering why you did not include the predecessor trials (except the non-deterministic trials, like the zebra and the foot according to Figure 2B) as well.

      We agree it would be great to increase power by adding the predecessor images to the current image cue analysis, excluding the ambiguous trials. We did not do so as we considered that the underlying retrieval processes of these trial types are not the same, i.e., they cannot simply be combined. Nevertheless, we have performed the suggested analysis to check whether it increases our power. We found that the reactivation effect is robust and significant at the same time point of 220-230 ms. However, the effect size actually decreased: whereas peak differential reactivation was previously at 0.13, it is now at 0.07. This in fact makes conceptual sense. We suspect that the two processes that are elicited by showing a single cue and by showing a second, related, cue are distinct insofar as the predecessor image acts as a primer for the current image, potentially changing the time course/speed of retrieval. Given our concerns that the two processes are not actually the same we consider it important to avoid mixing these data.

      We have added a statement to the manuscript discussing this point. The section reads:

      “Note that we only included data from the current image cue, and not from the predecessor image cue, as we assume the retrieval processes differ and should not be concatenated.”

      c) Extend the behavioural and replay/reactivation analysis to learning trials.

      Similar to point 1b, why did you not include learning trials in your analyses?

      Including (correct and incorrect) learning trials has the advantage that you do not have to exclude 7 participants due to ceiling performance (100%).

      Further, you could actually test the hypothesis that you outline in your discussion: "This implies that there may be a switch from sequential replay to clustered reactivation corresponding to when learned material can be accessed simultaneously without interference." Accordingly, you would expect to see more replay (and less "clustered" reactivation) in the first learning blocks compared to retrieval (after the rest period).

      To track reactivation and replay over the course of learning is a great idea. We have given a lot of thought as to how to integrate these findings but have not found a satisfying solution, as analysis of the learning data turned out to be quite tricky: we decided that each participant should perform as many blocks as necessary to reach at least 80% (with a limit of six and lower bound of two, see Supplement figure 4). Indeed, some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). With the benefit of hindsight, we realise our design means that different blocks are not directly comparable between participants. In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay should emerge and when precisely a switch to clustered reactivation would happen. For this reason, we initially decided not to include the learning trials into the paper.

      Nevertheless, to provide some insight into the learning process, and to see how consolidation impacts differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track processes on a block basis, it does offer potential (albeit limited) insight into the hypothesis we outline in the discussion.

      For reactivation, we see emergence of a clear increase, further strengthening the outlined hypothesis. For replay, however, the evidence is less clear, as we do not know over how many learning blocks replay is expected.

      We calculated individual trajectories of how reactivation and replay change from learning to retrieval and related these to performance. Indeed, we see that an increase in reactivation is nominally associated with higher learning performance, while an increase in replay strength is associated with lower performance (both non-significant). However, for the above-mentioned reasons we think it would be premature to add this weak evidence to the paper.

      To mitigate problems of experiment design in relation to this question we are currently implementing a follow-up study, where we aim to normalize the learning process across participants and index how replay/reactivation changes over the course of learning and after consolidation.

      We have added plots showing clustered reactivation and sequential replay measures during learning (Figure 5D and Supplement 8).

      The added section(s) now read:

      “To provide greater detail on how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures across learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the same time point we found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D). […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), due to experimental design features our data do not enable us to test for an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      d) Introduction (last paragraph): "We examined the relationship of graph learning to reactivation and replay in a task where participants learned a ..." If all your behavioural analyses are based on retrieval performance, I think that you do not investigate graph learning (since you exclusively focus the analyses on retrieving the graph structure). However, relating the graph learning performance and replay/reactivation activity during learning trials (i.e., during graph learning) to retrieval trials might be interesting but beyond the scope of this paper.

      We agree. We have changed the wording to be more accurate. Indeed, we do not examine graph learning but instead examine retrieval from a graph, after graph learning. The mentioned sentence now reads:

      “[…] relationship of retrieval from a learned graph structure to reactivation [...]”

      e) It is sometimes difficult to follow what phase of the experiment you refer to since you use the terms retrieval and test synonymously. Not a huge problem at all but maybe you want to stick to one term throughout the whole paper.

      Thank you for pointing this out. We have now adapted the manuscript to exclusively refer to “retrieval” and not to “test”.

      (2) Is your reactivation clustered?

      In Figure 5A, you compare the reactivation strength of the two items following the cue image (i.e., current image trials) with items further away on the graph. I do not completely understand why your results are evidence for clustered reactivation in contrast to replay.

      First, it would be interesting to see the reactivation of near vs. distant items before taking the difference (time course of item probabilities).

      (copied answer from response to Reviewer 1, as the same remark was raised)

      We have added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have chosen to replace Figure 5B with the new figure, as we think that it offers more information than the previous Figure 5B, which we have moved to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      Second, could it still be that the first item is reactivated before the second item? By averaging across both items, it is not apparent what the temporal courses of probabilities of both items look like (and whether they follow a sequential pattern). Additionally, the Gaussian smoothing kernel across the time dimension might diminish sequential reactivation and favour clustered reactivation. (In the manuscript, what does a Gaussian smoothing kernel of σ = 1 refer to?). Could you please explain in more detail why you assume non-sequential clustered reactivation here and substantiate this with additional analyses?

      We apologise for the unclear description. Note that the Gaussian kernel is in fact only used for the reactivation analysis and not the replay analysis, so any small temporal successions would have been picked up by the sequential analysis. We now clarify this in the respective section of the sequential analysis and also explain the parameter of σ = 1 in the reactivation analysis section. The paragraph now reads:

      “[…] As input for the sequential analysis, we used the raw probabilities of the ten classifiers corresponding to the stimuli. [...]

      […] Therefore, to address this we applied a Gaussian smoothing kernel (using scipy.ndimage.gaussian_filter with the default parameter of σ=1 which corresponds approximately to taking the surrounding timesteps in both directions with the following weighting: current time step: 40%, ±1 step: 25%, ±2 steps: 5%, ±3 steps: 0.5%) [...]”
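
      The weighting quoted above can be checked with a short calculation. The following is a minimal sketch, not the analysis code used in the study: it assumes a one-dimensional kernel along the time axis and a cut-off at four standard deviations (SciPy's documented default for `truncate`). The function name `gaussian_kernel_weights` is hypothetical.

```python
import math

def gaussian_kernel_weights(sigma=1.0, truncate=4.0):
    """Normalised weights of a discrete 1-D Gaussian kernel.

    Assumption: the kernel is cut off at int(truncate * sigma + 0.5) steps
    on either side, mirroring SciPy's default of truncate=4.0.
    """
    radius = int(truncate * sigma + 0.5)
    offsets = list(range(-radius, radius + 1))
    # Unnormalised Gaussian evaluated at each integer time-step offset
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(raw)
    # Normalise so that the weights sum to one
    return {k: w / total for k, w in zip(offsets, raw)}

weights = gaussian_kernel_weights(sigma=1.0)
for step in range(0, 4):
    print(f"offset {step}: {100 * weights[step]:.1f}%")
```

      Under these assumptions the weights come out at roughly 40%, 24%, 5% and 0.4% for offsets of 0, ±1, ±2 and ±3 time steps, in line with the rounded values given in the manuscript text.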

      (3) Replay and/or clustered reactivation?

      The relationship between the sequential forward replay, differential reactivation, and graph reactivation analysis is not really apparent. Wimmer et al. demonstrated that high performers show clustered reactivation rather than sequential reactivation. However, you did not differentiate in your differential reactivation analysis between high vs. low performers. (You point out in the discussion that this is due to a low number of low performers.)

      We agree that a split into high vs low performers would have been preferable for our analysis. However, there is one major obstacle that made us opt for a correlational analysis instead: We employed criteria learning, rendering a categorical grouping conceptually biased. Even though not all participants reached the criterion of 80%, our sample did not naturally split between high and low performers but was biased towards higher performance, leaving the groups uneven. The median performance was 83% (mean ~81%), with six of our subjects (~1/4 of included participants) having this exact performance. This makes a median or mean split difficult, as either binning assignment choice would strongly affect the results. We have added a limitations section in which we extensively discuss this shortcoming and our reasoning for not performing a median split as in Wimmer et al. (2020). The section now reads:

      “There are some limitations to our study, most of which originate from a suboptimal study design. [...], as we performed criteria learning, a sub-group analysis as in Wimmer et al., (2020) was not feasible, as median performance in our sample would have been 83% (mean 81%), with six participants exactly at that threshold. [...]”

      It might be worth trying to bring the analysis together, for example by comparing sequential forward replay and differential reactivation at the beginning of graph learning (when performance is low) vs. retrieval (when performance is high).

      Thank you for the suggestion to include the learning segments, which we think improves the paper quite substantially. However, analysis of the learning data turned out to be quite tricky. We had decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with a limit of six and lower bound of two, see Supplement figure 4). Some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). In hindsight, this is an unfortunate design feature in relation to learning, as it means different blocks are not directly comparable between participants.

      In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation, as memory traces get consolidated/stronger. However, it is unclear when replay would emerge and when the switch to reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper at all.

      Nevertheless, to give some insight into the learning process and to see how consolidation affects differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track measures of interest on a block basis, it gives some (albeit limited) insight into the hypothesis outlined in our discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less obvious, potentially due to the fact that we do not know across how many learning blocks replay is to be expected.

      The added section(s) now read:

      “To examine how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures during learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the time point we found significant during the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D).

      […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), our data does not enable us to show a hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      Additionally, the main research question is not that clear to me. Based on the introduction, I thought the focus was on replay vs. clustered reactivation and high vs. low performance (which I think is really interesting). However, the title is more about reactivation strength and graph distance within cognitive maps. Are these two research questions related? And if so, how?

      We agree we need to be clearer on this point. We have added two sentences to the introduction, which should address this point. The section now reads:

      “[…] In particular, the question remains how the brain keeps track of graph distances for successful recall and whether the previously found difference between high and low performers also holds true within a more complex graph learning context.”

      (4) Learning the graph structure.

      I was wondering whether you have any behavioural measures to show that participants actually learn the graph structure (instead of just pairs or triplets of objects). For example, do you see that participants chose the distractor image that was closer to the target more frequently than the distractor image that was further away (close vs. distal target comparison)? It should be random at the beginning of learning but might become more biased towards the close target.

      Thanks, this is an excellent suggestion. Our analysis indeed shows that people take the near lure more often than the far lure in later blocks, while it is random in the first block.

      Nevertheless, we have decided to put these data into the supplement and reference it in the text. This is because analysis of the learning blocks is challenging and biased in general. Each participant had a different number of learning blocks based on their learning rate, and this makes it difficult to compare learning across participants. We have tried our best to accommodate and explain these difficulties in the figure legend. Nevertheless, we thank the referee for guidance here and this analysis indeed provides further evidence that participants learned the actual graph structure.

      The added section reads:

      “Additionally, we have included an analysis showing how wrong answers participants provided were random in the first block and biased towards closer graph nodes in later blocks. This is consistent with participants actually learning the underlying graph structure as opposed to independent triplets (see figure and legend of Supplement 6 for details).”

      (5) Minor comments

      a) "Replay analysis relies on a successive detection of stimuli where the chance of detection exponentially decreases with each step (e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting the replay event). " Could you explain in more detail why 30% is a good threshold then?

      Thank you. We have further clarified the section. As we are working mainly with probabilities, it is useful to keep in mind that accuracy is a class metric that only provides a rough estimate of classifier ability. Alternatively, something like a Top-3-Accuracy would be preferable, but also slightly silly in the context of 10 classes.

      Nevertheless, subtle changes in probability estimates are present and can be picked up by the methods we employ. Therefore, the 30% is a rough lower bound that was chosen based on pilot data showing that clean MEG data from attentive participants can usually reach this threshold. The section now reads:

      “(e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting a replay event). However, one needs to bear in mind that accuracy is a “winner-takes-all” metric indicating whether the top choice also has the highest probability, disregarding subtle, relative changes in assigned probability. As the methods used in this analysis are performed on probability estimates and not class labels, one can expect that the 30% are a rough lower bound and that the actual sensitivity within the analysis will be higher. Additionally, based on pilot data, we found that attentive participants were able to reach 30% decodability, allowing us to use decodability as a data quality check.”
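
      The arithmetic behind the 9% figure can be written out as follows. This is a minimal sketch that assumes each step of a replay event is detected independently with the same probability; the function name `chance_of_detecting_sequence` is hypothetical and not part of the analysis code.

```python
def chance_of_detecting_sequence(p_single, n_steps):
    """Chance of detecting all n_steps successive stimuli of a replay event.

    Assumption: every stimulus is detected independently with probability
    p_single, so the joint chance shrinks exponentially with sequence length.
    """
    return p_single ** n_steps

for n_steps in (1, 2, 3):
    chance = chance_of_detecting_sequence(0.30, n_steps)
    print(f"{n_steps} successive stimuli: {100 * chance:.1f}%")
```

      With a per-stimulus chance of 30%, two successive stimuli give the 9% quoted in the text, and each additional step multiplies the chance by a further 0.3.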

      b) Could you make explicit how your decoders were designed? Especially given that you added null data, did you train individual decoders for one class vs. all other classes (n = 9 + null data) or one class vs. null data?

      We added detail to the decoder training. The section now reads:

      “Decoders were trained using a one-vs-all approach, which means that for each class, a separate classifier was trained using positive examples (target class) and negative examples (all other classes) plus null examples (data from before stimulus presentation, see below). In detail, null data was.”
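
      To illustrate the one-vs-all scheme described above, the sketch below builds the binary training targets for each of the ten classifiers. It is an illustration under stated assumptions rather than the code used in the study: the label coding (stimuli as 0 to 9, null examples as -1) and the names `NULL_LABEL` and `one_vs_all_targets` are hypothetical.

```python
NULL_LABEL = -1  # hypothetical code for pre-stimulus (null) examples

def one_vs_all_targets(labels, n_classes=10):
    """Build one binary target vector per class.

    For the classifier of class c, examples of class c are positives (1);
    examples of all other classes and all null examples are negatives (0).
    """
    targets = {}
    for cls in range(n_classes):
        targets[cls] = [1 if label == cls else 0 for label in labels]
    return targets

# Toy label sequence: four stimulus trials and two null examples
labels = [0, 3, 3, NULL_LABEL, 9, NULL_LABEL]
targets = one_vs_all_targets(labels)
print(targets[3])
```

      Because the null label never matches a stimulus class, null examples serve as negatives for every classifier, which is the role the quoted passage assigns to them.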

      c) Why did you choose a ratio of 1:2 for your null data?

      Our choice of a higher ratio was based upon previous publications reporting better sensitivity of TDLM with higher ratios, as spatial sensor correlations decrease. Nevertheless, this choice was not well investigated beforehand. We have added more information about this to the manuscript.

      d) You could think about putting the questionnaire results into the supplement if they are sanity checks.

      We have added the questionnaire results. However, due to the size of the tables, we have decided to add them as Excel files to the supplementary files of the code repository. We have mentioned the existence of these files in the publication.

      e) Figure 2. There is a typo in D: It says "Precessor Image" instead of "Predecessor Image".

      Fixed typo in figure.

      f) You write "Trials for the localizer task were created from -0.1 to 0.5 seconds relative to visual stimulus onset to train the decoders and for the retrieval task, from 0 to 1.5 seconds after onset of the second visual cue image." But the Figure legend 3D starts at -0.1 seconds for the retrieval test.

      We have now clarified this. For the classifier cross-validation, the transfer sanity check and the clustered analysis we used trials from -0.1 to 0.5 seconds, whereas for the sequenceness analysis of the retrieval, we used trials from 0 to 1.5 seconds.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank all reviewers for their thorough assessment and constructive comments. We are glad that the reviewers appreciate that our findings are of interest to the nuclear transport field and that our extension of the use of the RITE methodology can be a valuable tool for the further characterization of NPCs that differ in composition and potentially function. In response to the reviewers’ comments, we have revised the text to incorporate their suggestions and improve overall readability and clarity. Furthermore, we propose to perform a set of additional experiments to address the reviewers’ most important critiques. Below we list our response with the reviewer comments reprinted in dark grey and our response in blue for easier orientation. We have added numbering of the comments for easier orientation.

      Many of the comments made by the reviewers have already been implemented; additional points will be addressed in a revised version of the manuscript as detailed below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors extended the existing recombination-induced tag exchange (RITE) technology to show that they can image a subset of NPCs, improving signal-to-noise ratios for live cell imaging in yeast, and to track the stability or dynamics of specific nuclear pore proteins across multiple cell divisions. Further, the authors use this technology to show that the nuclear basket proteins Mlp1, Mlp2 and Pml39 are stably associated with "old NPCs" through multiple cell cycles. The authors show that the presence of Mlp1 in these "old NPCs" correlates with exclusion of Mlp1-positive NPCs from the nucleolar territory. A surprising result is that basket-less NPCs can be excluded from the non-nucleolar region, an observation that correlates with the presence of Nup2 on the NPC regardless of maturation state of the NPC. In support of the proposal that NPCs are retained in non-nucleolar regions via Mlp1 and Nup2, simulation data are presented suggesting that basket-less NPCs diffuse faster in the plane of the nuclear envelope.

      However, there are some points that do need addressing:

      Major Points 1. Taking into account that the Nup2 result in Figure 4B forms the basis for one half of the proposed model in Figure 6 regarding the exclusion of NPCs from the nucleolar region of the NE, there is a relatively small amount of data in support of this finding and this proposed model. For example, the only data for Nup2 in the manuscript is a column chart in Figure 4B with no supporting fluorescence microscopy examples for any Nup2 deletion. Further, the Nup60 deletion mutant will have zero basket-containing NPCs, whereas the Nup2 deletion will be a mixture of basket-containing and basket-less NPCs. The only support for the localization of basket-containing NPCs in the Nup2 deletion mutant is through a reference "Since Mlp1-positive NPCs remain excluded from the nucleolar territory in nup2Δ cells (Galy et al., 2004), the homogenous distribution observed in this mutant must be caused predominantly by the redistribution of Mlp-negative NPCs into the nucleolar territory."

      We have already added fluorescence images of the nup2Δ strain to Figure 4A in the preliminary revision.

      In addition, we will repeat the experiment from Galy et al. 2004 to test whether Mlp-positive NPCs are excluded from nucleoli in our hands as well.

      Furthermore, we propose to carry out more experiments to pinpoint which domains of Nup2 contribute to nucleolar exclusion, which will provide more insight into the mechanism behind this effect. We propose to do this by analyzing NPC localization in mutants expressing truncations of Nup2 with deletions of individual domains as their only copy of Nup2. Regardless of whether we find a single domain of Nup2 responsible or a combinatorial action, this experiment will indicate a potential molecular mechanism for nucleolar exclusion.

      1. The authors could consider utilizing this opportunity to discuss their technological innovations in the context of the prior work of Onischenko et al., 2020. This work is referenced for the statement "RITE can be used to distinguish between old and new NPCs" Page 2, Line 43. However, it is not referenced for the statement "We constructed a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark))" despite Onischenko et al., 2020 having already constructed a RITE-cassette for the GFP-to-dark transition. The authors could consider taking this opportunity to instead focus on their innovative approach to apply this technology to decrease the number of fluorescently-tagged NPCs by dilution across multiple cell divisions and to interpret this finding as a measure of the stability of nuclear pore proteins within the broader NPC.

      We apologize for this imprecise citation. We have modified the text to indicate that our RITE cassette was previously used in two publications. It now reads: “We used a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark)) (Onischenko et al., 2020, Kralt et al., 2022).”

      1. The authors could also consider taking this opportunity to discuss their results in the context of the Saccharomyces cerevisiae nuclear pore complex structures published e.g. in Kim et al., 2018, Akey et al., 2022, Akey et al., 2023 in which the arrangement of proteins in the nuclear basket is presented, and also work from the Kohler lab (Mészáros et al., 2015) on how the basket proteins are anchored to the NPC. There is additional literature that also might help provide some perspective to the findings in the current manuscript, such as the observation that a lesser amount of Mlp2 to Mlp1 observed is consistent with prior work (e.g. Kim et al., 2018) and that intranuclear Mlp1 foci are also formed after Mlp1 overexpression (Strambio-de-Castillia et al., 1999).

      Following the reviewer’s suggestion, we extended our discussion of basket Nup stoichiometry and organization in the discussion section including several of the citations mentioned. At this point, we did not see a good way to incorporate discussion about the nuclear Mlp1 foci formed after Mlp1 overexpression. However, this observation is in line with the foci formed in cells lacking Nup60, suggesting that Mlp1 that cannot be incorporated into NPCs forms nuclear foci.

      Minor Points 1. What is the "lag time" of the doRITE switching? Do the authors believe that it is comparable to the approximate 1-hour timeframe following beta-estradiol induction as shown previously in Cheng et al. Nucleic Acids Research, Volume 28, Issue 24, 15 December 2000, Page e108, https://doi.org/10.1093/nar/28.24.e108

      Our data (e.g. newRITE, Figure S3B) suggest that the switch occurs on a similar timeframe at

      1. The authors could consider a brief explanation of radial position (um) for the benefit of the reader, in Figures 1E (right panel) and 2B (right panel), perhaps using a diagram to make it easier to understand the X-axis (um).

      To address this, we have now included a diagram and refer to it in the figure legend.

      1. In Figure 1G, would the authors consider changing the vertical axis title and the figure legend wording from "mean number of NPCs per cell" to "mean labeled NPC # per cell" to reflect that what is being characterized are the remaining GFP-bearing NPCs over time?

      Thank you for spotting this inaccuracy. We have changed the label to “mean # of labeled NPCs per cell”.

      1. In Figure 2C, the magenta-labeled protein in the micrographs is not described in the figure or the legend.

      As requested, a description has been added in figure and legend.

      1. In Figure S2A, there is an arrow indicating a Nup159 focus, but this is not described in the figure legend, as is done in Figure 2C.

      A description has been added to the legend.

      1. In Figure S3C, the figure legend does not match the figure. Was this supposed to be designed like Figure 3C and is missing part of the figure? Or is the legend a typographical error?

      We apologize for this error and thank the reviewer for spotting it. The legend has been corrected.

      1. In Figure S4B, the spontaneously recombined RITE (GFP-to-dark) Nup133-V5 appears in the western blot as equally abundant to pre-recombined Nup133-V5-GFP. In the figure legend, this is explained as cells grown in synthetic media without selection to eliminate cells that have lost their resistance marker from the population. In Cheng et al. Nucleic Acids Res. 2000 Dec 15; 28(24): e108, Cre-EBD was not active in the absence of β-estradiol, despite galactose-induced Cre-EBD overexpression. Would the authors be able to comment further on the Cre-Lox RITE system in the manuscript?

      We note that in the cited publication, too, cells are grown in the presence of selection to select (as stated in this publication) “against pre-excision events that occur because of low but measurable basal expression of the recombinase”. Although the authors report that spontaneous recombination is reduced with the β-estradiol inducible system (compared to pGAL expression control of the recombinase only), they show negligible spontaneous recombination only within a two-hour time window. Indeed, we also observe low levels of uninduced recombination on a short timeframe, but occasional events can become significant over longer incubation times (e.g. overnight growth) in the absence of selection. It should be noted that in our system, Cre expression is continuously high (TDH3-promoter) and not controlled by an inducible GAL promoter. We have added the information about the promoter controlling Cre-expression in the methods section.

      1. In Figure 6, the authors may want to consider inverting the flow of the cartoon model to start from the wild type condition and apply the deletion mutations at each step to "arrive" at the mutant conditions, rather than starting with mutant conditions and "adding back" proteins.

      Following the suggestions of the reviewer, we have modified our model to more clearly represent the contributions of the different basket components.

      Reviewer #1 (Significance (Required)):

      Recent work has drawn attention to the fact that not all NPCs are structurally or functionally the same, even within a single cell. In this light, the work here from Zsok et al. is an important demonstration of the kind of methodologies that can shed light on the stability and functions of different subpopulations of NPCs. Altogether, these data are used to support an interesting and topical model for Nup2 and nuclear-basket driven retention of NPCs in non-nucleolar regions of the nuclear envelope.

      We thank the reviewer for this positive assessment of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Zsok et al. develop innovative methods to examine the dynamics of individual nuclear pore complexes (NPCs) at the nuclear envelope of budding yeast. The underlying premise is that with the emergence of biochemically distinct NPCs that co-exist in the same cell, there is a need to develop tools to functionally isolate and study them. For example, there is a pool of NPCs that lack the nuclear basket over the nucleolus. Although the nature of this exclusion has been investigated in the past, the authors take advantage of a modification of recombination induced tag exchange (RITE), the slow turnover of scaffold nups, the closed mitosis of budding yeast, and extensive high quality time lapse microscopy to ultimately monitor the dynamics of individual NPCs over the nucleolus. By leveraging genetic knockout approaches and auxin-induced degradation with sophisticated quantitative and rigorous analyses, the authors conclude that there may be two mechanisms dependent on nuclear basket proteins that impact nucleolar exclusion. They also incorporate some computational simulations to help support their conclusions. Overall, the data are of the highest quality and are rigorously quantified, the manuscript is well written, accessible, and scholarly - the conclusions are thus on solid footing.

      We thank the reviewer for this assessment.

      Reviewer #2 (Significance (Required)):

      I have no concerns about the data or the conclusions in this manuscript. However, the significance is not overly clear as there is no major conceptual advance put forward, nor is there any new function suggested for the NPCs over nucleoli. As NPCs are immobile in metazoans, the significance may also be limited to a specialized audience.

      We respectfully disagree with this assessment. It is becoming increasingly clear that NPC variants are also present in other model systems. We characterize the interaction between conserved nuclear components, the NPC, the nucleolus and chromatin. While the specific architecture of the nucleus varies between species, many of these interactions are conserved. For example, Nup50, the homologue of Nup2, interacts with chromatin also in other systems including mammalian cells and thus may contribute to regulating the interplay between the nuclear basket and adjoining chromatin. Furthermore, our work demonstrates the use of a novel approach in the application of RITE that can be useful for other researchers in the field of NPC biology and beyond. For example, doRITE could be applied to study the properties of aged NPCs in the context of young cells. In the revised manuscript, we attempt to better highlight and discuss the conceptual advances of our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Zsok et al. describes the role of nuclear basket proteins in the distribution and mobility of nuclear pore complexes in budding yeast. In particular, the authors showed that the doRITE approach can be used for the analysis of stable and dynamically associated NUPs. Moreover, it can distinguish individual NUPs and follow the inheritance of individual NPCs from mother to daughter cells. The authors' findings highlight that Mlp1, Mlp2, and Pml39 are stably associated with the nuclear pore; deletion of Mlp1-Mlp2 and Nup60 leads to higher NPC density in the nucleolar territory; and NPCs exhibit increased mobility in the absence of the nuclear basket components.

      The manuscript contains most figures supporting the data, and data supports the conclusions. However, authors need to include better explanations for figures in the text and figure legends. Lack of detailed explanation can pose challenges for non-experts. In addition, the authors jump over figures and shuffle them through the manuscript, which disrupts the flow and coherence of the manuscript.

      We thank the reviewer for pointing this out. We have modified the figure legends throughout the manuscript in an attempt to make them more accessible to the reader. In addition, we will revise the figure order and text as suggested to improve the flow of the manuscript.

      Major comments: 1) The nuclear basket contains Nup1, Nup2, Nup60, Mlp1, and Mlp2 in yeast. Nup60 works as a seed for Mlp1/Mlp2 and Nup2 recruitment and plays a key role in the assembly of the nuclear pore basket scaffold (PMID: 35148185). Logically, the authors focused primarily on Nup60 in the current manuscript. However, NUP153 has another yeast ortholog, Nup1, which has not been studied in this work. I recommend adjusting the title of the manuscript to: Nup60 and Mlp1/Mlp2 regulate the distribution and mobility of nuclear pore complexes in budding yeast. I also suggest discussing why work on Nup1 was not included/performed in the manuscript.

      We have changed the title to “Nuclear basket proteins regulate the distribution and mobility of nuclear pore complexes in budding yeast”. We think that this better captures the essence of our manuscript than listing all four proteins (Mlp1/2, Nup60 and Nup2) in the title.

      We initially focused on the network that is involved in Mlp1/2 interaction at the NPC. However, we agree that it would be interesting to test whether Nup1 plays a role in the analyzed processes as well. Since Nup1 is essential in our yeast background, we will use auxin-inducible degradation of Nup1 to test its involvement in NPC distribution.

      2) Figure 2B: I suggest choosing a more representative image for Pml39. Based on the images shown in Figure 2B, it does not look like a stable component but rather a dynamic one, like NUP60 or Gle1.

      Due to its lower copy number, Pml39 is much more difficult to visualize than the other Nups. To guide the reader, we have now added arrowheads to point to remaining Pml39 foci at the 14-hour time point. The 11-hour time point most clearly shows that Pml39 is less dynamic than other Nups such as Nup116, Nup60 or Gle1. At this time point, clear dots for Pml39 can be detected, while e.g. Nup116 in the same figure exhibits a more distributed signal and the signal for Nup60 and Gle1 is no longer visible. We will describe this more clearly in our revised manuscript as well.

      3) Depletion of AID-tagged proteins needs to be supported by Western blot analysis with protein-specific antibodies, and PCR results should be included in supplementary data to demonstrate the homozygosity of the strains.

      The correct genomic tagging of the depleted proteins by AID was confirmed by PCR. We will include this PCR analysis in the supplemental data. Please note that we are working with haploid yeast cells. Therefore, all strains only carry a single copy of the genes. Unfortunately, we do not have protein-specific antibodies against the depleted proteins. However, the Mlp1-mislocalization phenotype demonstrates that depletion of Nup60 is successful, and the strain for PolII depletion was used and characterized previously (PMID: 31753862, PMID: 36220102).

      4) Figure 5B: Snapshots of images from the movie are required. There are no images, only quantifications.

      We have replaced the supplemental movie with a movie showing the detection by Trackmate as well as overlaid tracks. As requested, a snapshot of this movie was inserted in figure 5B. We have also moved the example tracks from the supplement to the main figure. Furthermore, we will deposit the tracking dataset in the ETH Research Collection to make it available to the community.

      5) Description of figure legends is more technical than supporting/explaining the figure. For example, my suggestions for Figure 1D are given below. Please consider a more detailed explanation for the other figures. (D) Left: Schematic of the RITE cassette. NUP of interest is tagged with V5 tag and eGFP fluorescent protein where LoxP sites flank eGFP. Before the beta-estradiol-induced recombination, the old NPCs are marked with eGFP signal, whereas new NPCs lack an eGFP signal after the recombination. ORF: open reading frame; V5: V5-tag; loxP: loxP recombination site; eGFP: enhanced green fluorescent protein. Right: doRITE assay schematic of stable or dynamic Nup behavior over cell divisions in yeast after the recombination.

      We have modified the figure legends throughout the manuscript to make them more explanatory and helpful for the reader.

      In addition, I recommend highlighting the result in the title of the figures. Please, re-consider titles for Figure S3.

      We have revised the title for Figure S3 to state a result. It now reads: “Mlp1 truncations localize preferentially to non-nucleolar NPCs.”

      Minor: i) P.1 Line 31. Extra period symbol before the "(Figure 1A)".

      Fixed

      ii) P.2 Line 10. Inconsistent writing of PML39 and MLP1. Both genes are capitalized. The same for P.4 Line 16. In some cases all letters are capitalized in other only the first one.

      We are following the official yeast gene nomenclature by spelling gene names in italicized capitals and protein names with only the first letter capitalized. We are sorry that this can be confusing for readers more familiar with other model systems but we adhere to the accepted yeast nomenclature standards.

      iii) P.2 Line 18-22. The sentence is too long and hard to read. I recommend splitting it into two sentences.

      We agree and have fixed this.

      iv) P.2-3 Line 46-47. The sentence is unclear. Suggestion: We expected that successive cell divisions would dilute the signal of labelled and stably associated with the NPC nucleoporins. By contrast, ...

      We have modified the sentence to read: “When tagging a Nup that stably associates with the NPC, we expected that successive cell divisions would dilute labelled NPCs by inheritance to both mother and daughter cells leading to a low density of labelled NPCs. By contrast,…”

      v) P.4 Line 17-21. Please, consider adding extra information and clarifying lines 19-21. For example, in Line 19 Figure 2B you can add that the reader needs to compare row 1 and row 4.

      Thank you, we have fixed this as suggested.

      vi) P. 5 Line 15. When a number begins a sentence, that number should always be spelled out. You can re-phrase the sentence to avoid it. Also, I recommend adding an explanation/hypothesis of why new NPCs are less frequently detected in nucleolar territory.

      We have formatted the text. Interestingly, new NPCs are more frequently detected in the nucleolar territory. We have reformulated this section to make it clearer, also in response to the next comment.

      vii) P.5 Line 17-22. I recommend re-phrasing these two sentences. Logically, it is clear that Mlp1/Mlp2 loss makes "old NPCs" look more like "new NPCs", and for that reason, they are more frequently included in the nucleolar territory, but it is not clear when you read these two sentences for the first time.

      We have reformulated this section to make it clearer.

      viii) P6. Line 16. No figure supporting data on graph (Figure 3B).

      We have added fluorescent images of the nup2d strain to figure 4A.

      ix) P.7 Line 10-13. The sentence is unclear.

      We have shortened the sentence and moved part of the content to the discussion in the next paragraph.

      x) P.13,14 etc. If 0h timepoint has been used for normalization, why is it present on the graph?

      The 0h timepoint is shown for comparison and to illustrate the standard deviation in the data.

      xi) P.15. Line 32-33. There is no image here. Potentially wrong description of the figure.

      Thank you for spotting this. This was fixed.

      xii) Figures: - Inconsistent labeling of figures. For example, Fig.1, Fig.1S, Figure 2 etc.

      Thank you, this has been corrected.

      • Inconsistent labeling of figures. For example, Fig.1 G "mean number of NPCs per cell" - no capitalization of the first letter. Fig.1S "Fraction in population" is capitalized. In general, titles of axis should be capitalized.

      Thank you for spotting this. This was fixed.

      Suggestions for Figure 1D and Figure 6 are attached as a separate file.

      We thank the reviewer for their suggestions to improve these figures. We have taken their recommendation and revised the figures accordingly (see also response to reviewer 1, minor point 8).

      Reviewer #3 (Significance (Required)):

      Zsok et al. used the recombination-induced tag exchange (RITE) approach, which is an interesting and powerful method to follow individual NUPs over time with respect to their localization and abundance. This approach has been used before in PMID: 36515990 to distinguish pre-existing and newly synthesized Nup2 populations and has been extended to other basket NUPs in this work. Using this method, the authors support the earlier data on basket nucleoporins and highlight new insights on how basket nucleoporins regulate NPCs distribution and mobility. Overall, the manuscript provides new details on the stability of nucleoporins in yeast and how these data align with the mass spectrometry and FRAP data performed earlier in other studies. The limitation of this study is the absence of data on Nup1. It was unclear why these data were not present. Additional data can be included on the dynamics of Pml39, for example, using the FRAP method. The dynamic of Pml39 at the pore was shown only using the doRITE method.

      As suggested, we propose to test whether Nup1 influences NPC organization (see also above). Unfortunately, we are not able to provide orthogonal data for the dynamics of Pml39. As we have discussed in the manuscript, FRAP is not suitable for the analysis of the dynamics of most nucleoporins in yeast due to the high lateral mobility of NPCs in the nuclear envelope and has previously generated misleading results for Mlp1. Furthermore, the low expression levels of Pml39 will make it difficult to obtain reliable FRAP curves for this protein. We therefore do not think that adding FRAP experiments with Pml39 will provide valuable insight.

      However, in addition to the Pml39 doRITE result itself, our observation that the Pml39-dependent pool of Mlp1 exhibits stable association with the NPC supports the interpretation of Pml39 as a stable protein as well.

      In general, this study represents a unique research study of basic research on nuclear pore proteins that will be of general interest to the nuclear transport field.

      Field of expertise: nuclear-cytoplasmic transport, nuclear pore, inducible protein degradation. I do not have sufficient expertise in ExTrack.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study advances our understanding of how past and future information is jointly considered in visual working memory by studying gaze biases in a memory task that dissociates the locations during encoding and memory tests. The evidence supporting the conclusions is convincing, with state-of-the-art gaze analyses that build on a recent series of experiments introduced by the authors. This work, with further improvements incorporating the existing literature, will be of broad interest to vision scientists interested in the interplay of vision, eye movements, and memory.

      We thank the Editors and the Reviewers for their enthusiasm and appreciation of our task, our findings, and our article. We also wish to thank the Reviewers for their constructive comments that we have embraced to improve our article. Please find below our point-by-point responses to this valuable feedback, where we also state relevant revisions that we have made to our article.

      In addition, please note that we have now also made our data and code publicly available.

      Reviewer 1, Comments:

      In this study, the authors offer a fresh perspective on how visual working memory operates. They delve into the link between anticipating future events and retaining previous visual information in memory. To achieve this, the authors build upon their recent series of experiments that investigated the interplay between gaze biases and visual working memory. In this study, they introduce an innovative twist to their fundamental task. Specifically, they disentangle the location where information is initially stored from the location where it will be tested in the future. Participants are tasked with learning a novel rule that dictates how the initial storage location relates to the eventual test location. The authors leverage participants' gaze patterns as an indicator of memory selection. Intriguingly, they observe that microsaccades are directed toward both the past encoding location and the anticipated future test location. This observation is noteworthy for several reasons. Firstly, participants' gaze is biased towards the past encoding location, even though that location lacks relevance to the memory test. Secondly, there's a simultaneous occurrence of an increased gaze bias towards both the past and future locations. To explore this temporal aspect further, the authors conduct a compelling analysis that reveals the joint consideration of past and future locations during memory maintenance. Notably, microsaccades biased towards the future test location also exhibit a bias towards the past encoding location. In summary, the authors present an innovative perspective on the adaptable nature of visual working memory. They illustrate how information relevant to the future is integrated with past information to guide behavior.

      Thank you for your enthusiasm for our article and findings as well as for your constructive suggestions for additional analyses that we respond to in detail below.

      This short manuscript presents one experiment with straightforward analyses, clear visualizations, and a convincing interpretation. For their analysis, the authors focus on a single time window in the experimental trial (i.e., 0-1000 ms after retro cue onset). While this time window is most straightforward for the purpose of their study, other time windows are similarly interesting for characterizing the joint consideration of past and future information in memory. First, assessing the gaze biases in the delay period following the cue offset would allow the authors to determine whether the gaze bias towards the future location is sustained throughout the entire interval before the memory test onset. Presumably, the gaze bias towards the past location may not resurface during this delay period, but it is unclear how the bias towards the future location develops in that time window. Also, the disappearance of the retro cue constitutes a visual transient that may leave traces on the gaze biases which speaks again for assessing gaze biases also in the delay period following the cue offset.

      Thank you for raising this important point. We initially focused on the time window during the cue given that our central focus was on gaze-biases associated with mnemonic item selection. By zooming in on this window, we could best visualize our main effects of interest: the joint selection (in time) of past and future memory attributes.

      At the same time, we fully agree that examining the gaze biases over a more extended time window yields a more comprehensive view of our data. To this end, we have now also extended our analysis to include a wider time range that includes the period between cue offset (1000 ms after cue onset) and test onset (1500 ms after cue onset). We present these data below. Because we believe our future readers are likely to be interested in this as well, we have now added this complementary visualization as Supplementary Figure 4 (while preserving the focus in our main figure on the critical mnemonic selection period of interest).

      Author response image 1.

      Supplementary Figure 4. Gaze biases in extended time window as a complement to Figure 1 and Supplementary Figure 2. This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset, the gaze bias towards the future location persists (panel a) and that while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus (panel b).

      This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset (consistent with our prior reports of this bias), the gaze bias towards the future location persists. Moreover, as revealed by the data in panel b above, while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus.
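      To make the size split described above concrete, the following is a minimal sketch of how a toward-minus-away saccade rate could be computed per time bin, separately for saccades below and above 1 degree. It is not the authors' analysis code. The function name, the argument names, and the coarse two-bin time axis are assumptions chosen for illustration; only the 1 degree boundary and the 1000 ms and 1500 ms time points come from the text.

```python
import numpy as np

def toward_minus_away_rate(onsets_ms, sizes_deg, toward, t_edges_ms,
                           size_range_deg, n_trials):
    """Toward-minus-away saccade rate (Hz) per time bin.

    Hypothetical helper: all names and the input layout are assumptions.
    onsets_ms      : saccade onset times relative to cue onset
    sizes_deg      : saccade displacement in degrees of visual angle
    toward         : True if the saccade points toward the location of interest
    size_range_deg : (low, high) bounds; low is inclusive, high is exclusive
    """
    onsets_ms = np.asarray(onsets_ms, dtype=float)
    sizes_deg = np.asarray(sizes_deg, dtype=float)
    toward = np.asarray(toward, dtype=bool)

    low, high = size_range_deg
    in_range = (sizes_deg >= low) & (sizes_deg < high)

    # Count toward and away saccades in each time bin.
    n_toward, _ = np.histogram(onsets_ms[in_range & toward], bins=t_edges_ms)
    n_away, _ = np.histogram(onsets_ms[in_range & ~toward], bins=t_edges_ms)

    # Convert counts to a rate per trial per second.
    bin_s = np.diff(t_edges_ms) / 1000.0
    return (n_toward - n_away) / (n_trials * bin_s)

# Toy data: two bins, cue period (0-1000 ms) and delay (1000-1500 ms).
edges = np.array([0.0, 1000.0, 1500.0])
onsets = [100, 200, 1200, 1300, 1250]
sizes = [0.5, 0.4, 2.0, 3.0, 0.3]
is_toward = [True, False, True, True, True]

micro = toward_minus_away_rate(onsets, sizes, is_toward, edges, (0.0, 1.0), n_trials=2)
macro = toward_minus_away_rate(onsets, sizes, is_toward, edges, (1.0, np.inf), n_trials=2)
```

      Splitting by size before binning in time keeps the two ranges independent, so a late increase driven by larger eye movements does not mask or inflate the microsaccade-range bias.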

      We now also call out these additional findings and figure in our article:

      Page 2 (Results): “Gaze biases in both axes were driven predominantly by microsaccades (Supplementary Fig. 2) and occurred similarly in horizontal-to-vertical and vertical-to-horizontal trials (Supplementary Fig. 3). Moreover, while the past bias was relatively transient, the future bias continued to increase in anticipation of the test stimulus and increasingly incorporated eye-movements beyond the microsaccade range (see Supplementary Fig. 4 for a more extended time range)”.

      Moreover, assessing the gaze bias before retro-cue onset allows the authors to further characterize the observed gaze biases in their study. More specifically, the authors could determine whether the future location is considered already during memory encoding and the subsequent delay period (i.e., before the onset of the retro cue). In a trial, participants encode two oriented gratings presented at opposite locations. The future rule indicates the test locations relative to the encoding locations. In their example (Figure 1a), the test locations are shifted clockwise relative to the encoding location. Thus, there are two pairs of relevant locations (each pair consists of one stimulus location and one potential test location) facing each other at opposite locations and therefore forming an axis (in the illustration the axis would go from bottom left to top right). As the future rule is already known to the participants before trial onset it is possible that participants use that information already during encoding. This could be tested by assessing whether more microsaccades are directed along the relevant axis as compared to the orthogonal axis. The authors should assess whether such a gaze bias exists already before retro cue onset and discuss the theoretical consequences for their main conclusions (e.g., is the future location only jointly used if the test location is implicitly revealed by the retro cue).

      Thank you – this is another interesting point. We fully agree that additional analysis looking at the period prior to retrocue onset may also prove informative. In accordance with the suggested analysis, we have therefore now also analysed the distribution of saccade directions (including in the period from encoding to retrocue) as a function of the future rule (presented below, and now also included as Supplementary Fig. 5). Complementary recent work from our lab has shown how microsaccade directions can align to the axis of memory contents during retention (see de Vries & van Ede, eNeuro, 2024). Based on this finding, one may predict that if participants retain the items in a remapped fashion, their microsaccades may align with the axis of the future rule, and this could potentially already happen prior to cue onset.

      These complementary analyses show that saccade directions are predominantly influenced by the encoding locations rather than the test locations, as seen most clearly by the saccade distribution plots in the middle row of the figure below. To obtain time-courses, we categorized saccades as occurring along the axis of the future rule or along the orthogonal axis (bottom row of the figure below). Like the distribution plots, these time course plots also did not reveal any sign of a bias along the axis of the future rule itself.

      Importantly, note how this does not argue against our main findings of joint selection of past and future memory attributes, as for that central analysis we focused on saccade biases that were specific to the selected memory item, whereas the analyses we present below focus on biases in the axes in which both memory items are defined; not only the cued/selected memory item.

      Author response image 2.

      Supplementary Figure 5. Distribution of saccade directions relative to the future rule from encoding onset. (Top panel) The spatial layouts in the four future rules. (Middle panel) Polar distributions of saccades during 0 to 1500 ms after encoding onset (i.e., the period between encoding onset and cue onset). The purple quadrants represent the axis of the future rule and the grey quadrants the orthogonal axis. (Bottom panel) Time courses of saccades along the above two axes. We did not observe any sign of a bias along the axis of the future rule itself.
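      The axis categorisation described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes each saccade is summarised by a direction in degrees, that the future-rule axis is given as a single angle, and that the quadrant boundaries lie 45 degrees on either side of each axis. The function name and argument names are hypothetical.

```python
import numpy as np

def along_axis(direction_deg, axis_deg):
    """True where a saccade direction falls in the two quadrants centred on the axis.

    Hypothetical helper; the 45-degree quadrant boundary is an assumption.
    Directions exactly on a boundary are counted as orthogonal.
    """
    # An axis is undirected, so fold the angular difference into [0, 180).
    diff = (np.asarray(direction_deg, dtype=float) - axis_deg) % 180.0
    # Angular distance to the axis line, in [0, 90].
    dist = np.minimum(diff, 180.0 - diff)
    return dist < 45.0

# Example: an axis running from bottom left to top right (45 degrees).
directions = [45, 225, 135, 315, 50, 100]
on_axis = along_axis(directions, axis_deg=45.0)

# Fraction of saccades along the future-rule axis versus the orthogonal axis.
frac_along = on_axis.mean()
frac_orthogonal = 1.0 - frac_along
```

      Under this scheme, an absence of bias along the future-rule axis would correspond to the two fractions staying close to each other over time.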

      We agree that these additional results are important to bring forward when we interpret our findings. Accordingly, we now mention these findings at the relevant section in our Discussion:

      Page 5 (Discussion): “First, memory contents could have directly been remapped (cf. 4,24–26) to their future-relevant location. However, in this case, one may have expected to exclusively find a future-directed gaze bias, unlike what we observed. Moreover, using a complementary analysis of saccade directions along the axis of the future rule (cf. 24), we found no direct evidence for remapping in the period between encoding and cue (Supplementary Fig. 5)”.

      Reviewer 2, Comments:

      The manuscript by Liu et al. reports a task that is designed to examine the extent to which "past" and "future" information is encoded in working memory that combines a retro cue with rules that indicate the location of an upcoming test probe. An analysis of microsaccades on a fine temporal scale shows the extent to which shifts of attention track the location of the encoded item (past) and the location of the future item (test probe). The location of the encoded grating of the test probe was always on orthogonal axes (horizontal, vertical) so that biases in microsaccades could be used to track shifts of attention to one or the other axis (or mixtures of the two). The overall goal here was then to (1) create a methodology that could tease apart memory for the past and future, respectively, (2) to look at the time-course attention to past/future, and (3) to test the extent to which microsaccades might jointly encode past and future memoranda. Finally, some remarks are made about the plausibility of various accounts of working memory encoding/maintenance based on the examination of these time courses.

      Strengths:

      This research has several notable strengths. It has a clear statement of its aims, is lucidly presented, and uses a clever experimental design that neatly orthogonalizes "past" and "future" as operationalized by the authors. Figure 1b-d shows fairly clearly that saccade directions have an early peak (around 300ms) for the past and a "ramping" up of saccades moving in the forward direction. This seems to be a nice demonstration the method can measure shifts of attention at a fine temporal resolution and differentiate past from future-oriented saccades due to the orthogonal cue approach. The second analysis shown in Figure 2, reveals a dependency in saccade direction such that saccades toward the probe future were more likely also to be toward the encoded location than away from the encoded direction. This suggests saccades are jointly biased by both locations "in memory".

      Thank you for your overall appreciation of our work and for highlighting the above strengths. We also thank you for your constructive comments and call for clarifications that we respond to below.

      Weaknesses:

      (1) The "central contribution" (as the authors characterize it) is that "the brain simultaneously retains the copy of both past and future-relevant locations in working memory, and (re)activates each during mnemonic selection", and that: "... while it is not surprising that the future location is considered, it is far less trivial that both past and future attributes would be retained and (re)activated together. This is our central contribution." However, to succeed at the task, participants must retain the content (grating orientation, past) and probe location (future) in working memory during the delay period. It is true that the location of the grating is functionally irrelevant once the cue is shown, but if we assume that features of a visual object are bound in memory, it is not surprising that location information of the encoded object would bias processing as indicated by microsaccades. Here the authors claim that joint representation of past and future is "far less trivial"; this needs to be evaluated from the standpoint of prior empirical data on memory decay in such circumstances, or some reference to the time-course of the "unbinding" of features in an encoded object.

      Thank you. We agree that our participants have to use the future rule – as otherwise they do not know to which test stimulus they should respond. This was a deliberate decision when designing the task. Critically, however, this does not require (nor imply) that participants have to incorporate and apply the rule to both memory items already prior to the selection cue. It is at least as conceivable that participants would initially retain the two items at their encoded (past) locations, then wait for the cue to select the target memory item, and only then consider the future location associated with the target memory item. After all, in every trial, there is only 1 relevant future location: the one associated with the cued memory item. The time-resolved nature of our gaze markers argues against such a scenario, by virtue of our observation of the joint (simultaneous) consideration of past and future memory attributes (as opposed to selection of past-before-future). These temporal dynamics are central to the insights provided by our study.

      In our view, it is thus not obvious that the rule would be applied at encoding. In this sense, we do not assume that the future location is part of both memory objects from encoding, but rather ask whether this is the case – and, if so, whether the future location takes over the role of the past location, or whether past and future locations are retained jointly.

      Our statements regarding what is “trivial” and what is “less trivial” regard exactly this point: it is trivial that the future is considered (after all, our task demanded it). However, it is less trivial that (1) the future location was already available at the time of initial item selection (as reflected in the simultaneous engagement of past and future locations), and (2) that in presence of the future location, the past location was still also present in the observed gaze biases.

      Having said that, we agree that an interesting possibility is that participants remap both memory items to their future-relevant locations ahead of the cue, but that the past location is not yet fully “unbound” by the time of the cue. This may trigger a gaze bias not only to the new future location but also to the “sticky” (unbound) past location. We now acknowledge this possibility in our discussion (also in response to comment 3 below) where we also suggest how future work may be able to tap into this:

      Page 6 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      (2) The authors refer to "future" and "past" information in working memory and this makes sense at a surface level. However, once the retrocue is revealed, the "rule" is retrieved from long-term memory, and the feature (e.g. right/left, top/bottom) is maintained in memory like any other item representation. Consider the classic test of digit span. The digits are presented and then recalled. Are the digits of the past or future? The authors might say that one cannot know, because past and future are perfectly confounded. An alternative view is that some information in working memory is relevant and some is irrelevant. In the digit span task, all the digits are relevant. Relevant information is relevant precisely because it is thought to be necessary in the future. Irrelevant information is irrelevant precisely because it is not thought to be needed in the immediate future. In the current study, the orientation of the grating is relevant, but its location is irrelevant; and the location of the test probe is also relevant.

      Thank you for this stimulating reflection. We agree that in our set-up, past location is technically “task-irrelevant” while future location is certainly “task-relevant”. At the same time, the engagement of the past location suggests to us that the brain uses past location for the selection – presumably because the brain uses spatial location to help individuate/separate the items, even if encoded locations are never asked about. Therefore, whether something is relevant or irrelevant ultimately depends on how one defines relevance (past location may be relevant/useful for the brain even if technically irrelevant from the perspective of the task). In comparison, the use of “past” and “future” may be less ambiguous.

      It is also worth noting how we interpret our findings in relation to demands on visual working memory, inspired by dynamic situations whereby visual stimuli may be last seen at one location but expected to re-appear at another, such as a bird disappearing behind a building (the example in our introduction). Thus, past for us does not refer to the memory item per se (like in the digit span analogue) but, rather, quite specifically to the past location of a dynamic visual stimulus in memory (which, in our experiment, was operationalised by the future rule, for convenience).

      (3) It is not clear how the authors interpret the "joint representation" of past and future. Put aside "future" and "past" for a moment. If there are two elements in memory, both of which are associated with spatial bindings, the attentional focus might be a spatial average of the associated spatial indices. One might also view this as an interference effect, such that the location of the encoded location attracts spatial attention since it has not been fully deleted/removed from working memory. Again, for the impact of the encoded location to be exactly zero after the retrieval cue, requires zero interference or instantaneous decay of the bound location information. It would be helpful for the authors to expand their discussion to further explain how the results fit within a broader theoretical framework and how it fits with empirical data on how quickly an irrelevant feature of an object can be deleted from working memory.

      Thank you also for this point (that is related to the two points above). As we stated in our reply to comment 1 above, we agree that one possibility is that the past location is merely “sticky” and pulls the task-relevant future bias toward the past location. If so, our time courses suggest that such “pulling” occurs only until approximately 600 ms after cue onset, as the past bias is only transient. An alternative interpretation is that the past location may not be merely a residual irrelevant trace, but actually be useful and used by the brain.

      For example, the encoded (past) item locations provide a coordinate system in which to individuate/separate the two memory items. While the future locations also provide such a coordinate system, the brain may benefit from holding onto both coordinate systems at the same time, rendering our observation of joint selection in both frames. Indeed, in a recent VR experiment in which we had participants (rather than the items) rotate, we also found evidence for the joint use of two spatial frames, even if neither was technically required for the upcoming task (see Draschkow, Nobre, van Ede, Nature Human Behaviour, 2022). Though highly speculative at this stage, such reliance on multiple spatial frames may make our memories more robust to decay and/or interference. Moreover, while past location was never explicitly probed in our task, in daily life the past location may sometimes (unexpectedly) become relevant, hence it may be useful to hold onto it, just in case. Thus, considering the past location merely as an “irrelevant feature” (that takes time to delete) may not do sufficient justice to the potential roles of retaining past locations of dynamic visual objects held in working memory.

      As also stated in response to comment 1 above, we now added these relevant considerations to our Discussion:

      Page 5 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      Reviewer 3, Comments:

      This study utilizes saccade metrics to explore, what the authors term the "past and future" of working memory. The study features an original design: in each trial, two pairs of stimuli are presented, first a vertical pair and then a horizontal one. Between these two pairs comes the cue that points the participant to one target of the first pair and another of the second pair. The task is to compare the two cued targets. The design is novel and original but it can be split into two known tasks - the first is a classic working memory task (a post-cue informs participants which of two memorized items is the target), which the authors have used before; and the second is a classic spatial attention task (a pre-cue signal that attention should be oriented left or right), which was used by numerous other studies in the past. The combination of these two tasks in one design is novel and important, as it enables the examination of the dynamics and overlapping processes of these tasks, and this has a lot of merit. However, each task separately is not new. There are quite a few studies on working memory and microsaccades and many on spatial attention and microsaccades. I am concerned that the interpretation of "past vs. future" could mislead readers to think that this is a new field of research, when in fact it is the (nice) extension of an existing one. Since there are so many studies that examined pre-cues and post-cues relative to microsaccades, I expected the interpretation here to rely more heavily on the existing knowledge base in this field. I believe this would have provided a better context of these findings, which are not only on "past" vs. "future" but also on "working memory" vs. "spatial attention".

      Thank you for considering our findings novel and important, while at the same time reminding us of the parallels to prior tasks studying spatial attention in perception and working memory. We fully agree that our task likely engages both attention to the (past) memory item as well as spatial attention to the upcoming (future) test stimulus. At the same time, there is a critical difference in spatial attention for the future in our task compared with ample prior tasks engaging spatial cueing of attention for perception. In our task, the cue never directly cues the future location. Rather, it exclusively cues the relevant memory item. It is the memory item that is associated with the relevant future location, according to the future rule. This integration of the rule-based future location into the memory representation is distinct from classical spatial-attention tasks in which attention is cued directly to a specific location via, for example, a spatial cue such as an arrow.

      Thus, if we wish to think about our task as engaging cueing of spatial attention for perception, we have to at least also invoke the process of cueing the relevant location via the appropriate memory item. We feel it is more parsimonious to think of this as attending to both the past and future location of a dynamic visual object in working memory.

      If we return to our opening example, when we see a bird disappear behind a building, we can keep in working memory where we last saw it, while anticipating where it will re-appear to guide our external spatial attention. Here too, spatial attention is fully dependent on working-memory content (the bird itself) – mirroring the dynamic setting in our study. Thus, we believe our findings contribute a fresh perspective, while of course also extending established fields. We now contextualize our finding within the literature and clarify our unique contribution in our revised manuscript:

      Page 5 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

      Reviewer 2, Recommendations:

      It would be helpful to set up predictions based on existing working memory models. Otherwise, the claim that the joint coding of past/future is "not trivial" is simply asserted, rather than contradicting an existing model or prior empirical results. If the non-trivial aspect is simply the ability to demonstrate the joint coding empirically through a good experimental design, make it clear that this is the contribution. For example, it may be that prevailing models predict exactly this finding, but nobody has been able to demonstrate it cleanly, as the authors do here. So the non-triviality is not that the result contradicts working memory models, but rather relates to the methodological difficulty of revealing such an effect.

      Thank you for your recommendation. First, please see our point-by-point responses to the individual comments above, where we also state relevant changes that we have made to our article, and where we clarify what we meant by “non trivial”. As we currently also state in our introduction, our work took as a starting point the framework that working memory is inherently about the past while being for the future (cf. van Ede & Nobre, Annual Review of Psychology, 2023). By virtue of our unique task design, we were able to empirically demonstrate that visual contents in working memory are selected via both their past and their future-relevant locations – with past and future memory attributes being engaged together in time. With “not trivial” we merely intend to make clear that there are viable alternatives to the findings we observed. For example, past could have been replaced by the future, or it could have been that item selection (through its past location) was required before its future-relevant location could be considered (i.e. past-before-future, rather than joint selection as we reported). We outline these alternatives in the second paragraph of our Discussion:

      Page 5 (Discussion): “Our finding of joint utilisation of past and future memory attributes emerged from at least two alternative scenarios of how the brain may deal with dynamic everyday working memory demands in which memory content is encoded at one location but needed at another.

      First, [….]”

      Our work was not motivated by a particular theoretical debate and did not aim to challenge ongoing debates in the working-memory literature, such as: slot vs. resource, active vs. silent coding, decay vs. interference, and so on. To our knowledge, none of these debates makes specific claims about the retention and selection of past and future visual memory attributes – despite this being an important question for understanding working memory in dynamic everyday settings, as we hoped to make clear by our opening example.

      Reviewer 3, Recommendations:

      I recommend that the present findings be more clearly interpreted in the context of previous findings on working memory and attention. The task design includes two components - the first (post-cue) is a classic working memory task and the second (the pre-cue) is a classic spatial attention design. Both components were thoroughly studied in the past and this previous knowledge should be better integrated into the present conclusions. I specifically feel uncomfortable with the interpretation of past vs. future. I find this framework to be misleading because it reads like this paper is on a topic that is completely new and never studied before, when in fact this is a study on the interaction between working memory and spatial attention. I recommend the authors minimize this past-future framing or be more explicit in explaining how this new framework relates to the more common terminology in the field and make sure that the findings are not presented in a vacuum, as another contribution to the vibrant field that they are part of.

      Thank you for these recommendations. Please also see our point-by-point responses to the individual comments above. There, we explained our logic behind using the terminology of past vs. future (in addition, see also our response to point 2 of reviewer 2). There, we also stated relevant changes that we have made to our manuscript to explain how our findings complement – but are also distinct from – prior tasks that used pre-cues to direct spatial attention to an upcoming stimulus. As we explained above, in our task, the cue itself never contained information about the upcoming test location. Rather, the upcoming test location was a property of the memory item (given the future rule). Hence, we referred to this as a “future attribute” of the cued memory item, rather than as the “cued location” for external spatial attention. Still, we agree the future bias likely (also) reflects spatial allocation to the upcoming test array, and we explicitly acknowledge this in our discussion. For example:

      Page 5 (Discussion): “This signal may reflect either of two situations: the selection of a future-copy of the cued memory content or anticipatory attention to the anticipated location of its associated test-stimulus. Either way, by the nature of our experimental design, this future signal should be considered a content-specific memory attribute for two reasons. First, the two memory contents were always associated with opposite testing locations, hence the observed bias to the relevant future location must be attributed specifically to the cued memory content. Second, we cued which memory item would become tested based on its colour, but the to-be-tested location was dependent on the item’s encoding location, regardless of its colour. Hence, consideration of the item’s future-relevant location must have been mediated by selecting the memory item itself, as it could not have proceeded via cue colour directly.”

      Page 6 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      We are grateful for the overall positive feedback from the reviewer.

      We agree with the reviewer that our data showing cellular co-localization between PRC1 and BIN1 require further investigation in future studies; however, we are confident that, in its current form, our manuscript already presents multiple lines of evidence for the role of BIN1 in mitotic processes. We would like to emphasize that PRC1 is not the sole BIN1 partner that connects it to mitotic processes, but only one out of more than a dozen that we identified in our study. Furthermore, the mitotic connection with BIN1 is not entirely novel, as BIN1 levels fluctuate mildly during the cell cycle, similar to other proteins involved in the regulation of the cell cycle (Santos et al., 2015), and because DNM2 is also a well-accepted actor during mitosis (Thompson et al., 2002).

      The less marked co-localization between BIN1 and PRC1 compared to the strong co-localization between BIN1 and DNM2 can be a consequence of their weaker affinity and their partial binding. Yet, this does not necessarily imply that stronger interactions have more biological significance. For example, weaker affinities can be compensated by high local concentrations, yielding an even higher degree of complex formation in the cell than strongly binding partners that are spatially separated within the cell. Furthermore, even the degree of complex formation cannot be used intuitively to estimate the biological significance of a complex because complexes can trigger very important biological processes even at very low abundances, e.g. by catalyzing enzymatic reactions. Deciding what is and what is not “biologically significant” among the identified interactions remains to be answered in the future, once we are able to overview complex biological processes in a holistic manner.

      In the revised version, we implemented minor changes to further clarify the raised points.

      Reviewer #2:

      We thank the reviewer for the careful assessment and we are pleased to see the positive enthusiasm regarding our affinity interactomic strategy.

      The reviewer points out that affinities were only measured with a single technique, which is relatively unproven. While it is true that our work uses two techniques building on the same holdup concept, we rather believe that this approach is well-proven. The original holdup method was described almost 20 years ago and since then, it has been used in more than 10 publications for quantitative interactomics. Over the years, at least five distinct generations of the assay were developed, all building on the expertise of the preceding one. In the past, we extensively proved that the resulting affinities show excellent agreement with affinities measured with other methods, such as fluorescence polarization, isothermal titration calorimetry, or surface plasmon resonance (for example in Vincentelli et al. Nat. Meth. 2015; Gogl et al. 2020 Structure; Gogl et al. 2022 Nat.Com.). However, it is true that the most recent variation of this method family, called native holdup, is a fairly new approach published just a bit more than a year ago and this is only the third work that utilizes this method. Yet, in our original work describing the method, we demonstrated good agreement with the results of previous holdup experiments, as well as with orthogonal affinity measurements (Zambo et al. 2022).

      Importantly, the reviewer raises concerns regarding the number of replicates used in our study, as well as the reliability of our methodology. We are glad for such a comment as it allows us to explain our motives behind experimental design which is most often left out from scientific works to save space and keep focus on results. The reason why we use technical replicates instead of the typical biological replicates lies in the nature of the holdup assay. In a typical interactomic assay, such as immunoprecipitation, a lot of variables can perturb the outcome of the measurement, such as bait immobilization, or captured prey leakage during washing steps. The output of such an experiment is a list of statistically significant partners and to minimize these variabilities, biological replicates are used. In the case of a native holdup approach, a panel of an equal amount of resins, all saturated with different baits or controls, is mixed with an equal amount of cell extract, taken from a single tube, and after a brief incubation, the supernatant of this mixture is analyzed. The output of such an experiment is a list of relative concentrations of prey and to maximize its accuracy, we use technical replicates. Using an ideal analytical method, such as fluorescence, it is not necessary to use technical replicates to reach accurate results. For example, the general accuracy of a holdup experiment coupled with a robust analytical approach can be seen clearly in our fragmentomic holdup data shown in Figure 7C where mutant domains that do not have any impact on the interactome show extreme agreement in affinities. Unfortunately, mass spectrometry is less accurate as an analytical method, hence we use technical triplicates to compensate for this. Finally, in the case of BIN1, an independent nHU measurement was also performed using a less capable mass spectrometer. 
Not counting the 117 partners of BIN1 that were detected in only one of these proteomic measurements, 29 partners were identified as common significant partners in both of these measurements, showing nearly identical affinities with a mean standard deviation between measured pKapp values of 0.18, meaning that the obtained dissociation constants are within a <2.5-fold range with >95% probability. There were also 61 BIN1 partners that were detected in both proteomic measurements but were only identified as a significant interaction partner in one of these experiments. Yet many of them show binding in both assays, although they were found to be not significant in one of these assays. For example, CDC20 shows 66% depletion in one assay (significant binding) while it shows 54% depletion in the other (not significant binding), and CKAP2 shows 58% depletion in one assay (significant binding) while it shows 41% depletion in the other (not significant binding). We hope that these examples show that statistical significance in nHU experiments signifies how certain we are in a particular affinity measurement rather than the accuracy of the affinity measurement itself. While there are true discrepancies between some of the affinity measurements in these experiments, which could be clarified with more experimental replicates, the raw data presented in our work clearly demonstrate the strength and robustness of a fully quantitative interactomic assay.
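
      The arithmetic behind the two statements above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: it assumes that errors in pKapp are roughly normally distributed (so that about 95% of values fall within two standard deviations), and that depletion converts to an apparent dissociation constant through a simple 1:1 binding relation with bait in excess, Kd = [bait] x (1 - f) / f, where f is the depleted fraction. The function names and the bait concentration used in the check are hypothetical.

```python
import math

def fold_range(sd_pk, z=2.0):
    """Factor by which Kd may deviate from its central value within
    +/- z standard deviations on the pK (log10) scale.
    Assumes normally distributed pK errors; z = 2 covers roughly 95%."""
    return 10 ** (z * sd_pk)

def pk_from_depletion(f, bait_conc_molar):
    """Apparent pK from fractional prey depletion f.
    Assumes 1:1 binding with bait in excess: Kd = [bait] * (1 - f) / f."""
    kd = bait_conc_molar * (1.0 - f) / f
    return -math.log10(kd)

def delta_pk(f1, f2):
    """Difference in apparent pK between two depletion values.
    The bait concentration cancels, so it need not be known."""
    return math.log10(f1 / (1.0 - f1)) - math.log10(f2 / (1.0 - f2))

# Standard deviation of 0.18 pK units quoted in the response.
print(f"fold range at 2 SD: {fold_range(0.18):.2f}")

# Depletion pairs quoted in the response for CDC20 and CKAP2.
for name, f_a, f_b in [("CDC20", 0.66, 0.54), ("CKAP2", 0.58, 0.41)]:
    d = delta_pk(f_a, f_b)
    print(f"{name}: delta pK = {d:.2f}, fold difference in Kd = {10 ** d:.2f}")
```

      Under these assumptions, the two depletion pairs differ by only a few tenths of a pK unit, which is in line with the point that a "not significant" call can coexist with a very similar apparent affinity.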

      In the revised version, we clarified the number of replicates in the text, in the figure legends, and included some of this discussion in the method section.

      The reviewer had some very useful comments regarding affinity differences between short fragments and full-length proteins. In his comment, he possibly made a typo, as we find that full-length proteins typically interact with higher, not weaker, affinities compared to short PxxP motif fragments in isolation. The reviewer also comments that we explain this difference with cooperativity. In a previous preprint version, which the reviewer may have seen, this was indeed the case, but since we realized that we did not have sufficient evidence supporting this model, we did not discuss this in detail in the last version submitted to eLife. To clarify this, we included more discussion about the observed differences in the affinities between fragments and full-length proteins, but since we have limited data to make solid conclusions, we do not go into details about underlying models.

      Instead of cooperativity, the reviewer suggests that the observed differences may originate from additional residues that were not included in our peptides. Indeed, many similar experiments fail because of suboptimal peptide library design. Our peptide library was constructed as 15-mer, xxxxxxPxxPxxxxx motifs and we do not see a strong contribution of residues at the far end of these peptides. Specificity logo reconstructions are expected to identify all key residues that participate in SH3 domain binding, and based on this, all key residues of the identified motifs can be included in shorter 10-mer, xxxPxxPxxx motifs. Therefore, it is unlikely that residues outside our peptide regions will greatly contribute to the site-specific interactions of SH3 domains. It is however possible that other sites, that are sequentially far away from the studied PxxP motifs, are also capable of binding to SH3 through a different surface, but in light of the small size of an isolated SH3 domain, we believe it is very unlikely. It is also possible that BIN1 could also interact with other types of SH3 binding motifs that were not included in our peptide library. We think a more likely explanation is some sort of cooperativity. Cooperativity, or rather synergism between different sites can be easily explained in typical situations, such as in the case of a bimolecular interaction that is mediated by two independent sites. In such an event, once one site is bound, the second binding event will likely also occur because of the high effective local concentration of the binding sites. However, cooperativity can also form in atypical conditions and a molecular explanation for these events is rather elusive. As BIN1 contains a single SH3 domain, its binding to targets containing more binding sites can be challenging to interpret. 
If these sites are part of a greater Pro-rich region, such as in the case of DNM2, it is possible that the entire region adopts a fuzzy, malleable, yet PPII-like helical conformation. Once the SH3 domain is recruited to this helical region, it can freely trans-locate within this region via lateral diffusion and it will pause on optimal PxxP motifs. As an alternative to this sliding mechanism, a diffusion-limited cooperative binding can also occur. If the two motifs are not part of the same Pro-rich region, but are relatively close in space, such as in the case of ITCH or PRC1, once a BIN1 molecule dissociates from one site, it has a higher chance to rebind to the second site due to higher local concentrations. Such an event can more likely occur if a transient, but relatively stable encounter complex exists between the two molecules, from which complex formation can occur at both sites (A+B↔AB*; AB*↔ABsite1; AB*↔ABsite2). However, this large effective local concentration in this encounter complex is only temporary because diffusion rapidly diminishes it, although weak electrostatic interactions can increase the lifetime of such encounter complexes. In contrast, the large effective local concentration in conventional multivalent binding is time-independent and only determined by the geometry of the complex. Finally, it may also occur that our empirical bait concentration estimation for immobilized biotinylated proteins is less accurate than the concentration estimation of peptide baits because we approximate this value based on peptide baits. For this technical reason, which was discussed in detail in the original paper describing the nHU approach, we are carefully using apparent affinities for nHU experiments. Nevertheless, even without accurate bait concentrations, our nHU experiment provides precise relative affinities and, thus partner ranking. 
Either of the mechanisms underlying the interactions we study would be difficult to further explore experimentally, especially at the proteomic level.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We greatly appreciate the comments from the editor and the reviewers, based on which we have made the revisions. We have responded to all the questions and summarized the revisions below. The changes are also highlighted in the manuscript.

      Additionally, we’ve noticed a few typos in the manuscript presented on the eLife website, which were not there in our originally submitted file.

      (1) In both the “Full text” presented on the eLife website and the pdf file generated after clicking “Download”: the last FC1000 in the second paragraph of the “Extensive induction curves fitting of TetR mutants” section should be FC1000WT .

      (2) In the pdf file generated after clicking “Download”: the brackets are all incorrectly formatted in the captions of Figure 4 and Figure 3—figure supplement 6.

      eLife assessment

      The fundamental study presents a two-domain thermodynamic model for TetR which accurately predicts in vivo phenotype changes brought about as a result of various mutations. The evidence provided is solid and features the first innovative observations with a computational model that captures the structural behavior, much more than the current single-domain models.

      We appreciate the supportive comments by the editor and reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors’ earlier deep mutational scanning work observed that allosteric mutations in TetR (the tetracycline repressor) and its homologous transcriptional factors are distributed across the structure instead of along the presumed allosteric pathways as commonly expected. In addition, the loss of allosteric communication caused by those mutations was rescued by additional distributed mutations. Now the authors develop a two-domain thermodynamic model for TetR that explains these compelling data. The model is consistent with the in vivo phenotypes of the mutants with changes in parameters, which permits quantification. Taken together, their work connects intra- and inter-domain allosteric regulation that correlates with structural features. This leads the authors to suggest broader applicability to other multidomain allosteric proteins. Here the authors follow their first innovative observations with a computational model that captures the structural behavior, aiming to make it broadly applicable to multidomain proteins. Altogether, an innovative and potentially useful contribution.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      None that I see, except that I hope that in the future, if possible, the authors would follow with additional proteins to further substantiate the model and show its broad applicability. I realize however the extensive work that this would entail.

      We thank the reviewer for the supportive comments and the suggestion to extend the model to other proteins, which we indeed plan to pursue in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This combined experimental-theoretical paper introduces a novel two-domain statistical thermodynamic model (primarily Equation 1) to study allostery in generic systems but focusing here on the tetracycline repressor (TetR) family of transcription factors. This model, building on a function-centric approach, accurately captures induction data, maps mutants with precision, and reveals insights into epistasis between mutations.

      Strengths:

      The study contributes innovative modeling, successful data fitting, and valuable insights into the interconnectivity of allosteric networks, establishing a flexible and detailed framework for investigating TetR allostery. The manuscript is generally well-structured and communicates key findings effectively.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The only minor weakness I found was that I still don’t have a better sense of (a) the intuition behind and (b) the mathematical derivation of Equation 1, which is so central to the work. I would recommend that the authors provide this early on in the main text.

      We thank the reviewer for the suggestion. The full mathematical derivation of Equation 1 is given in the first section of the supplementary file. Given the length of the derivation, we think it’s better to keep it in the supplementary file rather than the main text. In the main text, the first subsection (overview of the two-domain thermodynamic model of allostery) of the Results section and the paragraph right before Equation 1 are meant for providing intuitive understandings of the two-domain model and the derivation of Equation 1, respectively.

      We would also like to point the reviewer to Figure 2-figure supplement 2 and Equations (12) to (18) in the supplementary file for an alternative derivation. They show that the equilibria among all molecular species containing the operator are dictated by the binding free energies, the ligand concentration, and the allosteric parameters. The probability of an unbound operator (proportional to the probability that the promoter is bound by an RNA polymerase, or the gene expression level) can thus be calculated using Equation (12), which then leads to main text Equation 1 following the derivation given there.

      Additionally, we’ve added a paragraph to the main text (line 248-260) to aid an intuitive understanding of Equation 1.

      “The distinctive roles of the three biophysical parameters on the induction curve as stipulated in Equation 1 could be understood in an intuitive manner as well. First, the value of εD controls the intrinsic strength of binding of TetR to the operator, or the intrinsic difficulty for ligand to induce their separation. Therefore, it controls how tightly the downstream gene is regulated by TetR without ligands (reflected in leakiness) and affects the performance limit of ligands (reflected in saturation). Second, the value of εL controls how favorable ligand binding is in free energy. When εL increases, the binding of ligand at low concentrations becomes unfavorable, where the ligands cannot effectively bind to TetR to induce its separation from the operator. Therefore, the fold-change as a function of ligand concentration only starts to noticeably increase at higher ligand concentrations, resulting in larger EC50. Third, as discussed above, γ controls the level of anti-cooperativity between the ligand and operator binding of TetR, which is the basis of its allosteric regulation. In other words, γ controls how strongly ligand binding is incompatible with operator binding for TetR, hence it controls the performance limit of ligand (reflected in saturation).”
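
      The qualitative roles described in this paragraph can be illustrated with a generic four-state thermodynamic cycle. The sketch below is a toy model and is not the authors' Equation 1, whose exact form is given in their manuscript and supplementary file. It assumes a single repressor that can be on or off the operator and with or without ligand, energies in units of kT, a dimensionless ligand concentration c, and a sign convention in which more negative eps_d or eps_l means tighter binding and positive gamma means anti-cooperativity. All function and parameter names are hypothetical.

```python
import math

def p_operator_free(c, eps_d, eps_l, gamma):
    """Probability that the operator is not bound by the repressor
    (taken here as a proxy for expression) in a four-state toy model."""
    # Repressor off the operator: without ligand (weight 1) or with ligand.
    w_off = 1.0 + c * math.exp(-eps_l)
    # Repressor on the operator: ligand binding is penalised by gamma.
    w_on = math.exp(-eps_d) * (1.0 + c * math.exp(-eps_l - gamma))
    return w_off / (w_off + w_on)

def leakiness(eps_d):
    """Limit of p_operator_free at zero ligand; depends on eps_d only."""
    return 1.0 / (1.0 + math.exp(-eps_d))

def saturation(eps_d, gamma):
    """Limit of p_operator_free at very high ligand; set by eps_d + gamma."""
    return 1.0 / (1.0 + math.exp(-eps_d - gamma))

def ec50(eps_d, eps_l, gamma):
    """Ligand concentration at the midpoint between leakiness and
    saturation (closed form for this toy model); scales as exp(eps_l)."""
    return (math.exp(eps_l) * (1.0 + math.exp(-eps_d))
            / (1.0 + math.exp(-eps_d - gamma)))

# Illustrative parameter values (arbitrary, not fitted to any data).
eps_d, eps_l, gamma = -4.0, -2.0, 6.0
print("leakiness :", leakiness(eps_d))
print("saturation:", saturation(eps_d, gamma))
print("EC50      :", ec50(eps_d, eps_l, gamma))
print("EC50 with weaker ligand binding:", ec50(eps_d, 0.0, gamma))
```

      In this toy model, leakiness depends only on eps_d, saturation depends on eps_d and gamma, and increasing eps_l shifts EC50 to higher concentrations without changing either limit, which mirrors the qualitative picture in the quoted paragraph. EC50 here also depends on eps_d and gamma, so the separation of roles is not as clean as a fitted model might provide.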

      We hope that the reviewer will find this explanation helpful.

      Reviewer #3 (Public Review):

      Summary:

      Allosteric regulations are complicated in multi-domain proteins and many large-scale mutational data cannot be explained by current theoretical models, especially for those that are neither in the functional/allosteric sites nor on the allosteric pathways. This work provides a statistical thermodynamic model for a two-domain protein, in which one domain contains an effector binding site and the other domain contains a functional site. The authors build the model to explain the mutational experimental data of TetR, a transcriptional repressor protein that contains a ligand-binding and a DNA-binding domain. They incorporate three basic parameters, the energy change of the ligand and DNA binding domains before and after binding, and the coupling between the two domains to explain the free energy landscape of TetR’s conformational and binding states. They go further to quantitatively explain the in vivo expression level of the TetR-regulated gene by fitting into the induction curves of TetR mutants. The effects of most of the mutants studied could be well explained by the model. This approach can be extended to understand the allosteric regulation of other two-domain proteins, especially to explain the effects of widespread mutants not on the allosteric pathways.

      Strengths:

      The effects of mutations that are neither in the functional or allosteric sites nor in the allosteric pathways are difficult to explain and quantify. This work develops a statistical thermodynamic model to explain these complicated effects. For simple two-domain proteins, the model is quite clean and theoretically solid. For the real TetR protein that forms a dimeric structure containing two chains with each of them composed of two domains, the model can explain many of the experimental observations. The model separates intra and inter-domain influences that provide a novel angle to analyse allosteric effects in multi-domain proteins.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      As mentioned above, the TetR protein is not a simple two-domain protein, but forms a dimeric structure in which the DNA binding domain in each chain forms contacts with the ligand-binding domain in the other chain. In addition, the two ligand-binding domains have strong interactions. Without considering these interactions, especially those mutants that are on these interfaces, the model may be oversimplified for TetR.

      We thank the reviewer for this valid concern and acknowledge that TetR is a homodimer. However, we’ve deliberately chosen to simplify this complexity in our model for the following reasons.

      (1) In this work, we aim to build a minimalist model for two-domain allostery with only the most essential parameters for capturing experimental data. The simplicity of the model helps promote its mechanistic clarity and potential transferability to other allosteric systems.

      (2) Fewer parameters are needed in a simpler model. Our two-domain model currently uses only three biophysical parameters, which are all demonstrated to have distinct influences on the induction curve (see the main text section “System-level ramifications of the two-domain model”). This enables the inference of parameters with high precision for the mutants, and the quantification of the most essential mechanistic effects of their mutations, provided that the model is shown to accurately recapitulate the comprehensive dataset. Thus, we found it was unnecessary to add another parameter for explicitly describing inter-chain coupling, which would likely incur uncertainty in the inference of parameters due to the redundancy of their effects on induction data, and prevent the model from making faithful predictions.

      (3) From a more biological point of view, TetR is an obligate dimer, meaning that the two chains must synchronize for function, supporting the two-domain simplification of TetR for binding concerns.

      Additionally, as shown in the subsection “Inclusion of single-ligand-bound state of repressor” of section 1 of the supplementary file, incorporating the dimeric nature of TetR in our model by allowing partial ligand binding does not change the functional form of main text equation 1 in any practical sense. Therefore, considering all the factors stated above, we think that increasing the complexity of the two-domain model will only be necessary if additional data emerge to suggest the limitation of our model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an excellent work. I have only one suggestion for the authors. Interestingly, the authors also note that the epistatic interactions that they obtain are consistent with the structural features of the protein, which is not surprising. Within this framework, have the authors considered rescue mutations? Please see for example PMID: 18195360 and PMID: 15683227. If I understand right, this might further extend the applicability of their model. If so, the authors may want to add a comment to that effect.

      We thank the reviewer for the supportive comments and for pointing us to the useful references. We have added some comments to the main text regarding this point in line 332-336: “The diverse mechanistic origins of the rescuing mutations revealed here provide a rational basis for the broad distributions of such mutations. Integrating such thermodynamic analysis with structural and dynamic assessment of allosteric proteins for efficient and quantitative rescuing mutation design could present an interesting avenue for future research, particularly in the context of biomedical applications (PMID: 18195360, PMID: 15683227).”

      Reviewer #3 (Recommendations For The Authors):

      The authors should try to build a more realistic dimeric model for TetR to see if it could better explain experimental data. If it were too complicated for a revision, more discussions on the weakness of the current model should be given.

      We thank the reviewer for this valid concern and for the suggestion. The reasons for refraining from increasing the complexity of the model are fully discussed in our response to the reviewer’s public review given above. Primarily, we think that the value of a simple physical model (e.g., the paradigmatic Ising model in statistical physics and the classic MWC model) is two-fold. First, its mechanistic clarity and potential transferability make it a useful conceptual framework for understanding complex systems and establishing universal rules by comparing seemingly unrelated phenomena; second, it provides useful insights and design principles for specific systems if it can quantitatively capture the corresponding experimental data. Thus, given the current experimental data set, we believe it is justified to keep the two-domain model in its current form, while additional experimental data could necessitate a more complex model for TetR allostery in the future. Relevant discussions are added to the main text (line 443-446) and section 8 of the supplementary file.

      “It’s noted that the homodimeric nature of TetR is ignored in the current two-domain model to minimize the number of parameters, and additional experimental data could necessitate a more complex model for TetR allostery in the future (see supplementary file section 8 for more discussions).”

      Minor issues:

      (1) There is an error in Figure 3A, the 13th and 14th subgraphs are the same and should be corrected.

      We thank the reviewer for capturing this error, which has been corrected in the revised manuscript.

      (2) The criteria for the selection of mutants for analysis should be clearly given. Apart from deleting mutants that are in direct contact with the ligand or DNA, how many mutants are left, and how far are they from the two sites? In line 257, what are the criteria for selecting these 15 mutants? Similarly, in line 332, what are the criteria for selecting these 8 mutants?

      We thank the reviewer for this comment. The data selection criteria are now added in section 7 of the supplementary file. The distances to the DNA operator and ligand of the 21 residues under mutational study are now added in Table 1 (Figure 3-figure supplement 9). The added materials are referenced in the main text where relevant.

      “7. Mutation selection for two-domain model analysis

      In this work, there are 24 mutants studied in total including the WT, and they contain mutations at 21 WT residues. We did not perform model parameter inference for the mutant G102D because of its flat induction curve (see the second subsection of section 2 and main text Figure 2—figure Supplement 3). Therefore, there are 23 mutants analyzed in main text Figure 5.

      Measuring the induction curve of a mutant involves a significant amount of experimental effort, which therefore is hard to be extended to a large number of mutants. Nonetheless, we aim to compose a set of comprehensive induction data here for validating our two-domain model for TetR allostery. To this end, we picked 15 individual mutants in the first round of induction curve measurements, which contains mutations spanning different regions in the sequence and structure of TetR (main text Figure 3—figure Supplement 1). Such broad distribution of mutations across LBD, DBD and the domain interface could potentially lead to diverse induction curve shapes and mutant phenotypes for validating the two-domain model. Indeed, as discussed in the main text section "Extensive induction curves fitting of TetR mutants", the diverse effects on induction curve from mutations perturbing different allosteric parameters predicted by the model, are successfully observed in these 15 experimental induction curves. Additionally, 5 of the 15 mutants contain a dead-rescue mutation pair, which helps us validate the model prediction that a dead mutation could be rescued by rescuing mutations that perturb the allosteric parameters in various ways.

      Eight mutation combinations were chosen for the second round of induction curve measurement for studying epistasis, where we paired up C203V and Y132A with mutations from different regions of the TetR structure. Such choice is largely based on two considerations. 1. As both C203V and Y132A greatly enhance the allosteric response of TetR, we want to probe why they cannot rescue a range of dead mutations as observed previously (PMID: 32999067). 2. C203V and Y132A are the only two mutants that show enhanced allosteric response in the first round of analysis. Combining detrimental mutations of allostery in a combined mutant could potentially lead to near flat induction curve, which is less useful for inference (see the second subsection of section 2).”

      Since the number of hotspots identified by DMS is not very large, why not analyze them all?

      We thank the reviewer for this comment. There are 41 hotspot residues in TetR (PMID: 36226916), which have 41*19=779 possible single mutations. It’s unfeasible to perform induction curve measurements for all of these 779 mutants in our current experiment. However, we agree that it would be helpful if we can obtain such a dataset in an efficient way.
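
      The count quoted above follows from simple combinatorics: each hotspot position can be replaced by any of the other standard amino acids. A minimal sketch of that arithmetic, assuming the 20 standard amino acids and single substitutions only (the variable names are illustrative and not taken from the manuscript):

```python
# Number of possible single amino-acid substitutions at the hotspot positions.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

n_hotspots = 41  # hotspot residues in TetR, as stated in the response
# Each site can mutate to every standard residue except its wild-type one.
alternatives_per_site = len(STANDARD_AMINO_ACIDS) - 1

n_single_mutants = n_hotspots * alternatives_per_site
print(n_single_mutants)  # 779
```

      Double or higher-order combinations would grow far faster than this, which is consistent with the point that exhaustive induction-curve measurements are impractical.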

      In line 257, there are 15 mutants mentioned, while in Figure 5, there are 23 mutants mentioned, in Figure 3-figure supplement 1, there are 21 mutants mentioned, and in line 226 of the supplementary file, there are 24 mutants mentioned, which is very confusing. Therefore, the data selection criteria used in this article should be given.

      We thank the reviewer for this comment. The data selection criteria are now given in section 7 of the supplementary file, which should clarify this confusion.

      (3) In Figure 4 of the Exploring epistasis between mutations section, the 6 weights of the additive models corresponding to each mutation combination are different. On one hand, it seems that there are no universal laws in these experimental data. On the other hand, unique parameters of a single mutation combination were not validated in other mutation combinations, which somewhat weakened the conclusions about the potential physical significance of these additive weights.

      We thank the reviewer for this comment. We admit that a quantitative universal law for tuning the 6 weights of the additive model does not manifest in our data, which indicates the mutation-specific nature of epistatic interactions in TetR as hinted in the different rescuing mutation distributions of different dead mutations (PMCID: PMC7568325). However, clear common trends in the weight tuning of combined mutants that contain common mutations do emerge, which comply with the structural features of the protein and provide explanations as to why C203V and Y132A don’t rescue a range of dead mutations (main text section “Exploring epistasis between mutations”). Additionally, the lack of a quantitative universal rule for tuning the 6 weights in our simple model doesn’t exclude the possibility of the existence of universal law for epistasis in TetR in another functional form, a point that could be explored in the future with more extensive joint experimental and computational investigations.

      In Eq. (27) of the supplementary file, the prior distribution of inter-domain coupling γ is given as a Gaussian distribution centered at 5 kBT. Since the absolute value of γ is important, can the authors explain why the prior distribution of γ is set to this value and what happens if other values are used?

      We thank the reviewer for the question. As explained in the corresponding discussions of Eq. (27) in the supplementary file, the prior of γ is chosen to serve as a soft constraint on its possible values based on the consideration that 1. inter-domain energetics for a TetR-like protein should be on the order of a few kBT; and 2. the prior distribution should reflect the experimental observation in the literature that γ has a small probability of adopting negative values upon mutations. Given our thorough validation of the statistical model and computational algorithm (see section 3 of the supplementary file), and the high precision in the parameter fitting results using experimental data (Figure 3 and Figure 4-figure supplement 2), we conclude that 1. the physical range of parameters encoded in their chosen prior distributions agrees well with the value reflected in the experimental data; 2. the inference results are predominantly informed by the data. Thus, changing the mean of the prior distribution of γ should not affect the inference results significantly given that it remains in the physical range.

      This point is explicitly shown in the added Table 2 (Figure 3-figure supplement 10), where we compare the current Bayesian inference results with those obtained after increasing the standard deviation of the Gaussian prior of γ from 2.5 to 5 kBT. As shown in the table, most inference results stay virtually unchanged at the use of this less informative prior, which confirms that they are predominantly informed by the data. The only exceptions are the slight increase of the inferred γ values for C203V, C203V-Y132A and C203V-G102D-L146A, reflecting the intrinsic difficulty of precise inference of large γ values with our model, as is already discussed in the second subsection of section 3 of the supplementary file. However, such observations comply with the common trend of epistatic interactions involving C203V presented in the main text and don’t compromise the ability of our model to accurately capture the induction curves of mutants. Relevant discussions are now added to the second subsection of section 3 of the supplementary file (line 368-385).
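
      The logic of this prior-sensitivity check can be illustrated with a toy conjugate Gaussian update. This is only a sketch under stated assumptions: the real likelihood of the two-domain model is not Gaussian in γ, the function `gaussian_posterior` is a hypothetical helper, and the likelihood summaries below are invented for illustration, not values from the paper. Only the prior mean (5 kBT) and the two prior standard deviations (2.5 and 5 kBT) come from the text.

```python
import math

def gaussian_posterior(prior_mean, prior_sd, like_mean, like_sd):
    """Combine a Gaussian prior with a Gaussian likelihood summary (conjugate update)."""
    prior_prec = 1.0 / prior_sd**2
    like_prec = 1.0 / like_sd**2
    post_prec = prior_prec + like_prec
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = (prior_prec * prior_mean + like_prec * like_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

PRIOR_MEAN = 5.0  # kBT, center of the Gaussian prior on gamma (from the text)

# Hypothetical likelihood summaries (assumed for illustration only):
cases = {
    "informative data": (4.0, 0.3),         # induction curve constrains gamma tightly
    "weakly informative data": (9.0, 4.0),  # curve nearly insensitive to gamma
}

for label, (like_mean, like_sd) in cases.items():
    for prior_sd in (2.5, 5.0):
        mean, sd = gaussian_posterior(PRIOR_MEAN, prior_sd, like_mean, like_sd)
        print(f"{label:>24s} | prior sd {prior_sd:.1f} -> posterior {mean:.2f} +/- {sd:.2f}")
```

      In this toy setting, doubling the prior standard deviation barely moves the posterior when the data are informative, whereas the posterior shifts upward when the data constrain γ only weakly. That is the qualitative pattern described for C203V and its combinations.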

      “In our experimental dataset, such inference difficulty is only observed in the case of C203V, Y132A-C203V and C203V-G102D-L146A due to their large γ and γ + εL values (see main text Figure 3, Figure 3—figure Supplement 10 and Figure 4). As shown in main text Figure 3—figure Supplement 10, the inference results for the other 20 mutants stay highly precise and virtually unchanged after increasing the standard deviation of the Gaussian prior of γ (gstdγ ) from 2.5 to 5 kBT. This demonstrates that the inference results for these mutants are strongly informed by the induction data and there is no difficulty in the precise inference of the parameter values. On the other hand, the inferred γ values (especially the upper bound of the 95% credible region) for C203V, Y132A-C203V and C203V-G102D-L146A increased with gstdγ . This is because the induction curves in these cases are not sensitive to the value of γ given that it’s large enough as discussed above. Hence, when unphysically large γ values are permitted by the prior distribution, they could enter the posterior distribution as well. Such difficulty in the precise inference of γ values for these three mutants however, doesn’t compromise the ability of our model in accurately capturing the comprehensive set of induction data (see part iv below). Additionally, the increase of the inferred γ value of C203V at the use of larger gstdγ complies with the results presented in main text Figure 4, which show that the effect of C203V on γ tends to be compromised when combined with mutations closer to the domain interface."

    1. We can’t master knowledge. It’s what we live in. This requires a radical shift of worldview from colonialist to ecological. The colonial approach to knowledge is to capture it in order to profit from it. The ecological approach is to live within it as within a garden to be tended. The two worldviews may well be mutually incompatible, though this matter is hardly resolved yet.

      Cf. [[Netwerkleren Connectivism 20100421081941]] / [[Context is netwerk van betekenis 20210418104314]] [[Observator geeft betekenis 20210417124703]] . I think K as stock is prone to collector's fallacy. My working def of K is agency along lines of Sveiby. Such K is always situated in the interaction with the world, networks of meaning as context. This as K isn't merely purified I (DIKW pyramid is bogus), it's weaving I, experience, context, skills into a meaningful whole, and it needs an agent to decide on what's meaningful.

    1. Author Response

      The following is the authors’ response to the current reviews.

      At this stage the referees had only minor comments. Referee #1 asked whether archerfish indeed generalize in egocentric rather than allocentric coordinates. The current results may not rule out the possibility that archerfish, unaware of the change in body position, simply continue with previously successful actions, which would only appear to be egocentric generalization. We agree with referee #1; we updated lines 255-260 of the results and added lines 329-336 to the discussion to mention this possibility. Referee #2 noted that a portion of the fish did not make it to the final test, which raises the question of whether all individuals are able to solve the task. We agree with referee #2 and added a paragraph to the discussion section to mention this point (lines 384-388). We also added the salinity of the water in the tanks (line 98), as suggested by Referee #2. Referee #2 suggested using a different term than “washout” in the behavioral experiments. Since the term “washout” is standard in the field, we keep the term in the text.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful study explores how archerfish adapt their shooting behavior to environmental changes, particularly airflow perturbations. It will be of interest to experts interested in mechanisms for motor learning. While the evidence for an internal model for adaptation is solid, evidence for adaptation to light refraction, as initially hypothesized, is inconclusive. As such, the evidence supporting an egocentric representation might be caused by alternative mechanisms to airflow perturbations.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors examined whether archerfish have the capacity for motor adaptation in response to airflow perturbations. Through two experiments, they demonstrated that archerfish could adapt. Moreover, when the fish flipped its body position with the perturbation remaining constant, it did not instantaneously counteract the error. Instead, the archerfish initially persisted in correcting for the original perturbation before eventually adapting, consistent with the notion that the archerfish's internal model has been adapted in egocentric coordinates.

      Evaluation:

      The results of both experiments were convincing, given the observable learning curve and the clear aftereffect. The ability of these fish to correct their errors is also remarkable. Nonetheless, certain aspects of the experiment's motivation and conclusions temper my enthusiasm.

      (1) The authors motivated their experiments with two hypotheses, asking whether archerfish can adapt to light refractions using an innate look-up table as opposed to possessing a capacity to adapt. However, the present experiments are not designed to arbitrate between these ideas. That is, the current experiments do not rule out the look-up table hypothesis, which predicts, for example, that motor adaptation may not generalize to de novo situations with arbitrary action-outcome associations. Such look-up table operations may also show set-size effects, whereas other mechanisms might not. Whether their capacity to adapt is innate or learned was also not directly tested, as noted by the authors in the discussion. Could the authors clarify how they see their results positioned in light of the two hypotheses noted in the Introduction?

      We agree with the referee that look up tables only confuse the issue. The question we tested is whether or not the fish uses adaptation mechanisms to correct its shooting. We have now changed the introduction both to eliminate the entire question of look up tables and also to clarify that both innate mechanisms and learning mechanisms can contribute to fish shooting, and that our research focuses on the question of whether the fish can adapt to a perturbation in its shooting caused by a change in its physical environment.

      (2) The authors claim that archerfish use egocentric coordinates rather than allocentric coordinates. However, the current experiments do not make clear whether the archerfish are "aware" that their position was flipped (as the authors noted, no visual cues were provided). As such, for example, if the fish were "unaware" of the switch, can the authors still assert that generalization occurs in egocentric coordinates? Or simply that, when archerfish are ostensibly unaware of changes in body position, they continue with previously successful actions.

      The fish has access to the body position switch: there are cues in the water tank that can help the fish orient inside it. Additionally, there are no cues to the presence or direction of the air flow above the water tank. Moreover, previous experience has shown that the fish is sensitive to the visual cues and uses them to achieve consistent orientation within the tank when possible. These points have been added to the main text [lines 143-144, 254-257].

      (3) The experiments offer an opportunity to examine whether archerfish demonstrate any savings from one session to another. Savings are often attributed to a faster look-up table operation. As such, if archerfish do not exhibit savings, it might indicate a scenario where they do not possess a refined look-up table and must rely on implicit mechanisms to relearn each time.

      This is an important question. Indeed, we looked for the ‘saving’ effect in the data, but its noisy nature prevented us from drawing a concrete conclusion. We now mention this in lines 247-249.

      We have also eliminated the discussion of look up tables from the article.

      (4) The authors suggest that motor adaptation in response to wind may hint at mechanisms used to adapt to light refraction. However, how strong of a parallel can one draw between adapting to wind versus adapting to light refraction? This seems important given the claims in this paper regarding shared mechanisms between these processes. As a thought experiment, what would the authors predict if they provided a perturbation more akin to light refraction (e.g., a film that distorts light in a new direction, rather than airflow)?

      This is an important point. Indeed, our project started by looking for options to distort the refraction index or distort the light in a new direction. However, given the available ways of distorting the light to a new direction, it is hard to achieve that on the technical level. Initially, we tried using prism goggles, however the archerfish found it hard to shoot with the heavy load on the head. We have also explored oil on the water surface. However, given the available oils and the width of the film above water, it is hard to achieve considerable perturbation.

      The fish's response to the perturbation matches what would be expected for a change in light refraction. Light refraction perturbation does not change with the change in fish body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      (5) The number of fish excluded was greater than those included. This raises the question as to whether these fish are merely elite specimens or representative of the species in general.

      The filtering of the fish was in the training stage. The requirements were quite strict: the fish had to produce enough shots each day in the experimental setup. Very few fish succeeded. But all fish that got to the stage of perturbation exhibited the adaptation effect. We do not see a reason to think that the motivation to shoot will have a strong interaction with the shooting adaptation mechanisms.

      Reviewer #2 (Public Review):

      Summary:

      The work of Volotsky et al. presented here shows that adult archerfish are able to adjust their shooting in response to their own visual feedback, taking consistent alterations of their shot, here by an air flow, into account. The evidence provided points to an internal mechanism of shooting adaptation that is independent of external cues, such as wind. The authors provide evidence for this by forcing the fish to shoot from 2 different orientations to the external alteration of their shots (the airflow). This paper thus provides behavioral evidence of an internal correction mechanism that underlies adaptive motor control of this behavior. It does not provide direct evidence of refractive index-associated shooting adjustment.

      Strengths:

      The authors have used a high number of trials and strong statistical analysis to analyze their behavioral data.

      Weaknesses:

      While the introduction, the title, and the discussion are associated with the refraction index, the latter was not altered, and neither was the position of the target. The "shot" was altered, this is a simple motor adaptation task and not a question related to the refractive index. The title, abstract, and the introduction are thus misleading. The authors appear to deduce from their data that the wind is not taken into account and thus conclude that the fish perceive a different refractive index. This might be based on the assumption that fish always hit their target, which is not the case. The airflow does not alter the position of the target, thus the airflow does not alter the refractive index. The fish likely does not perceive the airflow, thus alteration of its shooting abilities is likely assumed to be an "internal problem" of shooting. I am sorry but I am not able to understand the conclusion they draw from their data.

      This is an important point. Indeed, our project started by looking for options to distort the refraction index or distort the light in a new direction. However, given the available ways of distorting the light to a new direction, it is hard to achieve that on the technical level. Initially, we tried using prism goggles, however the archerfish found it hard to shoot with the heavy load on the head. We have also explored oil on the water surface. However, given the available oils and the width of the film above water, it is hard to achieve considerable perturbation.

      The fish's response to the perturbation matches what would be expected for a change in light refraction. Light refraction perturbation does not change with the change in fish body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      Reviewer #2 (Recommendations For The Authors):

      I have had a hard time trying to understand how the authors concluded that the RI is important here as it is not altered. Thus I did not understand the conclusions drawn from this paper. The experiments are well described, but the conclusions are not to me. Maybe schematics would help to clarify. I am from outside the field and represent a naïve reader with an average intellect. The authors need to do a better job of explaining their results if they want others to understand their conclusions.

      See response to the public comments.

      Minor comments:

      Line 9: omit the "an".

      Done.

      Line 11: this sentence would fit way better if it followed the next one.

      Done.

      Line 15: and all the rest of the paper: washout is a strange term and for me associated with pharmacological manipulations - might only be me. I suggest using recovery instead throughout the manuscript.

      The term ‘washout’ is often used in the field of motor adaptation to describe the return to original condition. For example:

      Kluzik J, Diedrichsen J, Shadmehr R, Bastian AJ (2008) Reach adaptation: what determines whether we learn an internal model of the tool or adapt the model of our arm? J Neurophysiol 100:1455-64. doi: 10.1152/jn.90334.2008

      Donchin O, Rabe K, Diedrichsen J, Lally N, Schoch B, Gizewski ER, Timmann D (2012) Cerebellar regions involved in adaptation to force field and visuomotor perturbation. J Neurophysiol 107:134-47
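
      For readers from outside the field, the baseline, perturbation, and washout phases can be illustrated with a single-rate state-space model of error-based adaptation, a formulation commonly used in the motor adaptation literature. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript: the function name, the `retention` and `learning_rate` values, and the trial schedule are all hypothetical.

```python
def simulate_adaptation(perturbation, retention=0.98, learning_rate=0.15):
    """Single-rate state-space model: x[n+1] = retention * x[n] + learning_rate * e[n]."""
    x = 0.0  # internal estimate of the perturbation (the compensation applied)
    errors = []
    for p in perturbation:
        e = p - x  # shot error on this trial (positive = pushed along the perturbation)
        errors.append(e)
        x = retention * x + learning_rate * e  # trial-by-trial update of the estimate
    return errors

# Hypothetical schedule: 20 baseline, 60 perturbation, 40 washout trials.
schedule = [0.0] * 20 + [1.0] * 60 + [0.0] * 40
errors = simulate_adaptation(schedule)

print(f"first perturbation trial error: {errors[20]:+.2f}")
print(f"last perturbation trial error:  {errors[79]:+.2f}")
print(f"first washout trial error:      {errors[80]:+.2f}  (aftereffect)")
print(f"last washout trial error:       {errors[-1]:+.2f}")
```

      In this sketch the error shrinks during the perturbation phase, reverses sign on the first washout trial (the aftereffect), and then decays back toward zero, which is the return to the original condition that the term describes.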

      Line 19: the fish does not expect the flow, it expects that it shoots too short- no?

      Done.

      Line 35: fix the citation - in your reference manager.

      Done.

      Line 52: provide some examples of the mechanisms you think of or papers of it for naive readers. Otherwise, this sentence is not helpful for the reader.

      Done.

      Line 183: it's unclear which parameter you mean. Rephrase.

      Done.

      Line 197: should read to test "the" - same sentence: you repeat yourself- rephrase the sentence.

      Done.

      Figure 4: it was unclear to me why the figure was differentiating between fishes until I read the legend. Why not include direct information in the figure? A schematic maybe? Legend: you have a double "that" in C.

      We added the title for each column with the information about the direction of air.

      Figures: in all figures, perturbation is wrongly spelled! Change the term washout to recovery.

      Done. We kept the term ‘washout’.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I have only a few comments that I think will improve the manuscript and help readers better appreciate the context of the reported results.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible.

      One paradox, that the authors point out, is that the drastic effects of TALK-1 L114P on plasma membrane potential do not result in a complete loss of insulin secretion. One important consideration is the role of intracellular stores in insulin secretion at physiological levels of hyperglycemia. This needs to be discussed more thoroughly, especially in the light of recent papers like Postic et al 2023 AJP and others. The authors do show an upregulation of IP3-induced Ca release. It is not clear whether they think this is a direct or indirect effect on the ER. Is there more IP3? More IP3R? Are the stores more full?

      The reviewer brings up an important point. Although we see a significant reduction in glucose-stimulated depolarization in most islets from TALK-1 L114P mice, some glucose-stimulated calcium influx is still present (especially from female islets); this suggests that a subset of islet β-cells are still capable of depolarization. Because our original membrane potential recordings were done in whole islets without identification of the cell type being recorded, we have now repeated these electrical recordings in confirmed β-cells (see Supplemental figure 6). The new data show that 33% of TALK-1 L114P β-cells show action potential firing in 11 mM glucose, which would be predicted to stimulate insulin secretion from a third of all TALK-1 L114P β-cells; this could be responsible for the remaining glucose-stimulated insulin secretion observed from TALK-1 L114P islets. However, ER calcium store release could also allow for some of the calcium response in the TALK-1 L114P islets. We have now detailed this in the discussion, which now describes the Postic et al. study showing that glucose-stimulated beta-cell calcium increases involve ER calcium release as it occurs in the presence of voltage-dependent calcium channel inhibition. Future studies can assess this using SERCA inhibitors and determining if glucose-stimulated calcium influx in TALK-1 L114P islets is lost. We also find that muscarinic-stimulated calcium influx from ER stores is greater in TALK-1 L114P mice. We currently do not have data to support the mechanism for this enhancement of muscarinic-induced islet calcium responses from islets expressing TALK-1 L114P. Our hypothesis is that greater TALK-1 current on the ER membrane is enhancing ER calcium release in response to IP3R activation. There is an equivalent IP3R expression in control and TALK-1 L114P islets based on transcriptome analysis, which is now included in the manuscript. However, whether there is greater IP3 production, greater ER calcium storage, and/or greater ER calcium release requires further analysis. Because this finding was not directly related to the metabolic characterization of this TALK-1 L114P MODY mutation, we are planning to examine the ER functions of TALK-1 L114P thoroughly in a future manuscript.

      The authors point to the possible roles of TALK-1 in alpha and delta cells. A limitation of the global knock-in approach is that the cell type specificity of the effects can't easily be determined. This should be more explicitly described as a limitation.

      We thank the reviewer for this suggestion and have added this to the discussion. This is now included in a paragraph at the end of the discussion detailing the limitations of this manuscript.

      The official gene name for TALK-1 is KCNK16. This reviewer wonders whether it wouldn't be better for this official name to be used throughout, instead of switching back and forth. The official name is used for Abcc8 for example.

      We thank the reviewer for this suggestion and have revised the manuscript to include Kcnk16 L114P. The instances of TALK-1 L114P that remain in the manuscript are in cases where the text specifically discusses TALK-1 channel function.

      There are several typos and mistakes in editing. For example, on page 5 it looks like "PMID:11263999" has not been inserted. I suggest an additional careful proofreading.

      We have revised this reference, thoroughly proofread the revised manuscript, and corrected typos.

      The difference in lethality between the strains is fascinating. Might be good to mention other examples of ion channel genes where strain alters the severe phenotypes? Additional speculation on the mechanism could be warranted. It also offers the opportunity to search for genetic modifiers. This could be discussed.

      We thank the reviewer for this suggestion and have added details on mutations where strain alters lethality.

      The sex differences are interesting. Of course, estrogen plays a role as mentioned at the bottom of page 16, but there have been more involved analyses of islet sex differences, including a recent paper from the Rideout group. Is there a sex difference in the islet expression of KCNK16 mRNA or protein, in mice or humans?

      We thank the reviewer for the important comments on the TALK-1 L114P sex differences. We have revised the manuscript to include greater discussion about female β-cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by the reviewer (PMID: 36690328). Because these sex differences in islet function were examined in mice, we looked at KCNK16 expression in mouse beta-cells. There is a trend toward greater KCNK16 expression in sorted male beta-cells (average RPKM 6296.25 +/- 953.84) compared to sorted female beta-cells (average RPKM 5148.25 +/- 1013.22). Similarly, there was a trend toward greater KCNK16 expression in male HFD treated mouse beta-cells (average RPKM 8020.75 +/- 1944.41) compared to female HFD treated mouse beta-cells (average RPKM 7551 +/- 2952.70). We have now added this to the text.

      Page 15-16 "Indeed, it has been well established that insulin signaling is required for neonatal survival; for example, a similar neonatal lethality phenotype was observed in mice without insulin receptors (Insr-/-) where death results from hyperglycemia and diabetic ketoacidosis by P3 (40)." Formally, the authors are not examining insulin signaling. A better comparison is that of the Ins1/Ins2 double knockout model of complete hypoinsulinemia.

      We thank the reviewer for suggesting this as the appropriate comparison model and have now revised the manuscript to detail the 48-hour average life expectancy of Ins1/Ins2 double knockout mice (PMID: 9144203).

      There are probably too many abbreviations in the paper, making it harder to read by nonspecialists. I recommend writing out GOF, GSIS, WT, K2P, etc.

      We thank the reviewer for this suggestion and have revised the manuscript to reduce the use of most abbreviations.

      Reviewer #2:

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened the manuscript and are summarized below.

      (1) The authors perform an RNA-sequencing showing that the cAMP amplifying pathway is upregulated. Is this also true in humans with this mutation? Other follow-up comments and questions from this observation:

      a) Will this mean that the treatment with incretins will improve glucose-stimulated insulin secretion and Ca2+ signalling and lower blood glucose? The authors should at least present data on glucose-stimulated insulin secretion and/or Ca2+ signalling in the presence of a compound increasing intracellular cAMP.

      b) Will an OGTT give different results than the IPGTT performed due to the fact that the cAMP pathway is upregulated?

      c) Is the increased glucagon area and glucagon secretion a compensatory mechanism that increases cAMP? What happens if glucagon receptors are blocked?

      We thank the reviewer for the suggestions. Although cAMP pathways were upregulated in the TALK-1 L114P islets, the changes in expression were only modest as examined by qRT-PCR. Thus, we are not sure if this plays a role in secretion. Only a small number of patients with this mutation have been identified, and no islets have been isolated from them. Therefore, we do not know whether the cAMP amplifying pathway is upregulated in humans with the MODY-associated TALK-1 L114P mutation. We have performed the suggested experiment assessing calcium from TALK-1 L114P islets in response to liraglutide (see Supplemental figure 10); there was no liraglutide response in TALK-1 L114P islets. We have also performed the OGTT experiments as suggested and these have now been added to the manuscript (see Supplemental figure 3). We do not believe that the increased glucagon is a compensatory response, because: 1. TALK-1 deficient islets have less glucagon secretion due to reduced SST secretion (see PMID: 29402588); 2. There is no change in insulin secretion at 7 mM glucose; however, glucagon secretion is significantly elevated from islets isolated from TALK-1 L114P mice; 3. TALK-1 is highly expressed in delta-cells, and in these cells TALK-1 L114P would be predicted to cause significant hyperpolarization and significant reductions in calcium entry as well as SST secretion. Thus, reduced SST secretion may be responsible for the elevation of glucagon secretion. We plan to investigate delta-cells within islets from TALK-1 L114P mice in future studies to determine if changes in SST secretion are responsible for the elevated glucagon secretion from TALK-1 L114P islets.

      (2) The performance of measurements in both male and female mice is praiseworthy. However, despite differences in the response, the authors do not investigate the potential reason for this. Are hormonal differences of importance?

      We thank the reviewer for this important point. It is indeed becoming clear that there are many differences between male and female islet function and responses to stress. Thus, we have revised the manuscript to include greater discussion about these differences, such as female β-cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by reviewer 1 (PMID: 36690328). While the differences in islet function and GTT between male and female L114P mice are clear, they both show diminished islet calcium handling, defective hormone secretion, and development of glucose intolerance. This manuscript was intended to demonstrate how the MODY-causing TALK-1 L114P mutation causes glucose dyshomeostasis, which we have determined in both male and female mice. The differences between male and female mice and islets with TALK-1 L114P could be due to multiple potential causes (as detailed in PMID: 36690328). Thus, we believe that comprehensive studies are required to thoroughly determine how the TALK-1 L114P mutation differently impacts male and female mice and islets, which we plan to complete in a future manuscript.

      (3) MINOR: Page 5 .." channels would be active at resting Vm PMID:11263999.." The actual reference has not been added using the reference system.

      We thank the reviewer for noticing this mistake, which has now been corrected.

      Reviewer #3:

      The manuscript is overall clearly presented and the experimental data largely support the conclusions. However, there are a number of issues that need to be addressed to improve the clarity of the paper.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened and improved the clarity of the manuscript.

      Specific comments:

      (1) Title: The terms "transient neonatal diabetes" and "glucose dyshomeostasis in adults" are used to describe the TALK-1 L114P mutant mice. Transient neonatal diabetes gives the impression that diabetes is resolved during the neonatal period. The authors should clarify the criteria used for transient neonatal diabetes, and the difference between glucose dyshomeostasis and MODY. Longitudinal plasma glucose and insulin data would be very informative and help readers to follow the authors' narrative.

      We appreciate the helpful comment and have added longitudinal plasma glucose from neonatal mice to address this (see Supplemental figure 2). The new data now show that the TALK-1 L114P mutant mice undergo transient hyperglycemia that resolves by P10 and then occurs again at week 15. Insulin secretion from P4 islets is also included and shows that male animals homozygous for the TALK-1 L114P mutation have the largest impairment in glucose-stimulated insulin secretion, followed by male heterozygous TALK-1 L114P P4 islets that also have impaired insulin secretion (see Figure 1). The amount of hyperglycemia correlates with the defects in neonatal islet insulin secretion.

      (2) Another concern for the title is the term "α-cell overactivity." This could be taken to mean that individual α-cells are more active and/or that there are more α-cells to secrete glucagon. The study does not provide direct evidence that individual α-cells are more active. This should be clarified.

      We appreciate the helpful comment and have revised the manuscript title accordingly.

      (3) In the Introduction, it is stated that because TALK-1 activity is voltage-dependent, the GOF mutation is less likely to cause neonatal diabetes, yet the study shows the L114P TALK-1 mutation actually causes neonatal diabetes by completely abolishing glucose-stimulated Ca2+ entry. This seems to imply TALK-1 activity (either in the plasma membrane or ER membrane) has more impact on Vm or cytosolic Ca2+ in neonates than initially predicted. Some discussion on this point is warranted.

      These are important points and we have added details to the discussion about this. For example, the discussion now states that, “This suggests a greater impact of TALK-1 L114P in neonatal islets compared to adult islets. Future studies during β-cell maturation are required to determine if TALK-1 activity is greater on the plasma membrane and/or ER membrane compared with adult β-cells.” The introduction has also been revised to clarify the voltage dependence of TALK-1.

      (4) What is the relative contribution of defects in plasma membrane depolarization versus ER Ca2+ handling on defective insulin secretion response?

      We thank the reviewer for bringing up this important point. TALK-1 L114P islets show blunted glucose-stimulated depolarization and glucose-stimulated calcium entry; however, the L114P islets show equivalent Ca2+ entry as control islets in response to high KCl (Figure 5G-H). As the KCl stimulated Ca2+ influx is similar between control and TALK-1 L114P islets, this indicates that plasma membrane TALK-1 L114P has a hyperpolarizing role that significantly blunts glucose-stimulated depolarization and reduces activation of voltage-dependent calcium channels. We have further tested this by looking at glucose-stimulated β-cell membrane potential depolarization in TALK-1 L114P islets, which is significantly blunted (Figure 4A and B; Supplemental figure 6). However, 33% of TALK-1 L114P β-cells showed glucose-stimulated electrical excitability (Supplemental figure 6), which likely accounts for the modest GSIS from TALK-1 L114P islets. New data have also been included showing that KCl stimulation causes a significant depolarization of β-cells from TALK-1 L114P islets (Supplemental figure 6). Because plasma membrane TALK-1 L114P is largely responsible for the hyperpolarized membrane potential and blunted glucose-stimulated Ca2+ entry, this suggests that TALK-1 L114P on the plasma membrane is primarily responsible for the altered insulin secretion. The discussion has been revised to reflect this.

      (5) The Jacobson group has previously shown that another K2P channel TASK-1 is also involved in ER Ca2+ homeostasis and that TASK inhibitors restored ER Ca2+ in TASK-1 expressing cells. Is TASK-1 expressed in β-cell ER membrane? Can the mishandling of Ca2+ caused by TALK-1 L114P be reversed by TASK-1 inhibitors?

      We thank the reviewer for bringing up this important point in relation to ER calcium handling by K2P channels. We have found that TASK-1 channels expressed in alpha-cells enhance ER calcium release and that inhibitors of TASK-1 channels elevate alpha-cell ER calcium storage. We did not observe any significant changes in the gene (Kcnk3) encoding TASK-1 between islets from control or TALK-1 L114P mice, which has now been added to the manuscript. However, because the TALK-1 L114P-mediated reduction of glucose-stimulated depolarization and inhibition of calcium entry are both prevented in the presence of high KCl (see Figure X), this strongly suggests that TALK-1 L114P K+ flux at the membrane is hyperpolarizing the membrane potential and limiting depolarization and calcium entry. This suggests that TALK-1 L114P control of ER calcium handling is not the primary contributor to the blunted glucose-stimulated calcium handling. Furthermore, acetylcholine stimulation of islets from both control and TALK-1 L114P mice elicited ER calcium release, which indicates that for the most part ER calcium release is still responsive to cues that control release, although the responses are altered. Taken together, this suggests that the TALK-1 L114P impact on ER calcium is not the primary mediator of blunted glucose-stimulated islet calcium entry and insulin secretion.

      (6) The electrical recording experiments were conducted using whole islets. The authors should comment on how the cells were identified as β-cells, especially in mutant islets in which there is an increased number of α-cells.

      The reviewer brings up an important point. As indicated, the original membrane potential recordings were conducted using whole islets. While the recorded cells could mostly be β-cells based on mouse islets typically containing >80% β-cells, there is a possibility that some of the cells included in these recordings were α-cells or δ-cells (especially because of the noted α-cell hyperplasia in TALK-1 L114P islets). Thus, we have now included data from β-cells that were identified with an adenoviral construct containing a rat insulin promoter driving a fluorescent reporter. This allowed the fluorescent β-cells to be monitored with electrophysiological membrane potential recordings. The new data (see Supplemental figure 6) show a significant reduction in glucose-stimulated depolarization in 67% of β-cells with the L114P mutation compared to controls.

      Minor:

      (1) Some references need formatting.

      The references have been revised accordingly.

      (2) Please define glucose-stimulated phase 0 Ca2+ response for non-expert readers.

      This has been defined accordingly.

      (3) Page 14 bottom: The sentence "Unlike the only other MODY-associated.........., TALK-1 is not inhibited by sulfonylureas" seems out of place and lacks context.

      We thank the reviewer for this suggestion and have deleted this sentence.

      (4) Figure 6: It would be helpful to provide a protein name for the genes shown in panel D.

      The protein names for the genes have now been included in the discussion of these genes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the thoughtful review of our manuscript by the reviewers, along with their valuable suggestions for enhancing our work. In response to these suggestions, we conducted additional experiments and made significant revisions to both the text and figures. In the following sections, we first highlight the major changes made to the manuscript, and thereafter address each reviewer's comments point-by-point. We hope these additional data and revisions have improved the robustness and clarity of the study and manuscript. Please note that as part of a suggested revision we have changed the manuscript title to be: Bacterial vampirism mediated through taxis to serum.

      Major revisions and new data:

      (1) We conducted additional experiments testing taxis to serum using a swine ex vivo enterohemorrhagic lesion model in which we competed wildtype versus chemotaxis deficient strains (Fig. 8). We selected swine for these experiments due to their similarity in gastrointestinal physiology to humans. In these experiments we see that chemotaxis, and the chemoreceptor Tsr, mediate localization to, and migration into, the lesion. We also tested, and confirmed, taxis to serum from swine and serum from horse, supporting that serum attraction is relevant in other host-pathogen systems.

      (2) We present additional experimental data and quantification of chemotaxis responses to human serum treated with serine-racemase (Fig. S3). This treatment reduces wildtype chemoattraction and the wildtype no longer possesses an advantage over the tsr strain, providing further evidence that L-serine is the specific chemoattractant responsible for Tsr-mediated attraction to serum.

      (3) We present additional data in the form of 17 videos of chemotaxis experiments with norepinephrine and DHMA showing null-responses under various conditions. These data provide additional support to the conclusion that these chemicals are not responsible for bacterial attraction to serum. We have included these raw data as a new supplementary file (Data S1) for those in the field that are interested in these chemicals.

      (4) Based on comments from Reviewer 2 regarding whether the position of the ligand and ligand-binding site residues in the previously-reported EcTsr LBD structure are incorrect, or whether these differences are due to the proteins being from different organisms, we performed paired crystallographic refinements to determine which positions result in model improvement (Fig. 7J). Altering the EcTsr structure to have the ligand and ligand-binding site positions from our new higher resolution and better-resolved structure of Salmonella Typhimurium Tsr results in a demonstrably better model, with both Rwork and Rfree lower by about 1% (Fig. 7J). These data support our conclusion that the correct positions for both structures are as we have modeled them in the S. Typhimurium Tsr structure. We also solved an additional crystal structure of SeTsr LBD captured at neutral pH (7-7.5) that confirms our structure captured with elevated pH (7.5-9.7) has no major changes in structure or ligand-binding interactions (Fig. S6, Table S2).

      (5) Based on comments from Reviewer 2 on the accuracy of the diffusion calculations, we present a new analysis (Fig. S2) comparing the experimentally-determined diffusion of A488 compared to its calculated diffusion. We found that:

      [line 111]: “As a test case of the accuracy of the microgradient modeling, we compared our calculated values for A488 diffusion to the normalized fluorescence intensity at time 120 s. We determined the concentration to be accurate within 5% over the distance range 70-270 µm (Fig. S2). At smaller distances (<70 µm) the measured concentration is approximately 10% lower than that predicted by the computation. This could be due to advection effects near the injection site that would tend to enhance the effective local diffusion rate.”

      (6) Both reviewers asked us to better justify why we focused on the chemoreceptor Tsr, and had questions about why we did not investigate Tar. The low concentration of Asp in serum suggests Tar could have some effect, but less so than Trg or Tsr (see Fig. 4A). We have revised the text throughout to better convey that we agree multiple chemoreceptors are involved in the response and clarify our rationale for studying the role of Tsr:

      [line 178]: “We modeled the local concentration profile of these effectors based on their typical concentrations in human serum (Fig. 4B). Of these, by far the two most prevalent chemoattractants in serum are glucose (5 mM) and L-serine (100-300 µM) (Fig. 4B-F). This suggested to us that the chemoreceptors Trg and/or Tsr could play important roles in serum attraction.”

      [line 186]: “Since tsr mutation diminishes serum attraction but does not eliminate it, we conclude that multiple chemoattractant signals and chemoreceptors mediate taxis to serum. To further understand the mechanism of this behavior we chose to focus on Tsr as a representative chemoreceptor involved in the response, presuming that serum taxis involves one, or more, of the chemoattractants recognized by Tsr that is present in serum: L-serine, NE, or DHMA.”

      [line 468] “Serum taxis occurs through the cooperative action of multiple bacterial chemoreceptors that perceive several chemoattractant stimuli within serum, one of these being the chemoreceptor Tsr through recognition of L-serine (Fig. 4).”
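
The accuracy check described in item (5) above, comparing a calculated diffusion profile with normalized fluorescence at 120 s, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the textbook solution for a continuous point source in an unbounded medium, which may not be the model the authors used, and the diffusion coefficient and function names are hypothetical placeholders rather than values from the manuscript.

```python
import math


def continuous_point_source(r_um, t_s, d_um2_s, q=1.0):
    """Concentration at distance r_um (µm) and time t_s (s) from a continuous
    point source of strength q in an unbounded 3D medium:
        C(r, t) = q / (4*pi*D*r) * erfc(r / (2*sqrt(D*t)))
    ASSUMPTION: this geometry is illustrative, not the authors' stated model.
    """
    if r_um <= 0 or t_s <= 0 or d_um2_s <= 0:
        raise ValueError("r, t and D must all be positive")
    spread = 2.0 * math.sqrt(d_um2_s * t_s)
    return q / (4.0 * math.pi * d_um2_s * r_um) * math.erfc(r_um / spread)


def normalized_profile(radii_um, t_s, d_um2_s):
    """Predicted concentrations scaled so the largest value equals 1,
    mirroring a comparison against normalized fluorescence intensity."""
    values = [continuous_point_source(r, t_s, d_um2_s) for r in radii_um]
    peak = max(values)
    return [v / peak for v in values]


def percent_deviation(measured, predicted):
    """Signed deviation of a measurement from the prediction, in percent."""
    return 100.0 * (measured - predicted) / predicted


if __name__ == "__main__":
    # HYPOTHETICAL diffusion coefficient (µm^2/s); replace with the real value.
    D_PLACEHOLDER = 400.0
    radii = [70, 120, 170, 220, 270]
    for r, c in zip(radii, normalized_profile(radii, 120.0, D_PLACEHOLDER)):
        print(f"r = {r} µm, predicted normalized concentration = {c:.3f}")
```

Calling `percent_deviation` on each measured and predicted pair gives the kind of per-distance error the quoted passage reports, with negative values meaning the measurement falls below the prediction.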

      Point-by-point responses to reviewer comments:

      Reviewer #1:

      (1) Presumably in the stomach, any escaping serum will be removed/diluted/washed away quite promptly? This effect is not captured by the CIRA assay but perhaps it might be worth commenting on how this might influence the response in vivo. Perhaps this could explain why, even though the chemotaxis appears rapid and robust, cases of sepsis are thankfully relatively rare.

      To clarify, the Enterobacteriaceae species we have tested here are colonizers of the intestines, not the stomach, and cases of bacteremia from these species are presumably due to bloodstream entry through intestinal lesions. Whether or not intestinal flow acts as a barrier to bloodstream entry is not something we test here, and so we have not commented on this idea in the manuscript. We do demonstrate that attraction to serum occurs within seconds-to-minutes of exposure. We expect that the major protective effects against sepsis are the host antibacterial factors in serum, which are well-described in other work. We have been careful to state throughout the text that we see attraction responses, and growth benefits, to serum that is diluted in an aqueous media, which is different than bacterial growth in 100% serum or in the bloodstream.

      (2) The authors refer to human serum as a chemoattractant numerous times throughout the study (including in the title). As the authors acknowledge, human serum is a complex mixture and different components of it may act as chemoattractants, chemo-repellents (particularly those with bactericidal activities) or may elicit other changes in motility (e.g. chemokinesis). The authors present convincing evidence that cells are attracted to serine within human serum - which is already a well-known bacterial chemoattractant. Indeed, their ability to elucidate specific elements of serum that influence bacterial motility is a real strength of the study. However, human serum itself is not a chemoattractant and this claim should be re-phrased - bacteria migrate towards human serum, driven at least in part by chemotaxis towards serine.

      Throughout the text we have changed these statements, including in the title, to either be ‘taxis to serum’ or ‘serum attraction.’ On the timescales we tested, our data support that chemotaxis, not chemokinesis or other forms of directed motility, is what drives rapid serum attraction, since a motile but non-chemotactic cheY mutant cannot localize to serum (Fig. 4). We present evidence of one of these chemotactic interactions (L-Ser).

      (3) Linked to the previous point, several bacterial species (including E. coli - one of the bacterial species investigated here) are capable of osmotaxis (moving up or down gradients in osmolality). Whilst chemotaxis to serine is important here, could movement up the osmotic gradient generated by serum injection play a more general role? It could be interesting to measure the osmolality of the injected serum and test whether other solutions with similar osmolality elicit a similar migratory response. Another important control here would be to treat human serum with serine racemase and observe how this impacts bacterial migration.

      As addressed above, we have added additional experiments of serum taxis treated with serine racemase showing competition between WT and cheY, and WT and tsr (Fig. S3). These data support a role for L-serine as a chemoattractant driving attraction to serum. The idea of osmotaxis is interesting, but outside the scope of this work since we focus on chemoattraction to L-serine as one of the mechanisms driving serum attraction, and have multiple lines of evidence to support that.

      (4) The migratory response of E. coli looks striking when quantified (Fig. 6C) but is really unclear from looking at Panel B - it would be more convincing if an explanation was offered for why these images look so much less striking than analogous images for other species (E.g. Fig. 6A).

      We agree that the E. coli taxis to serum response is less obvious. We have brightened those panels to hopefully make them clearer to interpret (more cells in field of view over time). Also, as stated in the y-axes of these plots, this quantification was performed by enumerating the number of cells in the field of view, and the Citrobacter and Escherichia responses are shown on separate y-axes (now Fig. 8C). As indicated, the experiments have different numbers of starting motile cells, which we presume accounts for the difference in attraction magnitude. When investigating diverse bacterial systems we found there to be differences in motility under the culturing and experimental conditions we employed, for multiple reasons, and so for these data we thought it best to report raw cell numbers rather than data normalized to the starting number of bacteria, as we do elsewhere. In the specific case of these E. coli responding to serum, please view Supplementary Movie S3, which clearly shows both the attraction response and that the bacteria grew in a longer, semi-filamentous form that seems to impair their swimming speed.

      (5) It is unclear why the fold-change in bacterial distribution shows an approximately Gaussian shape with a peak at a radial distance of between 50 -100 um from the source (see for example Fig. 2H). Initially, I thought that maybe this was due to the presence of the microcapillary needle at the source, but the CheY distribution looks completely flat (Fig. 3I). Is this an artifact of how the fold-change is being calculated? Certainly, it doesn't seem to support the authors' claim that cells increase in density to a point of saturation at the source. Furthermore, it also seems inappropriate to apply a linear fit to these non-linear distributions (as is done in Fig. 2H and in the many analogous figures throughout the manuscript).

      We have revised the text to address this point, and removed the comment about cells increasing in density to a point of saturation: [Line 138] “We noted that in some experiments the population peak is 50-75 µm from the source, possibly due to a compromise between achieving proximity to nutrients in the serum and avoidance of bactericidal serum elements, but this behavior was not consistent across all experiments. Overall, our data show S. enterica serovars that cause disease in humans are exquisitely sensitive to human serum, responding to femtoliter quantities as an attractant, and that distinct reorganization at the population level occurs within minutes of exposure (Fig. 3, Movie 2).”

      We can confirm that this is not an artifact of quantification. Please refer to the videos of these responses, which demonstrate this point (Movies 1-5).

      (6) The authors present several experiments where strains/ serovars competed against each other in these chemotaxis assays. As mentioned, these are a real strength of the study - however, their utility is not always clear. These experiments are useful for studying the effects of competition between bacteria with different abilities to climb gradients.

      However, to meaningfully interpret these effects, it is first necessary to understand how the different bacteria climb gradients in monoculture. As such, it would be instructive to provide monoculture data alongside these co-culture competition experiments.

      Thank you for this suggestion. We agree that the coculture experiments showing strains competing for the same source of effector give a different perspective than monoculture. These experiments allow us to confirm taxis deficiencies or advantages with greater sensitivity, and ensure that the bacteria in competition have experienced the same gradient. This type of competition experiment is often used in in vivo experimentation for the same advantages. We note that in the gut the bacteria are not in monoculture and chemotactic bacteria do have to compete against each other for access to nutrients. Repeating all of the experiments we present to show both the taxis responses in coculture and monoculture would be an extraordinary amount of work that we do not believe would meaningfully change the conclusions of this study.

      (7) Linked to the above point, it would be especially instructive to test a tsr mutant's response in monoculture. Comparing the bottom row of Fig. 3G to Fig. 3I suggests that when in co-culture with a cheY mutant, the tsr mutant shows a higher fold-change in radial distribution than the WT strain. Fig. 4G shows that a tsr mutant can chemotaxis towards aspartate at a similar, but reduced rate to WT. This could imply that (like the trg mutant), a tsr mutant has a more general motility defect (e.g. a speed defect), which could explain why it loses out when in competition with the WT in gradients of human serum, but actually seems to migrate strongly to human serum when in co-culture with a cheY mutant. This should be resolved by studying the response of a tsr mutant in monoculture.

      Addressed above.

      (8) In Fig. 4, the response of the three clinical serovars to serine gradients appears stronger than the lab serovar, whilst in Fig. 1, the response to human serum gradients shows the opposite trend with the lab serovar apparently showing the strongest response. Can the authors offer a possible explanation for these slightly confusing trends?

      We suspect this relates to the fact that pure L-serine is a chemoattractant, whereas treatment with serum exposes the bacteria both to chemoattractants and, likely, chemorepellents. Strains may navigate the landscape of these stimuli differently for a variety of reasons that are not simple to tease apart. The final magnitude of change in bacterial localization depends on multiple factors including swimming speed, adaptation, sensitivity of chemoattraction, and cooperative signaling of the chemoreceptor nanoarray. Thus, we cannot state with certainty how and why these strains are different across all experiments, but we can state that they are attracted to both serum and L-serine.

      (9) In Fig. S2, it seems important to present quantification of the effect of serine racemase and the reported lack of response to NE and DHMA - the single time-point images shown here are not easy to interpret.

      As suggested, we present quantification of the serine racemase treated samples (now Fig. S3). To assist in the interpretation of these max projections, Fig. S3 now notes the chemotactic response (chemoattraction for L-serine, null-response for NE/DHMA). Further, we revised the text to state: [line 209]: “We observed robust chemoattraction responses to L-serine, evident by the accumulation of cells toward the treatment source (Fig. S3E, Movie 4), but no response to NE or DHMA, with the cells remaining randomly distributed even after 5 minutes of exposure (Fig. S3F-I, Movie 5, Movie S1).”

      (10) Importantly, the authors detail how they controlled for the effects of pH and fluid flow (Line 133-136). Did the authors carry out similar controls for the dual-species experiments where fluorescent imaging could have significantly heated the fluid droplet driving stronger flow forces?

      Most of our microfluidics experiments were performed in a temperature-controlled chamber (see Methods). Since the strains in the coculture experiments experienced the same experimental conditions, we have no evidence that fluorescence-imaging-induced temperature changes affected whether or not the bacteria are attracted to serum or the effectors we investigated.

      (11) The inference of the authors' genetic analysis combined with the migratory response of E. coli and C. koseri to human serum shown in Fig. 6 is that Tsr drives movement towards human serum across a range of Enterobacteriaceae species. The evidence for the importance of Tsr here is currently correlative - more causal evidence could be presented by either studying the response of tsr mutants in these two species (certainly these should be readily available for E. coli) or by studying the response of these two species to serine gradients.

      We have revised the text to state: [line 402] “Without further genetic analyses in these strain backgrounds, the evidence for Tsr mediating serum taxis for these bacteria remains circumstantial. Nevertheless, taxis to serum appears to be a behavior shared by diverse Enterobacteriaceae species and perhaps also Gammaproteobacteria priority pathogen genera that possess Tsr such as Serratia, Providencia, Morganella, and Proteus (Fig. 8B).”

      We note that other work has thoroughly investigated E. coli serine taxis.

      Figure Suggestions

      (1) Fig. 2 - The inset bar charts in panels H-J and the font size in their axes labels are too small - this suggestion also applies to all analogous figures throughout the manuscript.

      We have increased the size of the text for these inset plots. We have also broken up some of the larger figures.

      (2) Panel 2F - the cartoon bacterial cell and 'number of bacteria' are confusing and seem to contradict the y-axis label. This also applies to several other figures throughout the manuscript where the significance of this cartoon cell is quite hard to interpret.

      As suggested, we have removed this cartoon.

      (3) Panels G-I in Fig. 3 are currently tricky to interpret - it would be easier if the authors were to use three different colours for the three different strains shown across these panels.

      We have broken up Figure 2 (which also had these types of plots) so that hopefully these labels are more clear. For the Figure in question (now Fig. 4), due to the many figures and different types of data and comparisons it was difficult to find a color scheme for these strains that would be consistent across the manuscript. These colors also reflect the fluorescence markers. We note that not only do we use color to indicate the strain but also text labels.

      (4) Panels 3B-F would be best moved to a supplementary figure as this figure is currently very busy. Similarly, I would potentially consider presenting only the bottom row of panels in Panels G-I in the main figure (which would then be consistent with analogous data presented elsewhere).

      We have opted to keep these panels in the main text (now Fig. 4) as they are relevant to understanding (1) our justification for why to pursue certain chemoeffector-chemoreceptor interactions and not others, and (2) how the chemoattraction response can be understood both in terms of bacterial population distribution and relevant cells over time.

      (5) Fig. 4 and possibly elsewhere - perhaps best not to use Ser as an abbreviation for Serine here because it could potentially be confused with an abbreviation for serum.

      It is unfortunate that these two words are so similar. However, Ser is the canonical abbreviation for the amino acid serine. Serum does not have a canonical abbreviation.

      (6) Fig. 4 - I would move panels H - K to a separate supplementary figure - currently, they are too squished together and it is hard to make out the x-axis labels. I would also consider moving panels E-G to supplementary as well so that the microscopy images presented elsewhere in the figure can be presented at an appropriate size.

      Since we are allowed more figures, we could also break some of these figures up into multiple ones.

      (7) Similarly, I would move some panels from Fig. 5 to supplementary as the figure is currently quite busy.

      We have rearranged the figure (now Fig. 7) to move the bioinformatics data to Fig. 8 to allow more space for the panels.

      Other suggestions

      (8) Line 179 - how do the concentrations quoted for serine and glucose compare to aspartate? This would be helpful to justify the authors' decision not to investigate Tar as a potential chemoreceptor.

      This is addressed in our comments above and in Fig. 4A and Fig. 4B-F. The concentration of L-Asp in human serum is much lower (about 20-fold).

      (9) Line 282 - Serine levels in serum are quantified at 241 uM, but this is only discussed in the context of serum growth effects. Could this information be better used to design/ inform the serine gradients that were tested in chemotaxis assays?

      We tested a wide range of serine concentrations and show that serine sources at much lower concentrations than are present in serum are sufficient for chemoattraction. Also, the K1/2 for serine is 105 uM (Fig. S4), which is surpassed by the concentration in serum (Fig. S5).
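
      As a rough illustration of this comparison, the sketch below evaluates a simple saturating dose-response curve at the two concentrations quoted above (K1/2 of 105 uM, serum serine at 241 uM). The functional form (a Hill curve with coefficient 1) and the function name `fractional_response` are assumptions made for illustration only; the response reports the K1/2 and the serum concentration, not the shape of the curve.

```python
def fractional_response(conc_um: float, k_half_um: float, hill: float = 1.0) -> float:
    """Fraction of the maximal response at a given concentration.

    Assumes a Hill-type curve; hill=1.0 gives simple hyperbolic saturation.
    """
    if conc_um < 0 or k_half_um <= 0:
        raise ValueError("concentration must be >= 0 and K1/2 must be > 0")
    scaled = (conc_um / k_half_um) ** hill
    return scaled / (1.0 + scaled)


K_HALF_UM = 105.0        # K1/2 for serine quoted in the response
SERUM_SERINE_UM = 241.0  # serum serine concentration quoted by the reviewer

at_k_half = fractional_response(K_HALF_UM, K_HALF_UM)       # half-maximal by definition
at_serum = fractional_response(SERUM_SERINE_UM, K_HALF_UM)  # above half-maximal

print(f"response at K1/2:  {at_k_half:.2f}")
print(f"response at serum: {at_serum:.2f}")
```

      Under this assumed curve, any concentration above the K1/2 gives more than a half-maximal response, which is the sense in which the serum concentration "surpasses" the K1/2.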

      (10) The word 'potent' in the title might be too vague, especially as the strength of the response varies between strains/species. It may perhaps be more useful to focus on the rapidity/sensitivity of the response. However, presumably the sensitivity of the response will be driven by the sensitivity of the response to serine (which is already known for E. coli at least). Also, as noted in the public review, human serum itself is not a chemoattractant so I would consider re-phrasing this in the title and elsewhere.

      As suggested, and discussed above, we have implemented this change.

      (11) Typo line 59 'context of colonizing of a healthy gut'.

      Addressed.

      (12) Typo line 538 - there is an extra full stop here.

      Addressed.

      Reviewer #2:

      (1) This study is well executed and the experiments are clearly presented. These novel chemotaxis assays provide advantages in terms of temporal resolution and the ability to detect responses from small concentrations. That said, it is perhaps not surprising these bacteria respond to serum as it is known to contain high levels of known chemoattractants, serine certainly, but also aspartate. In fact, the bacteria are shown to respond to aspartate and the tsr mutant is still chemotactic. The authors do not adequately support their decision to focus exclusively on the Tsr receptor. Tsr is one of the chemoreceptors responsible for observed attraction to serum, but perhaps, not the receptor. Furthermore, the verification of chemotaxis to serum is a useful finding, but the work does not establish the physiological relevance of the behavior or associate it with any type of disease progression. I would expect that a majority of chemotactic bacteria would be attracted to it under some conditions. Hence the impact of this finding on the chemotaxis or medical fields is uncertain.

      We agree that the data we show are mostly mechanistic and further work is required to learn whether this bacterial behavior is relevant in vivo and during infections. We present new data using an ex vivo intestinal model which supports the feasibility of serum taxis mediating invasion of enterohemorrhagic lesions (Fig. 8).

      (2) The authors also state that "Our inability to substantiate a structure-function relationship for NE/DHMA signaling indicates these neurotransmitters are not ligands of Tsr." Both norepinephrine (NE) and DHMA have been shown previously by other groups to be strong chemoattractants for E. coli (Ec), and this behavior was mediated by Tsr (e.g. single residue changes in the Tsr binding pocket block the response). Given the 82% sequence identity between the Se and Ec Tsr, this finding is unexpected (and potentially quite interesting). To validate this contradictory result the authors should test E. coli chemotaxis to DHMA in their assay. It may be possible that Ec responds to NE and DHMA and Se doesn't. However, currently, the data is not strong enough to rule out Tsr as a receptor to these ligands in all cases. At the very least the supporting data for Tsr being a receptor for NE/DHMA needs to be discussed.

      Addressed above. The focus of this study is serum attraction and the mechanisms thereof. We never saw any evidence that NE/DHMA drive attraction to serum, or that they are chemoeffectors for Salmonella, and we provide these null results in Data S2.

      (3) The authors also determine a crystal structure of the Se Tsr periplasmic ligand binding domain bound to L-Ser and note that the orientation of the ligand is different than that modeled in a previously determined structure of lower resolution. I agree that the SeTsr ligand binding mode in the new structure is well-defined and unambiguous, but I think it is too strong to imply that the pose of the ligand in the previous structure is wrong. The two conformations are in fact quite similar to one another and the resolution of the older structure, is, in my view, insufficient to distinguish them. It is possible that there are real differences between the two structures. The domains do have different sequences and, moreover, the crystal forms and cryo-cooling conditions are different in each case. It's become increasingly apparent that temperature, as manifested in differential cooling conditions here, can affect ligand binding modes. It's also notable that full-length MCPs show negative cooperativity in binding ligands, which is typically lost in the isolated periplasmic domains. Hence ligand binding is sensitive to the environment of a given domain. In short, the current data is not convincing enough to say that a previous "misconception" is being corrected.

      Thank you for this comment, which spurred us to investigate this idea more rigorously. As described above we performed new refinements of the E. coli structure edited to have the positions of the ligand and ligand-binding site as modeled in our new Tsr structure from Salmonella (Fig. 7J). The best model is obtained with these poses. Along with the poor fit of the E. coli model to the density, the best interpretations for these positions, for both structures, are as we have modeled them in the Salmonella Tsr structures.

      Figure suggestions

      (1) Figure 2 looks busy and unorganized. Fig 2C could be condensed into one image where there are different colored rings coming from the source point that represent different time points.

      Addressed above. Fig. 2 has been broken apart to help improve clarity.

      (2) What is the second (bottom) graph of 2D? I think only the top graph is necessary.

      We have added an explanation to the figure legend that the top graph shows the means and the bottom shows SEM. The plots cannot easily be overlaid.

      (3) Similarly, Fig 2E doesn't need to have so many time points. Perhaps 4 at maximum.

      As the development of the response over time is a key take-home of the study, we do not wish to reduce the timepoints shown.

      (4) The legend for Figure 2F uses the unit 'µM' to mean micrometers but should use 'µm'.

      Corrected.

      (5) In Figures 2H-J, the lime green text is difficult to read. The word "serum" does not need to be at the top of each panel. I recommend shortening the y-axis titles on the graphs so you can make the graphs themselves larger.

      Addressed above.

      (6) In Figures 2H-J, I am confused about what is being shown in the inset graph. The legend says it's the AUC for the data shown. However, in the third panel (S. Typhimurium vs. S. Enteritidis) the data appears to be much more disparate than the inset indicates. I don't think that this inset is necessary either.

      The point of this inset graph is to quantify the response through integration of the curve, i.e., area under the curve (AUC), which is a common way to quantify complex curves and compare responses as single values. We are using this method to calculate the statistical significance of the response compared to a null response. We have added further clarification to the figure legend regarding these plots: Inset plots show fold-change AUC of strains in the same experiment relative to an expected baseline of 1 (no change). p-values shown are calculated with an unpaired two-sided t-test comparing the means of the two strains, or one-sided t-test to assess statistical significance in terms of change from 1-fold (stars).
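
      The kind of quantification described here can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the normalization (dividing the AUC by the x-range so that a flat curve at 1 gives exactly 1), the function names, and all numbers are assumptions, and the replicate curves are made-up values used only to exercise the functions.

```python
from math import sqrt
from statistics import mean, stdev


def trapezoid_auc(x, y):
    """Area under y(x) computed with the trapezoidal rule."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def normalized_auc(x, y):
    """AUC divided by the x-range, so a flat curve at 1 (no change) gives 1."""
    return trapezoid_auc(x, y) / (x[-1] - x[0])


def one_sample_t(values, baseline=1.0):
    """t statistic for the mean of `values` against a fixed baseline."""
    n = len(values)
    return (mean(values) - baseline) / (stdev(values) / sqrt(n))


# Hypothetical fold-change curves for three replicates (illustrative numbers only).
time_s = [0, 100, 200, 300]
replicates = [
    [1.0, 1.5, 2.0, 2.5],
    [1.0, 1.4, 2.1, 2.6],
    [1.0, 1.6, 1.9, 2.4],
]

aucs = [normalized_auc(time_s, curve) for curve in replicates]
null_auc = normalized_auc(time_s, [1.0, 1.0, 1.0, 1.0])  # a "no change" curve
t_stat = one_sample_t(aucs, baseline=1.0)

print("normalized AUCs:", [round(a, 3) for a in aucs])
print("null-response AUC:", null_auc)
print("t statistic vs. baseline of 1:", round(t_stat, 1))
```

      Converting the t statistic into a p-value requires the t distribution; with SciPy available, `scipy.stats.ttest_1samp` (with its `alternative` argument for a one-sided test) would be one way to obtain it.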

      (7) Line 154, change "relevant for" to "observed in".

      Changed.

      (8) Line 171, according to the Mist4 database, Salmonella enterica has seven chemoreceptors. Why are only Tar, Tsr, and Trg mentioned? Why were only Tsr and Trg tested?

      Addressed above.

      (9) Line 192, be clear that you are referring to genes and not proteins, as italics are used.

      Revised to make this distinction clear.

      (10) Line 193, have other studies found a Trg deletion strain to be non-chemotactic? If so, cite this source here.

      We state that the Trg deletion strain had deficiencies in motility, and also have revised the text to include the clarification that this was not noted in earlier work with this strain: [line 173]: We were surprised to find that the trg strain had deficiencies in swimming motility (data not shown). This was not noted in earlier work but could explain the severe infection disadvantage of this mutant 34. Because motility is a prerequisite for chemotaxis, we chose not to study the trg mutant further, and instead focused our investigations on Tsr.

      (11) Why wasn't a Tar deletion mutant also analyzed? The authors say that based on the known composition of serum, serine and glucose are the most abundant. However, the serum does have aspartate at 10s of micromolar concentrations.

      Addressed above.

      (12) “The Tsr deletion strain still exhibits an obvious chemoattraction to serum. There are other protein(s) involved in chemoattraction to serum but the text does not discuss this.”

      Addressed above.

      (13) “In Figure 3B-F, the text is very difficult to read even when zoomed in on.”

      We have increased the font size of these panels.

      (14) “All of the text in Figure 5 is extremely small and difficult to read.”

      Addressed above. We split this figure in two to help improve clarity.

      (15) “I wonder about the accuracy of the concentration modeling. It seems like there are a lot of variables that could affect the diffusion rates, including the accuracy of the delivery system. Could the concentrations be verified by the dye experiments?”

      Addressed above. We provide a new analysis comparing the experimental diffusion of A488 dye with calculations (Fig. S2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) It is nice that the authors compared their model to the one "without lookahead" in Figure 4, but this comparison requires more evidence in my opinion, as I explain in this comment. The model without lookahead is closely related or possibly equivalent to the standard predictive coding. In predictive coding, one can make the network follow the stimulus rapidly by reducing the time constant tau. However, as the time constant decreases, the network would become unstable both in simulations (due to limited integration time step) and physical implementation (due to noise). Therefore I wonder if the proposed model has an advantage over standard predictive coding with an optimized time constant. Hence I suggest to also add a comparison between the proposed model, and the predictive coding with parameters (such as tau) optimized independently for each model. Of course, we know that the time-constant of biological neurons is fixed, but biological neurons might have had different time constants (by changing leak conductance) and such analysis could shed light on the question of why the neurons are organized the way they are.

      The comparison with a predictive network for which the neuronal time constants shrink towards 0 is in fact helpful. We added two new subsections in the SI that formally compare the NLA with other approaches, Equilibrium Propagation and the Latent Equilibrium, with a version of Equilibrium Propagation also covering the standard predictive coding you describe (SI, Sect. C and D). Subsection C concludes: “In the Equilibrium propagation we cannot simply take the limit τ → 0 since then the dynamics either disappears (when τ remains on the left, τ Δu → 0) or explodes (when τ is moved to the right, dt/τ → ∞), leading to either too small or too big jumps.”

      We have also expanded the passage on the predictive coding in the main text, comparing our instantaneous network processing (up to a remaining time constant τ_in) with experimental data from humans (see page 10 of the revised ms). The new paragraph ends with:

      “Notice that, from a technical perspective, making the time constants of individual cortical neurons arbitrarily short leads to network instabilities and is unlikely the option chosen by the brain (see SI Sect. C, Comparison to the Equilibrium Propagation).”
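
      The numerical side of this instability can be illustrated with a single leaky unit. The sketch below is not the authors' network or their Equilibrium Propagation analysis; it only shows, under the assumption of forward-Euler integration with a fixed step `dt`, that shrinking the time constant `tau` makes the update factor `dt / tau` grow, and that the integration oscillates and diverges once `dt / tau` exceeds 2. The function name and parameter values are hypothetical.

```python
def simulate_leaky_unit(tau, dt, target=1.0, steps=50):
    """Forward-Euler integration of tau * du/dt = -u + target, starting at u = 0."""
    u = 0.0
    trace = [u]
    for _ in range(steps):
        # Each step moves u towards the target by a fraction dt / tau.
        u += (dt / tau) * (target - u)
        trace.append(u)
    return trace


DT = 1.0  # fixed integration step (arbitrary units)

slow = simulate_leaky_unit(tau=10.0, dt=DT)     # dt/tau = 0.1: smooth, slow approach
fast = simulate_leaky_unit(tau=1.0, dt=DT)      # dt/tau = 1: jumps onto the target in one step
unstable = simulate_leaky_unit(tau=0.4, dt=DT)  # dt/tau = 2.5 > 2: overshoots more each step

for name, trace in [("slow", slow), ("fast", fast), ("unstable", unstable)]:
    print(f"{name:9s} final value: {trace[-1]:.3e}")
```

      The error after each step is multiplied by (1 - dt/tau), so its magnitude shrinks only while dt/tau stays below 2; a smaller time constant therefore demands a proportionally smaller integration step.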

      A new formal definition of the moving equilibrium in the Methods (Sect. F) helps to understand this notion of being in a balanced equilibrium state during the dynamics. This formal definition directly leads to the contraction analysis in the SI, Sect. D, showing why the Latent Equilibrium is always contractive, while the current form of the NLA may show jumps at the corner of a ReLU (since a second-order derivative of the transfer function enters in the error propagation).

      The reviewer perhaps has additional simulations in mind that compare the robustness of the different models. However, as this paper is more about presenting a novel concept with a comprehensive theory (summing up to 45 pages), we prefer to not add more than the simulations necessary to check the statements of the theorems.

      (2) I found this paper difficult to follow, because the Results sections went straight into details, and various elements of the model were introduced without explaining why they are necessary. Furthermore, the neural implementation was introduced after the model simulations. I suggest reorganizing the manuscript, to describe the model following Marr's levels of description and then presenting the results of simulations. In particular, I suggest starting the Results section by explaining what computation the network is trying to achieve (describe the setup, function L, define its integral over time, and explain that the goal is to find a model minimizing this integral). Then, I suggest presenting the algorithm the neurons need to employ to minimize this integral, i.e. their dynamics and plasticity (I wonder if r=rho(u) + tau rho(u)' is a consequence of action minimization or a necessary assumption - please clarify it). Next please explain how the algorithms could be implemented in biological neurons. Afterward please present the results of the simulation.

      We are sorry to realize that we could not convey the main message clearly enough. After rewriting the paper and straightening the narrative, we hope it is simpler to understand now.

      The paper does not suggest a new model to solve a task, and writing down the function to be minimized is not enough. The point of the NLA is that the time integral of our Lagrangian is minimized with respect to the prospective coordinates, i.e. the discounted future voltage. It is about the question of how dynamic equations in biology are derived. Of course, we also solve these equations, prove theorems and perform simulations. But the main point is that biology seems to deal with time differently than physics does. Biology “thinks” in terms of future quantities, physics “thinks” in terms of current quantities. We tried to explain this better now in the Introduction, the Results (e.g. after Eq. 5) and the Methods.
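
      One way to build intuition for the lookahead rate mentioned in the reviewer's comment, r = rho(u) + tau rho(u)', is to check numerically that it cancels the delay of a low-pass filter with the same time constant. The sketch below is an illustration under stated assumptions, not the authors' model: rho(u(t)) is replaced by a sinusoidal stand-in, the filter is a single first-order low-pass integrated with forward Euler, and all names and parameter values are hypothetical.

```python
import math

TAU = 10.0      # filter time constant (arbitrary units; illustrative value)
DT = 0.01       # Euler step, small relative to TAU
OMEGA = 0.5     # angular frequency of the test signal
STEPS = 10_000  # simulate 100 time units


def rho(t):
    """Stand-in for the rate rho(u(t)); a sinusoid chosen for illustration."""
    return math.sin(OMEGA * t)


def rho_dot(t):
    """Time derivative of the stand-in rate."""
    return OMEGA * math.cos(OMEGA * t)


def prospective(t):
    """Lookahead rate r = rho + tau * d(rho)/dt."""
    return rho(t) + TAU * rho_dot(t)


def low_pass(signal):
    """Euler-integrate tau * dy/dt = -y + signal(t) from y(0) = 0; return the trace."""
    y, trace = 0.0, [0.0]
    for k in range(STEPS):
        y += (DT / TAU) * (signal(k * DT) - y)
        trace.append(y)
    return trace


def max_tracking_error(trace, skip=STEPS // 2):
    """Largest |filtered - rho| after the initial transient has decayed."""
    return max(abs(trace[k] - rho(k * DT)) for k in range(skip, STEPS + 1))


error_without_lookahead = max_tracking_error(low_pass(rho))
error_with_lookahead = max_tracking_error(low_pass(prospective))

print(f"max error, filtering rho directly:     {error_without_lookahead:.3f}")
print(f"max error, filtering the lookahead r:  {error_with_lookahead:.4f}")
```

      Filtering rho directly yields an attenuated, lagging copy of the signal, whereas filtering the lookahead rate returns rho itself up to the discretization error of the Euler scheme.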

      (3) Understanding the paper requires background knowledge that most readers of eLife are unlikely to have, even if they are mathematically minded. For example, I am from the field of computational neuroscience, and I have never heard about Least Action principle from physics or the Euler-Lagrange equation. I felt lost after reading this paper, and to be able to write this review I needed to watch videos on the Euler-Lagrange equation. To help other readers, I have two suggestions: First, I feel that Eq 4-6 could be moved to the methods, because I found the concept of u~ difficult to understand, and it does not appear in the algorithm. Second, I advise to write in the Introduction, what knowledge is required to follow this paper, and point the readers to resources where they can find the required information. The authors may specify what background is required to follow the main text, and what is required to understand the methods.

      We hope that after explaining the rationale better, it becomes clear that we cannot skip the equations for the prospective coordinates. Likewise, the Euler-Lagrange equations need to be presented in the abstract form, since these are the equations that are eventually transformed into the “model”. We tried to give the basic intuition for this in the main text. As we explained above, the equations we were asked to skip represent the essence of the proposal. It is about how to derive the model equations.

      Moreover, we give more explanations in the Methods to understand the derivations, and we refer to the specific sections in the SI for further details. We are aware that a full understanding of the theory requires some basic knowledge of the calculus of variations.

      We are hesitant to write in the Introduction what type of knowledge is required to understand the paper. An understanding can be on various levels. Moreover, the materials that are considered to be helpful depend on the background. For some it is YouTube, for some Wikipedia, and for others a textbook from which specific ingredients can be extracted. But we do cite two textbooks in the Results and more in the SI, Sect. F, when referring to the principle of least action in physics and the mathematics, including weblinks.

      Minor comments

      Eq.3: The Authors refer to this equation as a Lagrangian. Could you please clarify why? Is the logic to minimize the energy subject to a constraint that Cost = 0?

      Thanks for asking. The cost is not really a constraint, it is globally minimized, in parallel steps. We are explaining this right after Eq. 3. “We `prospectively' minimize L locally across a voltage trajectory, so that, as a consequence, the local synaptic plasticity for W will globally reduce the cost along the trajectory (Theorem 1 below).”

      We added two sentences that explain why this function in Eq. 3 is called a Lagrangian: “While in classical energy-based approaches L is called the total energy, we call it the `Lagrangian' because it will be integrated along real and virtual voltage trajectories as done in variational calculus (leading to the Euler-Lagrange equations, see below and SI, Sect. F)”

      p.4, below Eq. 5 - Please explain the rationale behind NLA, i.e. why is it beneficial that "the trajectory u˜(t) keeps the action A stationary with respect to small variations δu˜"? I guess you wish to minimize L integrated over time, but this is not evident from the text.

      Hmm, yes and no. We wish to minimize the cost, and on the way there minimize the action. Since the global minimization of C is technically difficult, one looks for a stationary trajectory as defined in the cited sentence, while minimizing L with respect to W, to eventually minimize the cost.

      In the text we now explain after Eq. 5:

      “The motivation to search for a trajectory that keeps the action stationary is borrowed from physics. The motivation to search for a stationary trajectory by varying the near-future voltages ũ instead of u is assigned to the evolutionary pressure in biology to 'think ahead of time'. To not react too late, internal delays involved in the integration of external feedback need to be considered and eventually need to be overcome. In fact, only for the 'prospective coordinates' defined by looking ahead into the future, even when only virtually, will a real-time learning from feedback errors become possible (as expressed by our Theorems below).”

      Bottom of page 8. The authors say that in the case of single equilibrium and strong nudging the model reduced to the Least Control Principle. Does it also reduce to Predictive coding for supervised learning? If so, it would be helpful to state so.

      Yes, in this case the prediction error in the apical dendrite becomes the one of predictive coding. We are stating this now right at the end of the cited sentence:

      “In the case of strong nudging and a single steady-state equilibrium, the NLA principle reduces to the Least-Control Principle (Meulemans et al., 2022) that minimizes the mismatch energy E^M for a constant input and a constant target, with the apical prediction error becoming the prediction error from standard predictive coding (Rao & Ballard, 1999).”

      In the Discussion we also added a further point (iv) to compare the NLA principle with predictive coding. Both “improve” the sensory representation, but the NLA does so in favor of an output, and predictive coding in favor of the sensory prediction itself (see Discussion).

      Whenever you refer to supplementary materials, please specify the section, so it is easier for the reader to find it.

      Done. Sorry to not have done it earlier. We now also indicate specific sections when referring to the Methods.

      Reviewer #2 (Recommendations For The Authors):

      There are no major issues with this article, but I have several considerations that I think would greatly improve the impact, clarity, and validity of the claims.

      (1) Unifying the narrative. There are many many ideas put forward in what feels like a deluge. While I appreciate the enthusiasm, as a reader I found it hard to understand what it was that the authors thought was the main breakthrough. For instance, the abstract, results, introduction, and discussion all seem to provide different answers to that question. The abstract seems to focus on the motor error idea. The introduction seems to focus on the novel prospective+predictive setup of the energy function. The discussion lists the different perks of the theory (delay compensation, moving equilibrium, microcircuit) without referring to the prospective+predictive setup of the energy function.

      Thanks very much for these helpful hints. Yes, the paper became an agglomerate of many ideas, also owing to the fact that we wish to show how the NLA principle can be applied to explain various phenomenology in neuroscience. We now simplified the narrative to this one point of providing a novel theoretical framework for neuroscience, and explaining why this is novel and why it “suddenly works” (the prospective minimization of the energy).

      As you can see from the dominating red in the revised pdf, we did fully rewrite Abstract, Introduction and Discussion under the narrative of the NLA and prospective coding.

      (2) Laying out the organization of the notation clearly. There are quite a few subtle distinctions of what is meant by the different weight matrices (omnibus matrix then input vs recurrent then layered architecture), different temporal horizon formalisms (bar, not bar, tilde), different operators (L, curly L, derivative version, integral version). These different levels are introduced on the fly, which makes it harder to grasp. The fact that there are many duplicate notations for the same quantities does not help the reader. For instance u_0 becomes equal to u_N at one point (above Eq 25). Another example is the constant flipping between integrated and 'current input' pictures. So laying out the multiple layers early, making a table or a figure for the notation, or sticking with one level would help convey the idea to a wide readership.

      Thanks for the hints. We included the table you suggested, but put it in the SI as it became a full page itself. We banned the curly L abbreviating the look-ahead operator.

      The “change of notation” you are alluding to is tricky, though. In a recurrent layer, the index of the output neuron is called o. In a forward network with N layers, the index of the output neurons becomes the last layer N. One has to introduce the layer index l anyway for the deeper layers l < N, and we found it more consistent to explain that, while switching from the recurrent to the forward network, the voltage of the output layer becomes now u_o = u_N. There are more of these examples, like the weight matrix W splitting into an intrinsic network part W_net across which errors backpropagate, and a part conveying the input, W_in, that has to be excluded when writing the backpropagation formula for general networks. Again, in the case of the feedforward networks, the notation reduces to W_l, with index l coding for the layer. Presenting the general approach and a specific example may appear as if we duplicate notations. We haven’t found a solution here.

      (3) Separate the algorithm from the implementation level. I particularly struggled with separating the ideas that belonged to the algorithm level (cost function, optimization objectives) and the biophysics. The two are interwoven in a way that does not have to be. Particularly, some of the normative elements may be implemented by other types of biophysics than the authors have in mind. It is for this reason that I think that separating more clearly what belongs to the implementation and algorithm levels would help make the ideas more widely understood. On this point, a trigger point for me was the definition of the 'prospective input rates' e_i, which comes in the second paragraph.

      We are very sorry to have made you think that the 'prospective input rates' would be e_i. The prospective input rates are r_i. The misunderstanding likely arose from an unclear formulation on our side that is now corrected (see first and second paragraph of the Results where we introduce r_i and e_i).

      From a biophysical perspective, it is quite arbitrary to define the input to be the difference between the basal input and the somatic (prospective) potential. It sounds like it comes from some unclear normative picture at this point. But the authors seem to have in mind to use the fact that the somatic potential is the sum of apical and basal input, that's the biophysical picture.

      We hope to have disentangled the normative and biophysical view in the 2nd and 3rd paragraph of the Results, respectively. We introduce the prospective error e_i as an abstract notion in the first paragraph, while explaining that it will be interpreted as somato-dendritic mismatch error in neuron i in the next paragraph. The second paragraph contains the biophysical details with the apical and basal morphology.

      (4) Experts and non-experts would appreciate an explanation of why/how the choice of state variables matters in the NLA. The prospective coding state variables cannot be said to be the naïve guess. Why does the simple u, dot{u} not work as state variables applied on the same energy function, as would be a naïve application of the Lagrangian ideas?

      We are very glad for this hint to present an intuition behind the variation of the action with respect to a prospective state, instead of the state itself. The simple L(u, dot{u}) does not work because one does not obtain the first-order voltage dynamics compatible with the biophysics. We made an effort to explain the intuition to non-experts and experts in an additional paragraph right after presenting the voltage and error dynamics (Eq. 7 on page 4).

      Here is how the paragraph starts (not displaying the formulas here):

      “From the point of view of theoretical physics, where the laws of motion derived from the least-action principle contain an acceleration term (as in Newton's law of motion, like … for a harmonic oscillator), one may wonder why no second-order time derivative appears in the NLA dynamics. As an intuitive example, consider driving into a bend. Looking ahead in time helps us to reduce the lateral acceleration by braking early enough, as opposed to braking only when the lateral acceleration is already present. This intuition is captured by minimizing the neuronal action A with respect to the discounted future voltages ũi instead of the instantaneous voltages ui.

      Keeping up an internal equilibrium in the presence of a changing environment requires to look ahead and compensate early for the predicted perturbations.

      Technically, …”

      More details are given in the Methods after Eq. 20. Moreover, in the last part of the SI, Sect. F, we have made the link to the least-action principle in physics more explicit. There we show how the voltage dynamics can be derived from the physical least-action principle by including the Rayleigh dissipation (Eq. 92 and 95).
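
      As a generic textbook illustration of how a dissipation term turns second-order Euler-Lagrange dynamics into first-order dynamics, the following sketch may help. It is not the authors' Eq. 92 or 95; the inertia m, the energy E, the time constant τ and the input I are symbols assumed here for illustration only.

```latex
% Assumed toy Lagrangian and Rayleigh dissipation function
L(u,\dot u) = \tfrac{m}{2}\,\dot u^{2} - E(u), \qquad
R(\dot u) = \tfrac{\tau}{2}\,\dot u^{2}

% Euler-Lagrange equation with dissipation
\frac{d}{dt}\frac{\partial L}{\partial \dot u}
  - \frac{\partial L}{\partial u}
  + \frac{\partial R}{\partial \dot u} = 0
\;\Longrightarrow\;
m\,\ddot u + \tau\,\dot u = -\frac{\partial E}{\partial u}

% Overdamped limit m -> 0: the acceleration term drops out
\tau\,\dot u = -\frac{\partial E}{\partial u}

% Example: E(u) = (u - I)^2 / 2 gives a leaky integrator
\tau\,\dot u = -u + I
```

      In this toy setting the first-order dynamics appear only once the inertial term is negligible compared with the dissipative one.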

      (5) Specify that the learning rules have not been observed. Though the learning rules are Hebbian, the details of the rules have not to my knowledge been observed. Would be worth mentioning as this is a sticking point of most related theories.

      We agree, and we now explicitly write in the Discussion that the learning rule still awaits experimental testing.

      (6) Some relevant literature. Chalk et al. PNAS (2018) have explored the relationship between temporal predictive coding and Rao & Ballard predictive coding based on the parameters of the cost function. Harkin et al. eLife (2023) have shown that 'prospective coding' also takes place in the serotonergic system, while Kim ... Ma (2021) have put forward similar ideas for dopamine, both may participate in setting the cost function. Instantaneous voltage propagation is also a focus of Greedy et al. (2023). The authors cite Zenke et al. for spiking error propagation, but there are biological references to that end.

      Many thanks for these hints. We now cite the book by Gerstner & Kistler on spiking neurons and, more specifically, the spike-based approach for learning to represent signals (Brendel, .., Machens, Denève, PLoS CB, 2020). Otherwise, we had difficulty incorporating the other literature, which seems to us not directly related to our approach even when related notions come up, such as predictive coding and temporal processing in Chalk et al. (2018), where the coding efficiency of various temporal coding schemes is studied as a function of the signal-to-noise ratio, or the apical activities in Greedy et al. (2022), where bursting, multiplexing and synaptic facilitation arise. We felt that citing these papers as well would confuse more than help (we already cite 95 papers).

      (7) In the main text, theorem two is presented as proof without assumptions on the level of nudging, but the actual proof uses strong assumptions in that respect, relying on numerical ad hoc observations for the general case.

      Thanks for pointing this out. We agree it is better style to state all the critical assumptions in the Theorem itself, rather than deferring them to the Methods. We now state: “Then, for suitable top-down nudging, learning rates, and initial conditions, the ….weights …evolve such that…”.

      (8) In the discussion regarding error-backpropagation, it seems to me that it could be clarified that the current algorithm asks for a weight alignment between FF and FB matrices as well as between FB and interneuron circuit matrices. Whether all of these matrices can be learned together remains to be shown; neither Akrout, Kunin nor Max et al. have shown this explicitly. Particularly when there are other inputs to the apical dendrites from other areas.

      Yes, it is difficult to learn to align all of them in parallel. Nevertheless, our simulations do in fact align the lateral and vertical circuits, as is also claimed in Theorem 2. Yet, as specified in the theorem, this holds “for suitable learning rates” (these were all the same, but were commonly reduced after some training time, as previously explained in the Methods, Details for Fig. 5).

      In the Discussion we now emphasize that, in general, simulating all the circuitries jointly from scratch in a single phase is tricky. We write:

      “A fundamental difficulty arises when the neuronal implementation of the Euler-Lagrange equations requires an additional microcircuit with its own dynamics. This is the case for the suggested microcircuit extracting the local errors. Formally, the representation of the apical feedback errors first needs to be learned before the errors can teach the feedforward synapses on the basal dendrites. We showed that this error learning can itself be formulated as minimizing an apical mismatch energy. What the lateral feedback through interneurons cannot explain away from the top-down feedback remains as apical prediction error.

      Ideally, while the network synapses targeting the basal tree are performing gradient descent on the global cost, the microcircuit synapses involved in the lateral feedback are performing gradient descent on local error functions, both at any moment in time.

      The simulations show that this intertwined system can in fact learn simultaneously with a common learning rate that is properly tuned. The cortical model network of inter- and pyramidal neurons learned to classify handwritten digits on the fly, with 10 digit samples presented per second. Yet, the overall learning is more robust if the error learning in the apical dendrites operates in phases without output teaching but with corresponding sensory activity, as may arise during sleep (see e.g. Deperrois et al., 2022 and 2023).”

      (9) The short-term depression model is assuming a slow type of short-term depression, not the fast types that are the focus of much recent experimental literature (like Campagnola et al. Science 2022).

      This assumption should be specified.

      Thanks for pointing us to this literature, which we were not aware of. We are now citing the release-independent plasticity (Campagnola et al. 2022) in the context of our synaptic depression model.

      (10) There seems to be a small notation issue: Eq 21 combines vectors of the size of the full network (bar{e}) and the size of the readout network (bar{e}star).

      Well, for notational convenience we set the target error to e*=0 for non-output neurons. This way we can write the total error for an arbitrary network neuron as the sum of the backpropagated error plus the putative target error (if the neuron is an output neuron). Otherwise we would always have to distinguish between network neurons that may be output neurons and those that are not. We did say this in the main text, but are now repeating it right after Eq. 21. -- Notations are often the result of a trade-off.
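
      The zero-padding convention described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the authors' code; the function name `total_error`, the index list `output_idx` and the example numbers are hypothetical.

```python
import numpy as np

def total_error(backprop_error, target_error_out, output_idx):
    """Sum of backpropagated error and target error for every network neuron.

    The target error is defined for output neurons only; it is padded with
    zeros (e* = 0) for all other neurons so both vectors have network size.
    All names here are hypothetical and chosen for illustration.
    """
    backprop_error = np.asarray(backprop_error, dtype=float)
    e_star = np.zeros_like(backprop_error)      # e* = 0 for non-output neurons
    e_star[output_idx] = target_error_out       # fill in the output neurons
    return backprop_error + e_star

# Four neurons, the last two of which are output neurons.
e_total = total_error([0.1, -0.2, 0.0, 0.3], [1.0, -1.0], [2, 3])
```

      With this convention a single expression covers both hidden and output neurons, at the cost of carrying a mostly empty target vector.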

    1. What are the ways in which a parasocial relationship can be authentic or inauthentic?

      I think parasocial relationships can be authentic when such a figure or celebrity has a positive influence on the follower, encouraging them to grow and become a better person. I also believe that having a sense of mutual respect for one another is crucial. Though the celebrity might not know them, they can still have some level of respect, and for the follower, they can create respect by also not forming an unhealthy attachment to the figure. However, I also believe there are times when the relationship might not be authentic, as followers might misunderstand their relationship and assume that they have a deeper connection. As we see with Jessica, she believed that Mr. Rogers knew her and liked her. Although in Jessica's case it is quite an innocent misunderstanding, in some cases, it can lead to having unrealistic expectations of the figure as well as a lack of boundaries. Followers can presume the figure genuinely has a connection with them, and be devastatingly disappointed. Other times, followers may become obsessed with said figure and behave irrationally. As such, I think parasocial relationships can be authentic to a limit. I think it is important for the follower and even the figure to clarify the extent of their relationship.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a new and valuable theoretical account of spatial representational drift in the hippocampus. The evidence supporting the claims is convincing, with a clear and accessible explanation of the phenomenon. Overall, this study will likely attract researchers exploring learning and representation in both biological and artificial neural networks.

      We would like to ask the reviewers to consider elevating the assessment due to the following arguments. As noted in the original review, the study bridges two different fields (machine learning and neuroscience), and does not only touch a single subfield (representational drift in neuroscience). In the revision, we also analysed data from four different labs, strengthening the evidence and the generality of the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors start from the premise that neural circuits exhibit "representational drift" -- i.e., slow and spontaneous changes in neural tuning despite constant network performance. While the extent to which biological systems exhibit drift is an active area of study and debate (as the authors acknowledge), there is enough interest in this topic to justify the development of theoretical models of drift.

      The contribution of this paper is to claim that drift can reflect a mixture of "directed random motion" as well as "steady state null drift." Thus far, most work within the computational neuroscience literature has focused on the latter. That is, drift is often viewed to be a harmless byproduct of continual learning under noise. In this view, drift does not affect the performance of the circuit nor does it change the nature of the network's solution or representation of the environment. The authors aim to challenge the latter viewpoint by showing that the statistics of neural representations can change (e.g. increase in sparsity) during early stages of drift. Further, they interpret this directed form of drift as "implicit regularization" on the network.

      The evidence presented in favor of these claims is concise. Nevertheless, on balance, I find their evidence persuasive on a theoretical level -- i.e., I am convinced that implicit regularization of noisy learning rules is a feature of most artificial network models. This paper does not seem to make strong claims about real biological systems. The authors do cite circumstantial experimental evidence in line with the expectations of their model (Khatib et al. 2022), but those experimental data are not carefully and quantitatively related to the authors' model.

      We thank the reviewer for pushing us to present stronger experimental evidence. We now analysed data from four different labs. Two of those are novel analyses of existing data (Karlsson et al, Jercog et al). All datasets show the same trend - increasing sparsity and increasing information per cell. We think that the results, presented in the new figure 3, allow us to make a stronger claim on real biological systems.

      To establish the possibility of implicit regularization in artificial networks, the authors cite convincing work from the machine-learning community (Blanc et al. 2020, Li et al., 2021). Here the authors make an important contribution by translating these findings into more biologically plausible models and showing that their core assumptions remain plausible. The authors also develop helpful intuition in Figure 4 by showing a minimal model that captures the essence of their result.

      We are glad that these translation efforts are appreciated.

      In Figure 2, the authors show a convincing example of the gradual sparsification of tuning curves during the early stages of drift in a model of 1D navigation. However, the evidence presented in Figure 3 could be improved. In particular, 3A shows a histogram displaying the fraction of active units over 1117 simulations. Although there is a spike near zero, a sizeable portion of simulations have greater than 60% active units at the end of the training, and critically the authors do not characterize the time course of the active fraction for every network, so it is difficult to evaluate their claim that "all [networks] demonstrated... [a] phase of directed random motion with the low-loss space." It would be useful to revise the manuscript to unpack these results more carefully. For example, a histogram of log(tau) computed in panel B on a subset of simulations may be more informative than the current histogram in panel A.

      The previous figure 3A was indeed confusing. In particular, it lumped together many simulations without proper curation. We redid this figure (now Figure 4), and added supplementary figures (Figures S1, S2) to better explain our results. It is now clear that the simulations with a large number of active units were either due to non-convergence, slow timescale of sparsification or simulations featuring label noise in which the fraction of active units is less affected. Regarding the log(tau) calculation, while it could indeed be an informative plot, it could not be calculated in a simple manner for all simulations. This is because learning curves are not always exponential, but sometimes feature initial plateaus (see also Saxe et al 2013, Schuessler et al 2020). We added a more detailed explanation of this limitation in the methods section, and we believe the current figure exemplifies the effect in a satisfactory manner.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Representational drift as a result of implicit regularization" the authors study the phenomenon of representational drift (RD) in the context of an artificial network that is trained in a predictive coding framework. When trained on a task for spatial navigation on a linear track, they found that a stochastic gradient descent algorithm led to a fast initial convergence to spatially tuned units, but then to a second very slow, yet directed drift which sparsified the representation while increasing the spatial information. They finally show that this separation of timescales is a robust phenomenon and occurs for a number of distinct learning rules.

      Strengths:

      This is a very clearly written and insightful paper, and I think people in the community will benefit from understanding how RD can emerge in such artificial networks. The mechanism underlying RD in these models is clearly laid out and the explanation given is convincing.

      We thank the reviewer for the support.

      Weaknesses:

      It is unclear how this mechanism may account for the learning of multiple environments.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      The process of RD through this mechanism also appears highly non-stationary, in contrast to what is seen in familiar environments in the hippocampus, for example.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is indeed linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid phase, consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Public Review):

      Summary:

      Single-unit neural activity tuned to environmental or behavioral variables gradually changes over time. This phenomenon, called representational drift, occurs even when all external variables remain constant, and challenges the idea that stable neural activity supports the performance of well-learned behaviors. While a number of studies have described representational drift across multiple brain regions, our understanding of the underlying mechanism driving drift is limited. Ratzon et al. propose that implicit regularization - which occurs when machine learning networks continue to reconfigure after reaching an optimal solution - could provide insights into why and how drift occurs in neurons. To test this theory, Ratzon et al. trained a Feedforward Network to perform the oft-utilized linear track behavioral paradigm and compared the changes in hidden layer units to those observed in hippocampal place cells recorded in awake, behaving animals.

      Ratzon et al. clearly demonstrate that hidden layer units in their model undergo consistent changes even after the task is well-learned, mirroring representational drift observed in real hippocampal neurons. They show that the drift occurs across three separate measures: the active proportion of units (referred to as sparsification), spatial information of units, and correlation of spatial activity. They continue to address the conditions and parameters under which drift occurs in their model to assess the generalizability of their findings.

      However, the generalizability results are presented primarily in written form: additional figures are warranted to aid in reproducibility.

      We added figures, and a Github with all the code to allow full reproducibility.

      Last, they investigate the mechanism through which sparsification occurs, showing that the flatness of the manifold near the solution can influence how the network reconfigures. The authors suggest that their findings indicate a three-stage learning process: 1) fast initial learning followed by 2) directed motion along a manifold which transitions to 3) undirected motion along a manifold.

      Overall, the authors' results support the main conclusion that implicit regularization in machine learning networks mirrors representational drift observed in hippocampal place cells.

      We thank the reviewer for this summary.

      However, additional figures/analyses are needed to clearly demonstrate how different parameters used in their model qualitatively and quantitatively influence drift.

      We now provide additional figures regarding parameters (Figures S1, S2).

      Finally, the authors need to clearly identify how their data supports the three-stage learning model they suggest.

      Their findings promise to open new fields of inquiry into the connection between machine learning and representational drift and generate testable predictions for neural data.

      Strengths:

      (1) Ratzon et al. make an insightful connection between well-known phenomena in two separate fields: implicit regularization in machine learning and representational drift in the brain. They demonstrate that changes in a recurrent neural network mirror those observed in the brain, which opens a number of interesting questions for future investigation.

      (2) The authors do an admirable job of writing to a large audience and make efforts to provide examples to make machine learning ideas accessible to a neuroscience audience and vice versa. This is no small feat and aids in broadening the impact of their work.

      (3) This paper promises to generate testable hypotheses to examine in real neural data, e.g., that drift rate should plateau over long timescales (now testable with the ability to track single-unit neural activity across long time scales with calcium imaging and flexible silicon probes). Additionally, it provides another set of tools for the neuroscience community at large to use when analyzing the increasingly high-dimensional data sets collected today.

      We thank the reviewer for these comments. Regarding the hypotheses, these are partially confirmed in the new analyses we provide of data from multiple labs (new Figure 3 and Table 3) - indicating that prolonged exposure to the environment leads to more stationarity.

      Weaknesses:

      (1) Neural representational drift and directed/undirected random walks along a manifold in ML are well described. However, outside of the first section of the main text, the analysis focuses primarily on the connection between manifold exploration and sparsification without addressing the other two drift metrics: spatial information and place field correlations. It is therefore unclear if the results from Figures 3 and 4 are specific to sparseness or extend to the other two metrics. For example, are these other metrics of drift also insensitive to most of the Feedforward Network parameters as shown in Figure 3 and the related text? These concerns could be addressed with panels analogous to Figures 3a-c and 4b for the other metrics and will increase the reproducibility of this work.

      We note that the results from figures 3 and 4 (original manuscript) are based on abstract tasks, while in figure 2 there is a contextual notion of spatial position. Spatial position metrics are not applicable to the abstract tasks as they are simple random mapping of inputs, and there isn’t necessarily an underlying latent variable such as position. This transition between task types is better explained in the text now. In essence the spatial information and place field correlation changes are simply signatures of the movements in parameter space. In the abstract tasks their change becomes trivial, as the spatial information becomes strongly correlated with sparsity and place fields are simply the activity vectors of units. These are guaranteed to change as long as there are changes in the activity statistics. We present here the calculation of these metrics averaged over simulations for completeness.

      Author response image 1.

      (A) PV correlation between training time points averaged over 362 simulations. (B) Mean SI of units normalized to first time step, averaged over 362 simulations. Red line shows the average time point of loss convergence, the shaded area represents one standard deviation.
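
      For readers unfamiliar with the two metrics in this caption, a minimal sketch follows. It assumes the commonly used Skaggs definition of spatial information (bits per spike) and takes the PV correlation to be the Pearson correlation between population vectors at each position bin, averaged over bins. These definitions and the function names are assumptions and may differ from the ones used in the manuscript.

```python
import numpy as np

def spatial_information(rate_map, occupancy):
    """Skaggs spatial information in bits per spike for a single unit.

    rate_map:  mean activity of the unit in each position bin
    occupancy: time spent in each position bin (any positive scale)
    """
    rate_map = np.asarray(rate_map, dtype=float)
    p = np.asarray(occupancy, dtype=float)
    p = p / p.sum()                        # occupancy probability per bin
    mean_rate = np.sum(p * rate_map)
    if mean_rate <= 0:
        return 0.0                         # silent unit carries no information
    ratio = rate_map / mean_rate
    valid = ratio > 0                      # 0 * log(0) is taken as 0
    return float(np.sum(p[valid] * ratio[valid] * np.log2(ratio[valid])))

def pv_correlation(maps_a, maps_b):
    """Mean Pearson correlation between population vectors of two sessions.

    maps_a, maps_b: arrays of shape (n_units, n_bins).
    Bins where either population vector is constant are skipped.
    """
    corrs = []
    for x in range(maps_a.shape[1]):
        va, vb = maps_a[:, x], maps_b[:, x]
        if va.std() == 0 or vb.std() == 0:
            continue
        corrs.append(np.corrcoef(va, vb)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")
```

      Under these definitions a unit that is active in a single bin has high spatial information, which is consistent with the statement above that spatial information tracks sparsity in the abstract tasks.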

      (2) Many caveats/exceptions to the generality of findings are mentioned only in the main text without any supporting figures, e.g., "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify" (lines 116-117). Supporting figures are warranted to illustrate which findings are "qualitatively different" from the main model, which are not different from the main model, and which of the many parameters mentioned are important for reproducing the findings.

      We now added figures (S1, S2) that show this exactly. We also added a github to allow full reproduction.

      (3) Key details of the model used by the authors are not listed in the methods. While they are mentioned in reference 30 (Recanatesi et al., 2021), they need to be explicitly defined in the methods section to ensure future reproducibility.

      The details of the simulation are described in the Methods section. We also added a github to allow full reproducibility.

      (4) How different states of drift correspond to the three learning stages outlined by the authors is unclear. Specifically, it is not clear where the second stage ends, and the third stage begins, either in real neural data or in the figures. This is compounded by the fact that the third stage - of undirected, random manifold exploration - is only discussed in relation to the introductory Figure 1 and is never connected to the neural network data or actual brain data presented by the authors. Are both stages meant to represent drift? Or is only the second stage meant to mirror drift, while undirected random motion along a manifold is a prediction that could be tested in real neural data? Identifying where each stage occurs in Figures 2C and E, for example, would clearly illustrate which attributes of drift in hidden layer neurons and real hippocampal neurons correspond to each stage.

      Thanks for this comment, which urged us to better explain these concepts.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.
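
      The idea of a slow, noise-driven reduction of the Hessian at essentially constant loss can be illustrated with a two-parameter toy problem. The sketch below is not the authors' model or their minimal example: the loss 0.5 * (a*b - y)^2, the label-noise level and all names are assumptions chosen because the algebra is simple. For this loss every point with a*b = 1 is a minimum, the Hessian trace equals a^2 + b^2, and each SGD step multiplies a^2 - b^2 by (1 - lr^2 * g^2), so noise slowly moves the parameters along the valley toward the flattest minimum.

```python
import numpy as np

def run_sgd(a, b, lr=0.05, sigma=0.5, steps=20000, seed=0):
    """SGD on 0.5 * (a*b - y)^2 with noisy labels y = 1 + sigma * xi."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal(steps)
    for xi in noise:
        g = a * b - (1.0 + xi)                  # residual for this noisy label
        a, b = a - lr * g * b, b - lr * g * a   # simultaneous gradient step
    return a, b

def hessian_trace(a, b):
    """Trace of the Hessian of 0.5 * (a*b - 1)^2, which equals a^2 + b^2."""
    return a * a + b * b

# Start on the zero-loss manifold (a*b = 1) at a sharp point.
a0, b0 = 2.0, 0.5
a_noisy, b_noisy = run_sgd(a0, b0)              # label noise drives the drift
a_quiet, b_quiet = run_sgd(a0, b0, sigma=0.0)   # no noise: nothing moves

print("initial Hessian trace:", hessian_trace(a0, b0))
print("with label noise:     ", hessian_trace(a_noisy, b_noisy))
print("without noise:        ", hessian_trace(a_quiet, b_quiet))
```

      In this toy setting the loss is already minimal at the start, so any change in the Hessian trace reflects movement within the low-loss set, which is the sense in which the second phase is directed.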

      Recommendations for the authors:

      The reviewers have raised several concerns. They concur that the authors should address the specific points below to enhance the manuscript.

      (1) The three different phases of learning should be clearly delineated, along with how they are determined. It remains unclear in which exact phase the drift is observed.

      This is now clearly explained in the new Table 1 and Figure 4C. Note that the different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      (2) The term "sparsification" of unit activity is not fully clear. Its meaning should be more explicitly explained, especially since, in the simulations, a significant number of units appear to remain active (Fig. 3A).

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.
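
      One plausible way to compute two such measures is sketched below. The exact definitions are given in the manuscript's Methods and may differ; here it is assumed that the active fraction is the share of (unit, input) pairs with above-threshold activity, and that the fraction of active units is the share of units that respond to at least one input. The function names and the threshold are hypothetical.

```python
import numpy as np

def active_fraction(activity, threshold=0.0):
    """Share of (unit, input) pairs with activity above threshold (assumed definition)."""
    activity = np.asarray(activity, dtype=float)
    return float(np.mean(activity > threshold))

def fraction_active_units(activity, threshold=0.0):
    """Share of units that are active for at least one input (assumed definition)."""
    activity = np.asarray(activity, dtype=float)
    return float(np.mean(np.any(activity > threshold, axis=1)))

# Rows are units, columns are inputs.
example = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0],
                    [1.0, 1.0, 0.0, 0.0]])
af = active_fraction(example)
fau = fraction_active_units(example)
```

      The two numbers can diverge: units may stay active for a few inputs each, so the second measure stays high while the first one falls, which is the kind of difference the label-noise simulations are said to show.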

      (3) While the study primarily focuses on one aspect of representational drift-the proportion of active units-it should also explore other features traditionally associated with representational drift, such as spatial information and the correlation between place fields.

      This absence of features is related to the abstract nature of some of the tasks simulated in our paper. In our original submission the transition from a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We have now clarified the motivation for this transition.

      Both the initial simulation and the new experimental data analysis include spatial information (Figures 2,3). The following simulations (Figure 4) with many parameter choices use more abstract tasks, for which the notion of correlation between place cells and spatial information loses its meaning as there is no spatial ordering of the inputs, and every input is encountered only once. Spatial information becomes strongly correlated with the inverse of the active fraction metric. The correlation between place cells is also directly linked to increase in sparseness for these tasks.

      (4) There should be a clearer illustration of how labeling noise influences learning dynamics and sparsification.

      This was indeed confusing in the original submission. We removed the simulations with label noise from Figure 4, and added a supplementary figure (S2) illustrating the different effects of label noise.

      (5) The representational drift observed in this study's simulations appears to be nonstationary, which differs from in vivo reports. The reasons for this discrepancy should be clarified.

      We added experimental results from three additional labs demonstrating a change in activity statistics (i.e. increase in spatial information and increase in sparseness) over a long period of time. We suggest that such a change long after the environment is already familiar is an indication of the second phase, and stress that this change seems to saturate at some point, and that most drift papers start collecting data after this saturation, hence this effect was missed in previous in vivo reports. Furthermore, these effects are becoming more apparent with the advent of new calcium imaging methods, as the older electrophysiological recording methods did not usually allow recording of large numbers of cells for long periods of time. The new Table 3 surveys several experimental papers, emphasizing the degree of familiarity with the environment.

      (6) A distinctive feature of the hippocampus is its ability to learn different spatial representations for various environments. The study does not test representational drift in this context, a topic of significant interest to the community. Whether the authors choose to delve into this is up to them, but it should at least be discussed more comprehensively, as it's only briefly touched upon in the current manuscript version.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (7) The methods section should offer more details about the neural nets employed in the study. The manuscript should be explicit about the terms "hidden layer", "units", and "neurons", ensuring they are defined clearly and not used interchangeably.

      We changed the usage of these terms to be more coherent and made our code publicly available. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      In addition, each reviewer has raised both major and minor concerns. These are listed below and should be addressed where possible.

      Reviewer #1 (Recommendations For The Authors):

      I recommend that the authors edit the text to soften their claims. For example:

      In the abstract "To uncover the underlying mechanism, we..." could be changed to "To investigate, we..."

      Agree. Done

      On line 21, "Specifically, recent studies showed that..." could be changed to "Specifically, recent studies suggest that..."

      Agree. Done

      On line 100, "All cases" should probably be softened to "Most cases" or more details should be added to Figure 3 to support the claim that every simulation truly had a phase of directed random motion.

      The text was changed in accordance with the reviewer’s suggestion. In addition, the figure was changed and only includes simulations in which we expected unit sparsity to arise (without label noise). We also added explanations and supplementary figures for label noise.

      Unless I missed something obvious, there is no new experimental data analysis reported in the paper. Thus, line 159 of the discussion, "a phenomenon we also observed in experimental data" should be changed to "a phenomenon that was recently reported in experimental data."

      We thank the reviewer for drawing our attention to this. We now analyzed data from three other labs, two of which are novel analyses on existing data. All four datasets show the same trends of sparseness with increasing spatial information. The new Figure 3 and text now describe this.

      On line 179 of the Discussion, "a family of network configurations that have identical performance..." could be softened to "nearly identical performance." It would be possible for networks to have minuscule differences in performance that are not detected due to stochastic batch effects or limits on machine precision.

      The text was changed in accordance with the reviewer’s suggestion.

      Other minor comments:

      Citation 44 is missing the conference venue, please check all citations are formatted properly.

      Corrected.

      In the discussion on line 184, the connection to remapping was confusing to me, particularly because the cited reference (Sanders et al. 2020) is more of a conceptual model than an artificial network model that could be adapted to the setting of noisy learning considered in this paper. How would an RNN model of remapping (e.g. Low et al. 2023; Remapping in a recurrent neural network model of navigation and context inference) be expected to behave during the sparsifying portion of drift?

      We now clarified this section. The conceptual model of Sanders et al includes a specific prediction (Figure 7 there) which is very similar to ours - a systematic change in robustness depending on duration of training. Regarding the Low et al model, using such mechanistic models is an exciting avenue for future research.

      Reviewer #2 (Recommendations For The Authors):

      I only have two major questions.

      (1) Learning multiple representations: Memory systems in the brain typically must store many distinct memories. Certainly, the hippocampus, where RD is prominent, is involved in the ongoing storage of episodic memories. But even in the idealized case of just two spatial memories, for example, two distinct linear tracks, how would this learning process look? Would there be any interference between the two learning processes or would they be largely independent? Is the separation of time scales robust to the number of representations stored? I understand that to answer this question fully probably requires a research effort that goes well beyond the current study, but perhaps an example could be shown with two environments. At the very least the authors could express their thoughts on the matter.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (2) Directed drift versus stationarity: I could not help but notice that the RD illustrated in Fig.2D is not stationary in nature, i.e. the upper right and lower left panels are quite different. This appears to contrast with findings in the hippocampus, for example, Fig.3e-g in (Ziv et al, 2013). Perhaps it is obvious that a directed process will not be stationary, but the authors note that there is a third phase of steady-state null drift. Is the RD seen there stationary? Basically, I wonder if the process the authors are studying is relevant only as a novel environment becomes familiar, or if it is also applicable to RD in an already familiar environment. Please discuss the issue of stationarity in this context.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is indeed linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid, phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Recommendations For The Authors):

      Most of my general recommendations are outlined in the public review. A large portion of my comments regards increasing clarity and explicitly defining many of the terms used which may require generating more figures (to better illustrate the generality of findings) or modifying existing figures (e.g., to show how/where the three stages of learning map onto the authors' data).

      Sparsification is not clearly defined in the main text. As I read it, sparsification is meant to refer to the activity of neurons, but this needs to be clearly defined. For example, lines 262-263 in the methods define "sparseness" by the number of active units, but lines 116-117 state: "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify." If the fraction of active units (defined as "sparseness") did not change, what does it mean that the activity of the units "sparsified"? If the authors mean that the spatial activity patterns of hidden units became more sharply tuned, this should be clearly stated.

      We have now defined precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.
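
      As an illustration only, the distinction between the two measures can be sketched as follows. The definitions used here are assumptions inferred from this exchange, not the authors' published code: "Fraction Active Units" is taken to be the share of units that respond to at least one input, and "Active Fraction" the share of inputs each surviving unit responds to, averaged over those units. The function names, the `tol` threshold and the toy `activity` matrix are hypothetical.

```python
import numpy as np

def fraction_active_units(activity, tol=0.0):
    """Share of units whose activity exceeds tol for at least one input.

    activity has shape (n_inputs, n_units). Assumed definition.
    """
    active = activity > tol
    return float(active.any(axis=0).mean())

def active_fraction(activity, tol=0.0):
    """Mean share of inputs that each non-silent unit responds to.

    Units that are silent for every input are excluded. Assumed definition.
    """
    active = activity > tol
    alive = active.any(axis=0)
    if not alive.any():
        return 0.0
    return float(active[:, alive].mean(axis=0).mean())

# Toy rate maps: 4 inputs (rows) by 3 units (columns).
# Unit 0 responds everywhere, unit 1 is silent, unit 2 is sharply tuned.
activity = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.5],
    [0.5, 0.0, 0.0],
    [1.5, 0.0, 0.0],
])

print(fraction_active_units(activity))
print(active_fraction(activity))
```

      Under these assumed definitions the two numbers can move independently: units may keep responding to some input while their tuning sharpens, which is one way to read the reviewer's question about label noise.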

      Likewise, it is unclear which of the features the authors outlined - spatial information, active proportion of units, and spatial correlation - are meant to represent drift. The authors should clearly delineate which of these three metrics they mean to delineate drift in the main text rather than leave it to the reader to infer. While all three are mentioned early on in the text (Figure 2), the authors focus more on sparseness in the last half of the text, making it unclear if it is just sparseness that the authors mean to represent drift or the other metrics as well.

      The main focus of our paper is on the non-stationarity of drift. Namely, that features (such as these three) systematically change in a directed manner as part of the drift process. The new analyses of experimental data show sparseness and spatial information.

      The focus on sparseness in the second half of the paper arises because we move to more abstract tasks, in which these measures are also easy to study. In our original submission the transition from a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We now clarified the motivation for this transition.

      It is not clear if a change in the number of active units alone constitutes "drift", especially since Geva et al. (2023) recently showed that both changes in firing rate AND place field location drive drift, and that the passage of time drives changes in activity rate (or # cells active).

      Our work did not deal with purely time-dependent drift, but rather focused on experience-dependence. Furthermore, Geva et al study the stationary phase of drift, where we do not expect a systematic change in the total number of cells active. They report changes in the average firing rate of active cells in this phase, as a function of time - which does not contradict our findings.

      "hidden layer", "units", and "neurons" seem to be used interchangeably in the text (e.g., line 81-85). However, this is confusing in several places, in particular in lines 83-85 where "neurons" is used twice. The first usage appears to refer to the rate maps of the hidden layer units simulated by the authors, while the second "neurons" appears to refer to real data from Ziv 2013 (ref 5). The authors should make it explicit whether they are referring to hidden layer units or actual neurons to avoid reader confusion.

      We changed the usage of these terms to be more coherent. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      The authors should clearly illustrate which parts of their findings support their three-phase learning theory. For example, does 2E illustrate these phases, with the first tenth of training time points illustrating the early phase, time 0.1-0.4 illustrating the intermediate phase, and 0.4-1 illustrating the last phase? Additionally, they should clarify whether the second and third stages are meant to represent drift, or is it only the second stage of directed manifold exploration that is considered to represent drift? This is unclear from the main text.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      Line 45 - It appears that the acronym ML is not defined above here anywhere.

      Added.

      Line 71: the ReLU function should be defined in the text, e.g., sigma(x) = x if x > 0 else 0.

      Added.
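
      For reference, the definition suggested by the reviewer, sigma(x) = x if x > 0 else 0, can be written in a vectorized form. This is a minimal sketch; the function name `relu` is a conventional choice, not taken from the manuscript.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: returns x where x > 0 and 0 elsewhere."""
    return np.maximum(x, 0.0)

print(relu(np.array([-1.0, 0.0, 2.0])))
```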

      106-107: Figures (or supplemental figures) to demonstrate how most parameters do not influence sparsification dynamics are warranted. As written, it is unclear what "most parameters" mean - all but noise scale. What about the learning rule? Are there any interactions between parameters?

      We now removed the label noise from Figure 4, and added two supplementary figures to clearly explain the effect of parameters. Figure 4 itself was also redone to clarify this issue.

      2F middle: should "change" be omitted for SI?

      The panel was replaced by a new one in Figure 3.

      116-119: A figure showing how results differ for label noise is warranted.

      This is now done in Figures S1 and S2.

      124: typo, The -> the

      Corrected.

      127-129: This conclusion statement is the first place in the text where the three stages are explicitly outlined. There does not appear to be any support or further explanation of these stages in the text above.

      We now explain this earlier at the end of the Introduction section, along with the new Table 1 and marking on Figure 4C.

      132-133 seems to be more of a statement and less of a prediction or conclusion - do the authors mean "the flatness of the loss landscape in the vicinity of the solution predicts the rate of sparsification?"

      We thank the reviewer for this observation. The sentence was rephrased:

      Old: As illustrated in Fig. 1, different solutions in the zero-loss manifold might vary in some of their properties. The specific property suggested from theory is the flatness of the loss landscape in the vicinity of the solution.

      New: As illustrated in Fig. 1, solutions in the zero-loss manifold have identical loss, but might vary in some of their properties. The authors of [26] suggest that noisy learning will slowly increase the flatness of the loss landscape in the vicinity of the solution.
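
      The idea that solutions with identical loss can differ in flatness can be illustrated with a toy example. This is a sketch under stated assumptions, not the network studied in the paper: the two-parameter loss L(a, b) = 0.5 (ab - 1)^2 and the use of the Hessian trace as a flatness proxy are chosen here purely for illustration.

```python
import numpy as np

def loss(a, b):
    """Toy loss whose zero-loss manifold is the curve a * b = 1."""
    return 0.5 * (a * b - 1.0) ** 2

def hessian(a, b):
    """Analytic Hessian of the toy loss with respect to (a, b)."""
    off_diagonal = 2.0 * a * b - 1.0
    return np.array([[b * b, off_diagonal],
                     [off_diagonal, a * a]])

# Every point below lies on the zero-loss manifold, yet the curvature
# (Hessian trace, equal to a^2 + b^2 on the manifold) differs between them.
for a in (0.5, 1.0, 2.0, 4.0):
    b = 1.0 / a
    print(a, b, loss(a, b), np.trace(hessian(a, b)))
```

      In this toy case the trace is smallest at a = b = 1, so a process that slowly reduces curvature while staying at zero loss would move along the manifold toward that point.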

      135: typo, it's -> its

      Corrected.

      Line 135-136 "Crucially, the loss on the entire manifold is exactly zero..." This appears to contradict the Figure 4A legend - the loss appears to be very high near the top and bottom edges of the manifold in 4A. Do the authors mean that the loss along the horizontal axis of the manifold is zero?

      The reviewer is correct. The manifold mentioned in the sentence is indeed the horizontal axis. We changed the text and the figure to make it clearer.

      Equation 6: This does not appear to agree with equation 2 - should there be an E_t term for an expectation function?

      Corrected.

      Line 262-263: "Sparseness means that a unit has become inactive for all inputs." This should also be stated explicitly as the definition of sparseness/sparsification in the main text.

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General comments

      All three experts have raised excellent ideas and made important suggestions to extend the scope of our study and provide additional information. While we fully acknowledge that these points are valid and would provide exciting new knowledge, we also should not lose track of the fact that a single study cannot cover all bases. Sulfated steroids, for example, are clearly essential components of mouse urine. Unfortunately, however, all chemical analysis approaches are limited and the one we opted for is not suitable for analysis of such signaling molecules. Future studies should certainly focus on these aspects. The same holds true for the fact that we do not know which of the identified compounds are actually VSN ligands. These are inherent limitations of the approach, and we are not claiming otherwise.

      Reviewer #1 (Public Review):

      (1) In this manuscript, Nagel et al. sought to comprehensively characterize the composition of urinary compounds, some of which are putative chemosignals. They used urines from adult males and females in three different strains, including one wild-derived strain. By performing mass spectrometry of two classes of compounds: volatile organic compounds and proteins, they found that urines from inbred strains are qualitatively similar to those of a wild strain. This finding is significant because there is a high degree of genetic diversity in wild mice, with chemosensory receptor genes harboring many polymorphisms.

      We agree and thank the Reviewer for his / her positive assessment.

      (2) In the second part of this work, the authors used calcium imaging to monitor the pattern of vomeronasal neuron responses to these urines. By performing pairwise comparisons, the authors found a large degree of strain-specific response and a relatively minor response to sex-specific urinary stimuli. This is a finding generally in agreement with previous calcium imaging work by Ron Yu and colleagues in 2008. The authors extend the previous work by using urines from wild mice. They further report that the concentration diversity of urinary compounds in different urine batches is largely uncorrelated with the activity profiles of these urines. In addition, the authors found that the patterns of vomeronasal neuron response to urinary cues are not identical when measured using different recipient strains. This fascinating finding, however, requires an additional control to exclude the possibility that this is not due to sampling error.

      We thank Reviewer 1 for pointing this out. We agree that this is truly a “fascinating finding.” Reviewer 1 emphasizes that we need to add an “additional control to exclude […] that this is not due to sampling error”, and he / she elaborates on the required control in his / her Recommendations For The Authors (see below). Reviewer 1 states that “for Fig. 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.” Importantly, we believe that this is already controlled for. In fact, for each experiment, we routinely prepare VNO slices along the organ’s entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, in which we analyzed whether the “same urine activates a different population of VSNs in two different strains”, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we conclude that it is very unlikely that the considerably different response profiles measured in different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also refer to this point in the Results.

      (3) There are several weaknesses in this manuscript, including the lack of analysis of the compositions of sulfated steroids and other steroids, which have been proposed to be the major constituents of vomeronasal ligands in urines and the indirect (correlational) nature of their mass spectrometry data and activity data.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a feature that we consider a strength of the current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (4) Overall, the major contribution of this work is the identification of specific molecules in mouse urines. This work is likely to be of significant interest to researchers in chemosensory signaling in mammals and provides a systematic avenue to exhaustively identify vomeronasal ligands in the future.

      We thank the Reviewer for his / her generally positive assessment.

      Reviewer #2 (Public Review):

      (1) This manuscript by Nagel et al provides a comprehensive examination of the chemical composition of mouse urine (an important source of semiochemicals) across strain and sex, and correlates these differences with functional responses of vomeronasal sensory neurons (an important sensory population for detecting chemical social cues). The strength of the work lies in the careful and comprehensive imaging and chemical analyses, the rigor of quantification of functional responses, and the insight into the relevance of olfactory work on lab-derived vs wild-derived mice.

      We thank the Reviewer for his / her generally positive assessment.

      (2) With regard to the chemical analysis, the reader should keep in mind that a difference in the concentration of a chemical across strain or sex does not necessarily mean that that chemical is used for chemical communication. In the most extreme case, the animals may be completely insensitive to the chemical. Thus, although the repertoire of proteins and volatiles could potentially allow sex and/or strain discrimination, it is unclear to what degree both are used in different situations.

      Reviewer 2 is correct to point out that sex- and/or strain-dependent differences in urine molecular composition do not automatically attribute a signaling function to those molecules. We concur and, in fact, stress this point many times throughout the manuscript. In the Results, for example, we point out (i) that “in female urine, BALB/c-specific proteins are substantially underrepresented, a fact not reflected by VSN response profiles”, (ii) that “as observed in C57BL/6 neurons, the skewed distributions of protein concentration indices were not reflected by BALB/c generalist VSN profiles”, and (iii) that “VSN population response profiles do not reflect the global molecular content of urine, suggesting that the VNO functions as a rather selective molecular detector.” Moreover, in the Discussion, we state (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”

      In the revised manuscript, we now aim to even more strongly emphasize the point made by Reviewer 2. In the Discussion, we have deleted a sentence that read: “Sex- and strain-specific chemical profiles give rise to unique VSN activity patterns.” Moreover, we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      Reviewer #3 (Public Review):

      (1) One of the primary objectives in this study is to ascertain the extent to which the response profiles of VSNs are specific to sex and strain. The design of these Ca2+ imaging experiments uses a simple stimulus design, using two interleaved bouts of stimulation with pairs of urine (e.g. male versus female C57BL/6, male C57BL/6 versus male BALB/c) at a single dilution factor (1:100). This introduces two significant limitations: (1) the "generalist" versus "specialist" descriptors pertain only to the specific pairwise comparisons made and (2) there is no information about the sensitivity/concentration-dependence of the responses.

      Reviewer 3 points to two limitations of our VSN activity assay. He / she is correct to mention that characterizing a VSN as generalist or specialist based on a “pairwise comparison” should not be the basis of attributing such a “generalist” or “specialist” label in general (i.e., regarding the global stimulus space). We acknowledge this point, but we do not regard this as a limitation of our study since we are not investigating rather broad (i.e., multidimensional) questions of selectivity. All we are asking in the context of this study is whether VSNs - when being challenged with pairs of sex- or strain-specific urine samples - act as rather selective semiochemical detectors. Of course, one can always think of a study design that provides more information. However, we here opted for an assay that - in our hands - is robust, “low noise” (i.e., displays low intrinsic signal variability as evident from reliability index calculations), ensures recovery from VSN adaptation (Wong et al., 2018), and, importantly, answers the specific question we are asking.

      Regarding the second point (“there is no information about the sensitivity/concentration-dependence of the responses”), we would like to emphasize that this was not a focus of our study either. In fact, concentration-dependence of VSN activity has been a major focus of several previous studies referenced in our manuscript (e.g., Leinders-Zufall et al., 2000; He et al., 2008), albeit with contradictory results. In our study, we ask whether a pair of stimuli that we have shown to display, in part, strikingly different chemical composition (both absolute and relative) preferentially activates the same or different VSNs. With this question in mind, we believe that our assay (and its results) are highly informative.

      (2) The functional measurements of VSN tuning to various pairs of urine stimuli are consistently presented alongside mass spectrometry-based comparisons. Although it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis, the juxtaposition of VSN tuning measurements with independent molecular diversity measurements gives the appearance to readers that these experiments were integrated (i.e., that the diversity of ligands was underlying the diversity of physiological responses). This is a hypothesis raised by the parallel studies, not a supported conclusion of the work. This data presentation style risks confusing readers.

      As Reviewer 3 points out correctly, “it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis.” In the figures, we try to make the distinction between VSN response statistics and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts.

      We now also made an extra effort to avoid “confusing readers” by stating in the Discussion (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.” Moreover, we have deleted a sentence that read: “sex- and strain-specific chemical profiles give rise to unique VSN activity patterns”, and we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      However, we believe that there is value in presenting “VSN tuning measurements” next to “independent molecular diversity measurements.” While these are independent measurements, their similarity or, quite frequently, lack thereof are informative. We are sure that by taking the above “precautions” we have now mitigated the risk of “confusing readers.”

      (3) The impact of mass spectrometry findings is limited by the fact that none of these molecules (in bulk, fractions, or monomolecular candidate ligands) were tested on VSNs. It is possible that only a very small number of these ligands activate the VNO. The list of variably expressed proteins - especially several proteins that are preferentially found in female urine - is compelling, but, again, there is no evidence presented that indicates whether or not these candidate ligands drive VSN activity. It is noteworthy that the largest class of known natural ligands for VSNs are small nonvolatiles that are found at high levels in mouse urine. These molecules were almost certainly involved in driving VSN activity in the physiology assays (both "generalist" and "specialist"), but they are absent from the molecular analysis.

      Reviewer 3 is right, of course, that at this point we have not tested the identified molecules on VSNs. This is clearly beyond the scope of the present study. We believe that the data we present will be the basis of (several full-length) future studies that aim to identify specific ligands and - best case scenario - receptor-ligand pairs. We find it hard to concur that our study, which provides the necessary basis for those future endeavors, is regarded as “incomplete”. By design, all studies are somewhat incomplete, i.e., there are always remaining questions and we are not contesting that.

      It is true, of course, that a class of “known natural ligands for VSNs are small nonvolatiles.” As we replied above, our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days). A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      Reviewer #1 (Recommendations For The Authors):

      (1) I find that the study is highly valuable for researchers in this field. With the finding that wild mouse urines do not elicit significantly more variable responses from urines from inbred strains, researchers can now be reassured to use inbred strains to gain general insights on pheromone signaling.

      A major omission of this study is non-volatile small organic molecules such as steroids. These compounds are the only molecular class in urine that have been identified to stimulate specific vomeronasal receptors to date. It is unclear to me that the specificity of VOC and proteins can alone fully explain the response specificity of the VSNs that have been monitored in this study. The discussion of this topic is highly beneficial for the readers.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days). A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (2) How many different wild mouse urines were tested in this study? Is this sufficient to capture the diversity of wild M. musculus in local (Prague) habitats?

      We thank the reviewer for pointing this out. For the present study, 20 male (M) and 27 female (F) wild mice were caught at six different sites in the broader Prague area (i.e., Bohnice (50.13415N, 14.41421E; 2M+4F), Dolni Brezany (49.96321N, 14.4585E; 3M+4F), Hodkovice (49.97227N, 14.48039E; 5M+6F), Písnice (49.98988N, 14.46625E; 3M+6F), Lhota (49.95369N, 14.43087E; 1M+2F), and Zalepy (49.9532N, 14.40829E; 6M+5F)). 18 of the 27 wild females were caught pregnant. The remaining 9 females were mated with males caught at the same site and produced offspring within a month. When selecting 10 male and 10 female individuals from first-generation offspring for urine collection, we ensured that all six capture sites were represented and that age-matched animals displayed similar weight (~17g). We believe that this capture / breeding strategy sufficiently represents “the diversity of wild M. musculus in local (Prague) habitats.” In the revised manuscript, we have now included these details in the Materials and Methods.

      (3) I found Figure 1e and figures in a similar format confusing - one panel describes the response statistics of VSNs, and other panels show the number of compounds found in different MS profiling, which is not immediately obvious from the figures. Is the y-axis legend correct (%)?

      We now try to make the distinction between VSN “response statistics” and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts. Moreover, we thank the Reviewer for pointing out the mislabeling of the y-axis. Accordingly, we have deleted “%” in all corresponding figures.

      (4) For Figure 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.

      We thank Reviewer 1 for pointing this out. Importantly, we believe that this is already controlled for (see our response to the Public Review). In fact, for each experiment, we routinely prepare VNO slices along the entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we can thus exclude that the considerably different response profiles that we measured using different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also mention this point in the Results.

      Reviewer #2 (Recommendations For The Authors):

      (1) Pg 5 Lines 3-16: This summary paragraph contains too much detail given that the reader has not read the paper yet, which makes it bewildering. This should be condensed.

      We agree and have substantially condensed this paragraph.

      (2) Pg 6 Line 5-8: This summary of the experimental design is obtuse and should be edited for clarity.

      We have edited the relevant passage for clarity.

      (3) Pg 6 Line 11: "VSNs were categorized..." Specialist vs generalist is defined as responding to one or both stimuli. This definition is placed right after saying that the cells were also tested with KCl. The reader might think that specialist vs generalist was defined in relation to KCl.

      We have edited this sentence, which now reads: “Dependent on their individual urine response profiles, VSNs were categorized as either specialists (selective response to one stimulus) or generalists (responsive to both stimuli).”
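The categorization quoted above can be written as a small decision rule. The following is a minimal sketch, not the authors' analysis code: the function name, the label strings, and the boolean response flags are hypothetical, and it assumes that each neuron has already been scored as responsive or non-responsive to each of the two stimuli in a pair.

```python
from collections import Counter


def classify_vsn(responds_a: bool, responds_b: bool) -> str:
    """Label one neuron from its responses to the two stimuli of a pair.

    Hypothetical helper: a neuron responding to both stimuli is a
    generalist, a neuron responding to exactly one is a specialist.
    """
    if responds_a and responds_b:
        return "generalist"
    if responds_a:
        return "specialist_A"
    if responds_b:
        return "specialist_B"
    return "non-responsive"


# Made-up response flags (stimulus A, stimulus B) for five neurons.
example_neurons = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, True),
]
label_counts = Counter(classify_vsn(a, b) for a, b in example_neurons)
print(label_counts)
```

Note that the labels are defined only relative to the stimulus pair tested, which is the scope the authors state for their pairwise design.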

      (4) Pg 6 Line 13: "we recorded urine-dependent Ca2+ signals from a total of 16,715 VSNs". Is a "signal" a response? Did all 16,715 VSNs respond to urine? What was the total of KCl responsive cells recorded?

      We edited the corresponding passage for clarification. The text now reads: “Overall, we recorded >43,000 K+-sensitive neurons, of which a total of 16,715 VSNs (38.4%) responded to urine stimulation. Of these urine-sensitive neurons, 61.4% displayed generalist profiles, whereas 38.6% were categorized as specialists (Figure 1c,d).”
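As a rough consistency check on the quoted numbers, the arithmetic can be sketched as follows. Only 16,715, 38.4%, 61.4% and 38.6% come from the text; the exact number of K+-sensitive neurons is not stated (only ">43,000"), so the values below are back-calculated approximations and the variable names are illustrative.

```python
# Values quoted in the response.
urine_responsive = 16_715
fraction_of_k_sensitive = 0.384  # urine-responsive share of K+-sensitive neurons
generalist_fraction = 0.614      # share of urine-responsive neurons
specialist_fraction = 0.386      # share of urine-responsive neurons

# Back-calculated approximations (assumptions, not reported values).
approx_k_sensitive = urine_responsive / fraction_of_k_sensitive
approx_generalists = round(urine_responsive * generalist_fraction)
approx_specialists = round(urine_responsive * specialist_fraction)

print(f"approx. K+-sensitive neurons: {approx_k_sensitive:.0f}")
print(f"approx. generalists: {approx_generalists}")
print(f"approx. specialists: {approx_specialists}")
```

The implied total is compatible with the stated ">43,000", and the two rounded subgroup counts add up to the 16,715 urine-responsive neurons.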

      (5) Pg 7 Line 6: The repeated use of the word "pooled" is confusing as it suggests a variation in the experiment. The authors should establish once in the Methods and maybe in the Results that stimuli were pooled across animals. Then they should just refer to the stimulus as male or female or BALB/c rather than "pooled" male etc.

      We acknowledge the reviewer’s argument. Accordingly, we now introduce the experimental use of pooled urine once in the Methods and in the introductory paragraph of the Results. All other references to “pooled” urine in the Results and Captions have been deleted.

      (6) Pg 7 Line 10: "...detected in >=3 out of 10 male..." For the chemical analysis, were these samples not pooled?

      Correct. We deliberately did not pool samples for chemical analysis, but instead analyzed all individual samples separately (i.e., 60 samples were subjected to both proteomic and metabolomic analyses). Thus, the criterion that a VOC or protein must be detected in at least 3 of the 10 individual samples from a given sex/strain combination for a ‘present’ call (and in at least 6 of the 10 samples to be called ‘enriched’) ensures that the molecular signatures we identify are not “contaminated” by unusual aberrations within single samples. For clarification, we now explicitly outline this procedure in the Methods (Experimental Design and Statistical Analysis – Proteomics and metabolomics).
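The detection criterion described above amounts to a simple threshold rule per compound and per sex/strain group. The sketch below assumes that detection has already been reduced to a count of individual samples in which the compound was found; the function name, the "not called" label, and the example counts are hypothetical and are not taken from the manuscript.

```python
def call_compound(detected_in: int, n_samples: int = 10,
                  present_min: int = 3, enriched_min: int = 6) -> str:
    """Return the call for one compound in one sex/strain group.

    Thresholds follow the response text: at least 3 of 10 individual
    samples for 'present', at least 6 of 10 for 'enriched'.
    """
    if not 0 <= detected_in <= n_samples:
        raise ValueError("detected_in must lie between 0 and n_samples")
    if detected_in >= enriched_min:
        return "enriched"
    if detected_in >= present_min:
        return "present"
    return "not called"  # hypothetical label for compounds below threshold


# Made-up detection counts for three compounds in one group of 10 samples.
for name, count in {"compound_1": 2, "compound_2": 4, "compound_3": 9}.items():
    print(name, call_compound(count))
```

Under this rule a compound detected in only one or two individuals never contributes to a group's signature, which is the protection against single-sample aberrations that the authors describe.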

      (7) Pg 7 Line 23: In line 7, the specialist rate was defined as 5% in reference to the total KCl responsive cells. Here the specialist rate is defined from responsive cells. This is confusing.

      We apologize for the confusion. In both cases, the numbers (%) refer to all K+-sensitive neurons. We have added this information to both relevant sentences (l. 7 as well as ll. 23-24). Note that the rate in ll. 23-24 refers to generalists.

      (8) Pg 7 Line 25: Concentration index should be defined before its use here.

      We have revised the corresponding sentence, which now reads: “By contrast, analogously calculated concentration indices (see Materials and Methods) that can reflect potential disparities are distributed more broadly and non-normally (Figure 1h).”

      (9) Pg 7 Line 29: change "trivially" to "simply".

      Done

      (10) Pg 7 Line 30: What is meant by a "generalist" ligand? The neurons are generalists. Probably should read "common ligands"

      We have changed the text accordingly.

      (11) Pg 7 Line 31: What is meant by "global observed concentration disparities" ?

      We have changed the text to “…represented by the observed general concentration disparities.”

      (12) Pg 8 Lines 7-11: This section needs to be edited for clarity as it is very difficult to follow. For example, the definition of "enriched" is buried in a parenthetical. Also, it is very difficult to figure out what a "sample" is in this paper. Is it a pooled stimulus, or is it urine from an individual animal?

      We apologize for the confusion. Throughout the paper a “sample” is a pooled stimulus (from all 10 individuals of a given sex/strain combination) for all physiological experiments. For chemical analysis a “sample” refers to urine from an individual animal.

      (13) Pg 8 Line 11: "abundant proteins" Does this mean absolute concentration or enriched in one sample vs another?

      We changed the term “abundant” to “enriched” as this descriptor has been defined (present in ≥6 of 10 individual samples) in the previous sentence.

      (14) Pg 8 Line 18: "While 32.9% of all..." Please edit for clarity. What is the point?

      The main point here is that, for VOCs, the vast majority of compounds (91.3%) are either generic mouse urinary molecules or are sex/strain-specific.

      (15) Pg 10 Line 18: "Increased VSN selectivity..." This title is misleading as it suggests a change in sensitivity with animal exposure. I think the authors are trying to say "VSNs are more selective for strain than for sex". The authors should avoid the term "exposure to" when they mean "stimulation with" as the former suggests chronic exposure prior to testing.

      We thank the reviewer for the advice and have changed the title accordingly. We also edited the text to avoid the term "exposure to" throughout the manuscript.

      (16) Pg 12 Line 10: "we recorded hardly any..." Hardly any in comparison to what? BALB/c?

      We apologize for the confusion. We have edited the text for clarity, which now reads: “In fact, (i) compared to an average specialist rate of 11.2% ± 6.6% (mean ± SD) calculated over all 13 binary stimulus pairs (n = 26 specialist types), we observed only few specialist responses upon stimulation with urine from wild females (2% and 3%, respectively), and…”

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to the pairwise stimulus-response experimental design and analysis: there is precedent in the field for studies that explore the same topic (sex- and strain-selectivity), but measure VSN sensitivity across many urine stimuli, not just two at a time. This has been done both in the VNO (He et al, Science, 2008; Fu, et al, Cell, 2015) and in the AOB (Tolokh, et al, Journal of Neuroscience, 2013). The current manuscript does not cite these studies.

      Reviewer 3 is correct and we apologize for this oversight. We now cite the two VSN-related studies by He et al. and Fu et al. in the Introduction.

      (2) The findings of the mass spectrometry-based profiling of mouse urine - especially for volatiles - are only accessible through repositories, making it difficult for readers to understand which molecules were found to be highly divergent between sexes/strains. There is value in the list of ligands to further investigate, but this information should be made more accessible to readers without having to comb through the repositories.

      We agree that there “is value in the list of ligands to further investigate” and, accordingly, we now provide a table (Table 1) that lists the top-5 VOCs that – according to sPLS-DA – display the most discriminative power to classify samples by sex (related to Figure 2c) or strain (related to Figure 2d). For ease of identification, all entries list internal mass spectrometry identifiers, identifiers extracted from MS analysis database, the sex or strain that drives separation, which two-dimensional component / x-variate represents the most discriminative variable, PubChem chemical formula, PubChem common or alternative names, Chemical Entities of Biological Interest or PubChem Compound Identification, and the VOC’s putative origin.

      (3) There is a long precedent for integrating molecular assessments and physiological recordings to identify specific ligands for the vomeronasal system:

      • nonvolatiles (e.g., Leinders-Zufall, et al., Nature, 2000)

      • peptides (e.g., Kimoto et al., Nature, 2005; Leinders-Zufall et al. Science, 2004; Riviere et al., Nature, 2009; Liberles, et al., PNAS, 2009)
      • proteins (e.g., Chamero et al., Nature, 2007; Roberts et al., BMC Biology, 2010)

      • excreted steroids and bile acids (Nodari et al., Journal of Neuroscience, 2008; Fu et al., Cell, 2015; Doyle, et al., Nature Communications, 2016)

      The Leinders-Zufall (2000), Roberts, and Nodari papers are referenced, but the broader efforts by the community to find specific drivers of vomeronasal activity are not fully represented in the manuscript. The focus of this paper is fully related to this broader effort, and it would be appropriate for this work to be placed in this context in the introduction and discussion.

      We now refer to all of the studies mentioned in the Introduction (except the article published by Liberles et al. in 2009, since the authors of that study do not identify vomeronasal ligands).

      (4) Throughout the manuscript (starting in Fig. 1h) the figure panels and captions use the term "response index" whereas the methods define a "preference index." It seems to be the case that these two terms are synonymous. If so, a single term should be consistently used. If not, this needs to be clarified.

      We now consistently use the term “response index” throughout the manuscript.

      (5) It would be useful to provide a table associated with Figure 2 - figure supplement 1 that lists the common names and/or chemical formulas for the volatiles that were found to be of high importance.

      We agree and, accordingly, we now provide a table (Table 2) that lists the VOCs that – according to Random Forest classification and resulting Gini importance scores – display the most discriminative power to classify samples by sex (related to Figure 2 - figure supplement 1a) or strain (related to Figure 2 - figure supplement 1b). Notably, it is generally reassuring that several VOCs are listed in both Table 1 and Table 2, emphasizing that two different supervised machine learning algorithms (i.e., sPLS-DA (Table 1) and Random Forest (Table 2)) yield largely congruent results.

      (6) The use of the term "comprehensive" for the molecular analysis is a little bit misleading, as volatiles and proteins are just two of the many categories of molecules present in mouse urine.

      We have now deleted most mentions of the term "comprehensive" when referring to the molecular analysis.

      (7) Page 11, lines 24-27: The sentences starting "We conclude..." and ending in "semiochemical concentrations." These two sentences do not make sense. It is not known how many of the identified proteins are actual VSN ligands. Moreover, there is abundant evidence from other studies that individual VSN activity provides information about distinct semiochemical concentrations.

      We have substantially edited and rephrased this paragraph to better reflect that different scenarios / interpretations are possible. The relevant text now reads: “We conclude that VSN population response strength might not be so strongly affected by strain-dependent concentration differences among common urinary proteins. In that case, it would appear somewhat unlikely that individual VSN activity provides fine-tuned information about distinct semiochemical concentrations. Alternatively, as some (or even many) of the identified proteins could not serve as vomeronasal ligands at all, generalist VSNs might sample information from only a subset of compounds which, in fact, are secreted at roughly similar concentrations.”

      (8) The explanation of stimulus timing is mentioned several times but not defined clearly in methods. Page 19, lines 14-19 have information about the stimulus delivery device, but it would be helpful to have stimulus timing explicitly stated.

      In addition to the relevant captions, we now explicitly state stimulus timing (i.e., 10 s stimulations at 180 s inter-stimulus intervals) in the Results.

      (9) Typos: Page 10, line 7: "male biased" → "male-biased" for clarity

      Wilcoxon "signed-rank" test is often misspelled "Wilcoxon singed ranked test" or "Wilcoxon signed ranked test"

      In the Fig. 3 legend, the asterisk meaning is unspecified.

      "(im)balances" → imbalances (page 27, line 24; page 37, line 16; page 38, line 16)

      In Figure 2 - figure supplement 1 and in Figure 2 - figure supplement 2, in the box-and-whisker plots the units are not specified in the graph or legend.

      We have made all required corrections.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #4

      Evidence, reproducibility and clarity

      Summary:

      In this work, Zemlianski and colleagues exploit S. pombe mutations responsible for catastrophic mitoses, in particular those leading to cut / cut-like phenotypes, whereby cytokinesis takes place without proper DNA segregation, trapping DNA molecules by septum formation in between the two separating cells. The work builds on the team's previous observation that these defects can be alleviated when cells are grown in a nitrogen-rich medium, which motivates their efforts to understand this better. The manuscript is written in a concise, neat and informative manner, and the results are presented clearly, with consistency in the format and the style all along. The analyses appear to have been, in general, conducted under the best standards. The findings are important and the data are of good quality. I have, however, important concerns that will be detailed below, and which, as I hope will be made clear, question the pertinence of including "TOR signaling" in the title, and making a distinction between "good" and "poor" nitrogen sources in the abstract.

      Major comments:

      Results

      The conclusion that the phenotype is suppressed by "good" but not "poor" nitrogen sources is not sufficiently supported. First, this interpretation is based on comparing only two or three sources of each type; second, the concentration of the "good" source glutamate needed to be raised for it to have a significant effect; third, there is a strange datum, as Glu 100 mM in Graph 1D looks exactly the same as Glu 50 mM in Graph 1E (I guess there is a mistake in the plotting); and fourth, and more important, the fact that the authors had the nice initiative of reproducing their YES medium experiments for every graph meant that, inevitably, slightly different values were obtained every time, which is normal. While the values yield very similar data for panels 1B, 1C and even 1D, the frequency of catastrophic mitoses for the cbf11 mutant in YES in panel 1E is much lower than in panel 1B, for example. This has the consequence of making the suppression obtained when adding 'poor' sources, such as proline or uracil, non-significantly different. Thus, the authors conclude that 'poor' nitrogen sources are not good at suppressing the phenotype. I suggest that the authors pool all their YES data (they will have 12 repeats of their experiment) and plot, in a single graph, all the other treatments. By performing the analyses again, using the appropriate statistical test for that, perhaps they will have a surprise. After which, the question is, is it so important to put the emphasis on whether the source is good or poor? The incontestable observation is that, in general, there is a clear trend of suppression of the phenotype.

      In Figure 2, images should be shown as an example of what was seen, what was quantified, and what the "decrease in nuclear cross-section area" actually looked like.

      Also, important for Figure 2, the authors used the nuclear cross-section area as a readout for nuclear envelope expansion versus shrinkage. For that, they did not use a fluorescent marker for the nuclear envelope that is continuous, but a nucleoporin (Cut11-GFP). In my experience, nucleoporins being discontinuously distributed throughout the nuclear envelope, the area encompassed by the signal may be underestimated in the event of a strong nuclear envelope deformation, as I have tried to illustrate in the scheme below: I WILL SEND THE SCHEME BY MAIL TO THE EDITOR, AS I CANNOT COPY-PASTE IT IN THE SYSTEM BOX Given that the photos from which the data were retrieved have not been shown, I cannot at present judge whether the use of a nuclear envelope marker providing continuous signals is absolutely necessary or not, and whether this consideration will affect (or not at all) the conclusions.

      The authors do not seem to comment or pay any attention to a very crucial result they obtain: the addition of ammonium to the WT strain has the effect of also restricting the nuclear cross-section area. They indeed say in their text "we did not observe any differences between cultures grown with or without ammonium supplementation (Fig.2)". I guess they refer here to the cbf11 mutant, in which case the sentence is true (although unfair to the WT). But by neglecting that the supplementation with ammonium had the power of reducing the cross-section area of WT nuclei, they are misled (or misleading) in their interpretation. The same, although milder, is true for Figure 5C, where the addition of ammonium to the WT culture does not alter the median value of prophase + metaphase duration, however has the virtue of very much rendering sharp (less scattered) the population of values, suggesting that the accuracy / control of the process is enhanced. What does this mean? I think it should be carefully thought about and considered as a whole.

      In the same line as above, the authors omit the RNA-seq analysis concerning the treatment of the WT with ammonium (Figure 3). This is very important to understand the standpoint of what this treatment elicits. It would also help unravel the observations I mentioned above that the authors did not assess in their descriptions. Also regarding Figure 3, it is completely obscure why the authors decided to show the genes on the right axis, and not others. Knowing how vast the lipid pathways are, there are likely many other hits that could be relevant. A particular thought goes for the proteins in charge of filling lipid droplets, such as sterol- and fatty acid-esterifying enzymes. Unless a very justified reason is provided, the choice at present seems arbitrary and it would be better to show a more unbiased data representation.

      In the same vein, related to the effect of ammonium on the WT, in Figure S1 (I want to congratulate the authors for showing their 3 experimental replicates), the results very neatly show that ammonium supplementation to the WT leads to a neat and reproducible increase in TAG, a fact on which the authors do not comment. In the mutant, irrespective of ammonium presence or absence, a huge increase in squalene and steryl esters (SE) is seen. I think the work would benefit from actually quantifying the intensity of these bands and thus materializing this in the form of values. TAG, squalene and SE are all neutral lipids, and are all stored within LD to prevent lipotoxicity if accumulated in the endoplasmic reticulum. While ammonium elicits strong TAG accumulation in the WT, this is not the case in the mutant, likely because LD storage capacity is overwhelmed by squalene and SE. Could this have something to do with the suppression they are studying?

      In the section of results where the authors comment on the TLC analysis, they write "suggesting failed coordination between sterol and TAG lipid metabolism pathway". As it stands, the sentence is rather devoid of real meaning and may even be misleading, when considering what I wrote before.

      My biggest concern has to do with the very last part, when they explore the implications of TOR:

      • First, all the data presented in the two concerned panels of Figure 7 (B and C) and of Figure S3 lack the values obtained for the single mutants with which cbf11 was combined. This is not acceptable from a genetic point of view, and may prevent us from having important information. For example: if the authors were right that Tor2/TORC1 is ensuring successful progression through closed mitosis (last sentence of results), then one would predict that the tor2-S allele leads to an increase, already per se, of the frequency of catastrophic mitoses. However, at present, I cannot check that.
      • The authors turn to a ∆ssp2 mutant to "increase Tor2 activity". However, this is a pleiotropic strategy, as AMP-kinase is the major sensor and responder to energy depletion, frequently triggered by glucose shortage, thus I am not sure the effects associated with its absence can be unequivocally ascribed to a rise in Tor2 activity.
      • There is a counterintuitive observation: rapamycin, which mimics nitrogen shortage, has the same effect as ammonium supplementation. This is strangely bypassed in the discussion, where the authors wrote "we showed increased mitotic fidelity in cbf11 cells when the stress-response branch of the TOR network was suppressed, either by ablation of Tor1/TORC2 or by boosting the activity of the pro-growth Tor2/TORC1 branch. These data are in agreement with previous findings that Tor2/TORC1 inhibition mimics nitrogen starvation".
      • Last, and irrespective of what was said above, the authors conclude that the phenotype suppression is due to "a role for Tor2/TORC1 in ensuring successful progression through mitosis". If, as stated by the authors, Tor1/TORC2 absence not only abrogates Tor1/TORC2 activity, but simultaneously raises Tor2/TORC1 activity, and if reciprocally Tor2/TORC1 increased activity concurs with Tor1/TORC2 attenuation, it cannot therefore be discerned whether the suppression is due to a rise in Tor2/TORC1 activity or to Tor1/TORC2 dampening.

      Discussion

      The authors invoke that TOR controls lipin, yet they go on to dismiss the link between TOR and lipids by saying "we did not observe any major changes in phospholipid composition when cells were grown in ammonium-supplemented YES medium compared to plain YES (Figure S2)", with this reinforcing their conclusion that ammonium does not suppress lipid-related cut mutants through directly correcting lipid metabolism defects. While I agree with that reasoning, I point out again that they nevertheless neglected the clear change observed in their three replicates (Figure S2) that ammonium addition to WT cells strongly increases the amount of TAG (esterified fatty acids). Since lipin activity promotes DAG formation, which then leads to TAG accumulation, this aspect should not be neglected.

      The emphasis on TOR, which expands several paragraphs of the Discussion, should be revisited if the evidence provided for this part of the data is not reinforced.

      To finish, if I may provide some personal thoughts that may be useful for the authors, I would first recall that TAG storage prevents the channeling of phosphatidic acid towards novel phospholipid synthesis and thus antagonizes NE expansion, which agrees with their neglected observation for the WT in Figure 2A. The antagonization of NE expansion can be achieved through autophagy (DOI 10.1038/s41467-023-39172-3; DOI 10.1177/25152564231157706), and indeed rapamycin addition (a very potent inducer of autophagy) also suppressed the cut phenotype (Figure 7A). What is more, in S. cerevisiae, autophagy has been shown to be important to transition through mitosis conveniently and to prevent mitotic aberrations (DOI 10.1371/journal.pgen.1003245), and to impose a "genome instability" intolerance threshold by restricting NE expansion (DOI 10.1177/25152564231157706). In the first mentioned work, the authors proposed that autophagy may help raise amino acid levels, which could assist cell cycle progression. This would have the virtue of reconciling the otherwise counterintuitive observation of the authors that rapamycin, which mimics nitrogen shortage, has the same effect as ammonium supplementation. It could be that ammonium supplementation mimics the downstream signal of a complex cascade initiated by actual amino acid shortage, known to elicit autophagy-like processes (thus explaining why TAG rises and why the NE does not expand), and may culminate with launching a program for more accurate mitosis and genome segregation. In further support, TORC1 inhibition (as elicited by +rapamycin) is a central node that integrates multiple cues, not only nitrogen availability, but also carbon shortage (DOI 10.1016/j.molcel.2017.05.027), and even genetic instability cues (DOI 10.1016/j.celrep.2014.08.053), perhaps helping unravel why ammonium (via TOR) suppresses very diverse cut mutants, irrespective of whether they stem from lipid or chromatid cohesion deficiencies. These previous works should be considered by the authors.

      Minor

      There was no speculation about why the suppressions are partial.

      Reference 15, cited in the text, is absent from the references list.

      An explanation of which statistical tests were chosen and why they were chosen would be necessary.

      In particular, for the analyses performed for Figure 5, one-way ANOVA should be applied instead of several t-tests.
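
      As an illustration of the point above, a one-way ANOVA compares all groups in a single test rather than through repeated pairwise t-tests. The sketch below computes only the F statistic and its degrees of freedom in plain Python; the function name and the numbers are hypothetical placeholders, not data from the manuscript. A p-value would additionally require the F distribution (for example via `scipy.stats.f_oneway`), and a post hoc test would be needed to say which groups differ.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder values for three conditions (arbitrary units)
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```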

      A small section in M&M about how data in general was acquired, quantified, plotted and analyzed would be appropriate.

      In the discussion, the sentence "this could mean that the signaling of availability of a good nitrogen source is by itself more important for mitotic fidelity than the actual physical presence of the nutrients" is a rather void sentence. Because, from the point of view of how a cell "works", the signal is important for the basic reason that it is supposed to represent the actual real cue eliciting it.

      In the second part of Results, when the phenotype of cbf11 mutants concerning LD is mentioned, the authors said "aberrant LD content". It would be good if they can mention already at this stage which type of aberration this was: more LD? less LD? bigger? smaller?

      What is the difference between the two SE bands in Figure S2? What exactly do SE-1 and SE-2 mean?

      In Figure 2, the two graphs, presented side by side, would be more easily comparable if they could be plotted with the same y-axis scale.

      In Figure 1A, it would be useful for non-specialists of this phenotype and non-pombe readers to show a control of how it looks to be "normal".

      Referees cross-commenting

      Overall, there is a striking consensus on the need to either address experimentally or remove the emphasis put on the TOR/mitotic fidelity connection, and to clarify the counter-intuitive notions associated with the results obtained with rapamycin. Also, the need for revisiting / improving / justifying the means by which nuclear envelope deformation is assessed has been raised at least twice. I therefore guess that the common guidelines for improving this manuscript are clearly established.

      Significance

      In view of all of the above, my feeling is that the authors have put the accent on the TOR message, which is weak, while placing less emphasis on the very strong and elegant findings they make: the authors discover that the suppression of the cut(-like) mutant phenotype by addition of NH4 is not due to a correction in lipid metabolism defects, suggesting that the effect is indirect. In support, cut-like mutants whose molecular defect stems from lipid-unrelated defects are also suppressed by ammonium addition. What is more, the authors refine the type of cut-like mutants susceptible to being "corrected" by ammonium addition, finding a "novel definition of cuts" that invokes a temporal rule. This important observation has relevant implications:

      • the long-standing interpretation (commented by the authors) that lipid-related cut mutants are defective because of insufficient synthesis of lipids to be able to grow their nuclear envelope membranes seems now inappropriate in light of their data;
      • this has the immediate implication that perhaps the importance of nitrogen supplementation for accurate mitosis is no longer a fact that may apply only to (yeast) organisms performing closed mitosis, which may broaden the implications of their finding substantially;
      • the nature of the temporal ruler they discover, which makes defects appearing early susceptible to being suppressed by nitrogen supplementation, deserves analysis in further works, thus opening an immediate perspective.
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study utilizes a virus-mediated short hairpin RNA (shRNA) approach to investigate in a novel way the role of the wild-type PHOX2B transcription factor in critical chemosensory neurons in the brainstem retrotrapezoid nucleus (RTN) region for maintaining normal CO2 chemoreflex control of breathing in adult rats. The solid results presented show blunted ventilation during elevated inhaled CO2 (hypercapnia) with knockdown of PHOX2B, accompanied by a reduction in expression of Gpr4 and Task2 mRNA for the proposed RTN neuron proton sensor proteins GPR4 and TASK2. These results suggest that maintained expression of wild-type PHOX2B affects respiratory control in adult animals, which complements previous studies showing that PHOX2B-expressing RTN neurons may be critical for chemosensory control throughout the lifespan and with implications for neurological disorders involving the RTN. When some methodological, data interpretation, and prior literature reference issues further highlighting novelty are adequately addressed, this study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of the PHOX2B transcription factor in neurons in the key brainstem chemosensory structure, the retrotrapezoid nucleus (RTN), for maintaining proper CO2 chemoreflex responses of breathing in the adult rat in vivo. PHOX2B has an important transcriptional role in neuronal survival and/or function, and mutations of PHOX2B severely impair the development and function of the autonomic nervous system and RTN, resulting in the developmental genetic disease congenital central hypoventilation syndrome (CCHS) in neonates, where the RTN may not form and is functionally impaired. The function of the wild-type PHOX2B protein in adult RTN neurons that continue to express PHOX2B is not fully understood. By utilizing a viral PHOX2B-shRNA approach for knockdown of PHOX2B specifically in RTN neurons, the authors' solid results show impaired ventilatory responses to elevated inspired CO2, measured by whole-body plethysmography in freely behaving adult rats, that develop progressively over a four-week period in vivo, indicating effects on RTN neuron transcriptional activity and associated blunting of the CO2 ventilatory response. The RTN neuronal mRNA expression data presented suggests the impaired hypercapnic ventilatory response is possibly due to the decreased expression of key proton sensors in the RTN. This study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Strengths:

      (1) The authors used a shRNA viral approach to progressively knock down the PHOX2B protein, specifically in RTN neurons to determine whether PHOX2B is necessary for the survival and/or chemosensory function of adult RTN neurons in vivo.

      (2) To determine the extent of PHOX2B knockdown in RTN neurons, the authors combined RNAScope® and immunohistochemistry assays to quantify the subpopulation of RTN neurons expressing PHOX2B and neuromedin B (Nmb), which has been proposed to be key chemosensory neurons in the RTN.

      (3) The authors demonstrate that knockdown efficiency is time-dependent, with a progressive decrease in the number of Nmb-expressing RTN neurons that co-express PHOX2B over a four-week period.

      (4) Their results convincingly show hypoventilation particularly in 7.2% CO2 only for PHOX2B-shRNA RTN-injected rats after four weeks as compared to naïve and non-PHOX2B-shRNA targeted (NT-shRNA) RTN injected rats, suggesting a specific impairment of chemosensitive properties in RTN neurons with PHOX2B knockdown.

      (5) Analysis of the association between PHOX2B knockdown in RTN neurons and the attenuation of the hypercapnic ventilatory response (HCVR), by evaluating the correlation between the number of Nmb+/PHOX2B+ or Nmb+/PHOX2B- cells in the RTN and the resulting HCVR, showed a significant correlation between HCVR and number of Nmb+/PHOX2B+ and Nmb+/PHOX2B- cells, suggesting that the number of PHOX2B-expressing cells in the RTN is a predictor of the chemoreflex response and the reduction of PHOX2B protein impairs the CO2-chemoreflex.

      (6) The data presented indicate that PHOX2B knockdown not only causes a reduction in the HCVR but also a reduction in the expression of Gpr4 and Task2 mRNAs, suggesting that PHOX2B knockdown affects RTN neurons transcriptional activity and decreases the CO2 response, possibly by reducing the expression of key proton sensors in the RTN.

      (7) Results of this study show that independent of the role of PHOX2B during development, PHOX2B is still required to maintain proper CO2 chemoreflex responses in the adult brain, and its reduction in CCHS may contribute to the respiratory impairment in this disorder.

      Weaknesses:

      (1) The authors found a significant decrease in the total number of Nmb+ RTN neurons (i.e., Nmb+/PHOX2B+ plus Nmb+/ PHOX2B-) in NT-shRNA rats at two weeks post viral injection, and also at the four-week period where the impairment of the chemosensory function of the RTN became significant, suggesting some inherent cell death possibly due to off-target toxic effects associated with shRNA procedures that may affect the experimental results.

      (2) The tissue sampling procedures for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally have not been completely validated to accurately represent the full expression patterns in the RTN under experimental conditions.

      (3) The inferences about RTN neuronal expression of NMB, GPR4, or TASK2 are based on changes in mRNA levels, so it remains speculation that the observed reduction in Gpr4 and Task2 mRNA translates to a reduction in the protein levels and associated reduction of RTN neuronal chemosensitive properties.

      Thank you for sharing the excitement for our study showing novel findings on the contribution of PHOX2B to the chemoreflex response and activity of adult RTN neurons. We believe that reporting the results on cell death following shRNA viral injections, potentially due to some off-target effects, is important to share with the scientific community to help plan experiments of a similar kind in various fields of neuroscience.

      Thanks for pointing out your concerns about cell quantification. We have edited the methods and results sections to add clarity about our analytical procedure.

      As we discussed in the manuscript, we were only able to assess mRNA levels of Nmb, Gpr4, Task2 as current available antibodies for the 3 targets are still unreliable. Future studies will benefit from the analysis of changes at protein levels and possibly electrophysiological recordings to verify that chemosensitive properties of RTN neurons are impaired due to reduction of PHOX2B expression. We discuss these limitations in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a short hairpin RNA technique strategy to elucidate the functional activity of neurons in the retrotrapezoid nucleus (RTN), a critical brainstem region for central chemoreception. Dysfunction in this area is associated with the neuropathology of congenital central hypoventilation syndrome (CCHS). The subsequent examination of these rats aimed to shed light on the intricate aspects of RTN and its implications for central chemoreception and disorders like CCHS in adults. They found that using the short hairpin RNA (shRNA) targeting Phox2b mRNA, a reduction of Phox2b expression was observed in Nmb neurons. In addition, Phox2b knockdown did not affect breathing in room air or under hypoxia, but the hypercapnia ventilatory response was significantly impaired. They concluded that Phox2b in the adult brain has an important role in CO2 chemoreception. They thought that their findings provided new evidence for mechanisms related to CCHS neuropathology. The conclusions of this paper are well supported by data, but careful discussion seems to be required for comparison with the results of various previous studies performed by different genetic strategies for the RTN neurons.

      Strengths:

      The most exciting aspect of this work is the modelling of the Phox2b knockdown in one element of the central neuronal circuit mediating respiratory reflexes, that is in the RTN. To date, mutations in the PHOX2B gene are commonly associated with most patients diagnosed with CCHS, a disease characterized by hypoventilation and absence of chemoreflexes, in the neonatal period, which in severe cases can lead to respiratory arrest during sleep. In the present study, the authors demonstrated that the role of Phox2b extends beyond the developmental period, and its reduction in CCHS may contribute to the respiratory impairment observed in this disorder.

      Weaknesses:

      Whereas the most exciting part of this work is the knockdown of the Phox2b in the RTN in adult rodents, the weakness of this study is the lack of a clear physiological, developmental, and anatomical distinction between this approach and similar studies already reported elsewhere (Ruffault et al., 2015, DOI: 10.7554/eLife.07051; Ramanantsoa et al., 2011, DOI: 10.1523/JNEUROSCI.1721-11.2011; Huang et al., 2017, DOI: 10.1016/j.neuron.2012.06.027; Hernandez-Miranda et al., 2018, DOI: 10.1073/pnas.1813520115; Ferreira et al., 2022 DOI: 10.7554/eLife.73130; Takakura et al., 2008 DOI: 10.1113/jphysiol.2008.153163; Basting et al., 2015 DOI: 10.1523/JNEUROSCI.2923-14.2015; Marina et al., 2010 DOI: 10.1523/JNEUROSCI.3141-10.2010). In addition, several conclusions presented in this work are not directly supported by the provided data.

      Thanks for the feedback on our manuscript. We have further highlighted in our discussion the previous developmental work aimed at determining the role of PHOX2B in embryonic development. Our study was triggered by the fascinating observation that, despite its important role in development of the central and peripheral nervous system, PHOX2B is still present in the adult brain and its function in adult neurons is unknown; thus we aimed to investigate its role in the adult RTN by knocking down its expression with a shRNA approach. Therefore, in our model knockdown of PHOX2B does not affect development of the RTN. Previous studies (mentioned by the reviewer, as well as cited in the manuscript) have focused on investigating 1) the role of PHOX2B in the developmental period, 2) the physiological changes associated with the transgenic expression of mutant forms of PHOX2B in relation to CCHS, 3) the killing or the acute silencing/excitation of neuronal activity of PHOX2B+ RTN neurons. Our study had a different aim: to test whether the transcription factor PHOX2B had a physiologically relevant role in adult RTN neurons. In this experimental approach PHOX2B is not altered throughout embryonic or postnatal development. By knocking down PHOX2B in the Nmb+ cells of the RTN our results show a reduction in chemoreflex response and mRNA expression of proton sensors. Hence, we conclude that PHOX2B alters the function of Nmb+ RTN neurons, possibly through transcriptional changes including the reduction in Gpr4 and Task2 mRNA expression.

      Reviewer #3 (Public Review):

      A brain region called the retrotrapezoid nucleus (RTN) regulates breathing in response to changes in CO2/H+, a process termed central chemoreception. A transcription factor called PHOX2B is important for RTN development and mutations in the PHOX2B gene result in a severe type of sleep apnea called Congenital Central Hypoventilation Syndrome. PHOX2B is also expressed throughout life, but its postmitotic functions remain unknown. This study shows that knockdown of PHOX2B in the RTN region in adult rats decreased expression of Task2 and Gpr4 in Nmb-expressing RTN chemoreceptors and this corresponded with a diminished ventilatory response to CO2 but did not impact baseline breathing or the hypoxic ventilatory response. These results provide novel insight regarding the postmitotic functions of PHOX2B in RTN neurons.

      Main issues:

      (1) The experimental approach was not targeted to Nmb+ neurons and since other cells in the area also express Phox2b, conclusions should be tempered to focus on Phox2b expressing parafacial neurons NOT specifically RTN neurons.

      (2) It is not clear whether PHOX2B is important for the transcription of pH sensing machinery, cell health, or both. If knockdown of PHOX2B results in loss of RTN neurons this is also expected to decrease Task2 and Gpr4 levels, albeit by a transcription-independent mechanism.

      Although we did not specifically target Nmb+ neurons, we performed viral injections within the area where neurons expressing PHOX2B and Nmb are localized (i.e., the RTN region). We carefully quantified the impact of PHOX2B knockdown on Nmb expressing neurons, as well as the effects on the adjacent TH expressing C1 population and FN neurons (figure 5). As reported in the results section, significant changes in the numbers of PHOX2B expressing neurons were only observed at the site of injection in PHOX2B+/Nmb+ neurons. We did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells coexpressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      Where appropriate, we have substituted “RTN” with “Nmb expressing neurons of the RTN” throughout the manuscript.

      We have clarified in the methods and results section how we quantified Task2 and Gpr4 mRNA expression. The quantification was performed on a pool of single cells (200-250/rat) expressing Nmb. Hence, the overall reduction is not a result of general fluorescence loss in the RTN region, but specifically assessed in single cells expressing Nmb. This is therefore independent of the reduction of the total number of Nmb cells.

      We propose that cell death is not a direct effect of PHOX2B knockdown, but rather it is associated with the injection of the viral constructs that have been already reported to promote some off-target effects (as reported in the manuscript). While modest cell death is observed only in the first two weeks post-infection, it does not increase further between 2 and 4 weeks post infection, when the reduction in PHOX2B (not associated with a further reduction in Nmb+ cells, hence no further cell death in RTN) is evident and the respiratory chemoreflex is impaired. These results suggest that 1) reduction of PHOX2B is not responsible for cell death; 2) it is the reduction of PHOX2B levels that promotes chemoreflex impairment. Given the observation that Nmb cells with no detectable PHOX2B protein show reduced expression of Task2 and Gpr4 mRNA, we propose that one of the possible mechanisms of chemoreflex impairment in PHOX2B shRNA rats is the reduction of Task2 and Gpr4. In the discussion we also suggest possible additional mechanisms that can be investigated in further studies.

      Recommendations for the authors:

      In revising this manuscript, the authors should carefully address the issues raised by the reviewers to substantially improve the manuscript and solidify the reviewers' general assessment of the potential importance of this work.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The cell counts for Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons are a critical component of the study, and it is unclear how the tissue sampling procedures (eight sections per animal) for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally has been validated to accurately represent the full expression patterns in the RTN under the experimental conditions. It is possible that the sampling/quantification procedures used may be adequate, but validation is important. Also, quantification of the CTCF signal for Nmb, Gpr4, and Task2 mRNA is an important component of this study, but only four sections/rats were used.

      Thank you for pointing out your concern on our quantification method. We have clarified in the methods section the procedure for cell counting and quantification of the CTCF signal. We have sampled the area of the RTN in order to identify Nmb cells of RTN.

      We have edited the methods section as follows:

      “To quantify Nmb+/PHOX2B- and Nmb+/PHOX2B+ neurons within the RTN region, we analysed one in every seven sections (210 µm interval; 8 sections/rat in total) along the rostrocaudal distribution of the RTN on the ventral surface of the brainstem and compared total bilateral cell counts of PHOX2B-shRNA rats with non-target control (NT-shRNA) and naïve rats. Cells that expressed Nmb and Phox2b mRNAs but did not show co-localization with PHOX2B protein were considered Nmb+/PHOX2B-.

      The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.

      To quantify Gpr4 and Task2 mRNA expression in Nmb cells of the RTN, we first quantified single cell CTCF for either Gpr4 (200.7 ± 13.2 cells/animal) or Task2 (169.6 ± 10.3 cells/animal) mRNA in Nmb+ RTN neurons in the 3 experimental groups (naïve, NT shRNA and PHOX2B shRNA) independent of their PHOX2B expression. We then compared CTCF values of Gpr4 and Task2 mRNA between Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons in PHOX2B-shRNA rats to address changes in their mRNA expression induced by PHOX2B knockdown.”
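
      For readers unfamiliar with the measure, the CTCF calculation described in the quoted methods can be sketched as follows. This is a minimal illustration assuming the commonly used definition, CTCF = integrated density − (cell area × mean background fluorescence), with the background averaged over the three background areas per image. The function names and all numerical values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def ctcf(integrated_density, cell_area, background_means):
    """Corrected Total Cell Fluorescence for one selected cell.

    background_means: mean grey values of the background regions
    measured in the same image (three regions in the quoted methods).
    """
    return integrated_density - cell_area * mean(background_means)


def rtn_nts_ratio(rtn_cell_ctcf, nts_cells_ctcf):
    """Ratio of one RTN cell's CTCF to the average CTCF of NTS reference cells."""
    return rtn_cell_ctcf / mean(nts_cells_ctcf)


# Hypothetical measurements for a single Nmb+ cell (arbitrary units)
cell_value = ctcf(integrated_density=52000.0,
                  cell_area=400.0,
                  background_means=[10.0, 12.0, 14.0])
print(cell_value)  # 52000 - 400 * 12 = 47200.0

# Hypothetical CTCF values for two dorsal (NTS) reference cells
print(rtn_nts_ratio(cell_value, [40000.0, 54400.0]))  # 1.0
```

      Normalizing each RTN cell against reference cells from the same animal, as in the ratio above, is one way to reduce the influence of between-animal differences in staining intensity.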

      (2) Furthermore, to evaluate changes in Nmb mRNA expression following PHOX2B knockdown at the level of the RTN, it is stated in Materials and Methods "we compared, on the same tissue section, the fluorescence intensity of RTN Nmb+ cells with the signal of Nmb+ cells in the NTS (Nmb CTCF ratio RTN/NTS)". How this was accomplished is unclear, considering the non-overlapping locations of the RTN and rostral NTS. Providing images would be helpful.

      The first sections containing Nmb cells in the ventral medulla also contain a few Nmb cells in the dorsal medulla. We used those cells as a reference for fluorescence levels since they would not be affected by the viral infection. Similar cells are also present in the brains of mice and are reported in the Allen Brain Atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal regions in Figure 5.

      (3) The staining for tyrosine hydroxylase (TH) to identify and quantify C1 cells (TH+/PHOX2B+) following shRNA injection provides important information, and it would be useful to show images of histological examples to accompany Fig. 5A.

      We have included in Figure 5A a sample image of the C1 neurons used for our TH quantification.

      Minor:

      (1) Provide animal ns in the text of the Results section for the four weeks of PHOX2B knockdown.

      They have been included.

      (2) Please state in the legends for Figures 2 & 3, which images are superimposition images.

      The figures include information identifying the merged images.

      Reviewer #2 (Recommendations For The Authors):

      This manuscript by Cardani and colleagues attempts to address whether a reduction in Phox2b expression in chemosensitive neuromedin-B (NMB)-expressing neurons in the RTN alters respiratory function. The authors used a short hairpin RNA technique to silence RTN chemosensor neurons. The present study is very interesting, but there are several major concerns that need to be addressed, including the main hypothesis.

      Major

      (1) Page 6, lines 119-121: I did not grasp the mechanistic property described by the authors in this passage, nor did I understand the experiments they conducted to establish a mechanistic link between Phox2b and the chemosensitive property. Could the authors provide further clarification on these points?

      We believe the reviewer refers to this paragraph: “In order to have a better understanding of the role of PHOX2B in the CO2 homeostatic processes we used a non-replicating lentivirus vector of two short-hairpin RNA (shRNA) clones targeting selectively Phox2b mRNA to knockdown the expression of PHOX2B in the RTN of adult rats and tested ventilation and chemoreflex responses. In parallel, we also determined whether knockdown of PHOX2B in adult RTN neurons negatively affected cell survival. Finally, we sought to provide a mechanistic link between PHOX2B expression and the chemosensitive properties of RTN neurons, which have been attributed to two proton sensors, the proton-activated G protein-coupled receptor (GPR4) and the proton-modulated potassium channel (TASK-2).”

      The rationale for running these experiments is based on the fact that it is well known in the literature that PHOX2B is an important transcription factor for the development of several neuronal populations. PHOX2B Knockout mice die before birth and heterozygous mice have some anatomical defects, but respiration is only impaired in the early post-natal period. While many developmental transcription factors are generally downregulated in the post-natal period, PHOX2B is still expressed in some neurons into adulthood. What is the function of PHOX2B in these fully developed neurons? We do not know as we do not yet know the entire set of target genes that PHOX2B regulates in the adult brain. Hence we decided to test what would happen if we knocked down the PHOX2B protein in the Nmb neurons of the RTN, an area that is critical for central chemoreception and involved in the presentation of CCHS. Our results show that reduction of PHOX2B blunts the CO2 chemoreflex response and reduces mRNA expression of Task2 and Gpr4, two pH sensors that have been shown to be key for RTN chemosensitive properties. We also show that the Nmb mRNA and cell survival are not affected by PHOX2B knockdown and we propose that the reduced CO2 chemoreflex may be attributed to a reduction of chemosensory function of Nmb neurons of the RTN due to partial loss of Gpr4 and Task2.

      (2) It is imperative for the authors to enhance the description of their hypothesis, as, from my perspective, the contribution of the data to the field is not clearly articulated. Numerous more selectively designed experiments were conducted to investigate the role of Phox2b-expressing neurons at the RTN level and their involvement in respiratory activity. In summary, the current study appears to lack novelty.

      We respectfully disagree with this statement. We believe we have adequately summarized previous work, although we realize we can’t reference every single publication on this topic. As described above, the developmental role of PHOX2B has been elegantly investigated in mouse embryonic studies (extensively cited in the manuscript). Furthermore, very interesting studies have shown that when the CCHS defining mutant PHOX2B protein (+7Ala PHOX2B) and other mutations linked to CCHS have been transgenically expressed in mice through development, severe anatomical defects are observed and respiratory function is impaired (extensively cited in the manuscript). We have also cited papers relevant to this study that describe the role of PHOX2B/Nmb RTN neurons and the pH protein sensors in the CO2 chemoreflex. If we missed some papers that the reviewer deems essential in the context of this study we will be happy to include them.

      We are not aware of other studies that have investigated the specific role of the PHOX2B protein in the adult RTN in the absence of confounding developmental pathogenesis (i.e. in an otherwise ‘healthy’ animal), and of no other studies that looked at the effects on the RTN proton sensors and Nmb expression following PHOX2B knockdown. Hence we believe that our results are novel and, in our opinion, very interesting.

      (3) On pages 13 and 14 (Results section), I am seeking clarity on the novelty of the findings. Doug Bayliss's prior work has already demonstrated the role of Gpr4 and Task2 on Phox2b neurons in regulating ventilation in conscious rodents.

      Bayliss’ group has elegantly demonstrated that Gpr4 and Task2 are the two proton sensors in the PHOX2B/Nmb neurons of the RTN that have a key role in chemoreception (cited in the manuscript). The novelty of our findings is that we show that a reduction in PHOX2B protein is associated with a reduction of mRNA levels of Gpr4 and Task2. This is a novel finding. Currently, we do not know what transcriptional activity PHOX2B has in adult RTN neurons (i.e., what gene targets PHOX2B has in this cell population and many others) and here we propose that Nmb is not a gene target of PHOX2B while Gpr4 and Task2 are.

      (4) The authors assert that the transcription factor Phox2b remains not fully understood. While I concur, the present study falls short of fully investigating the actual contribution of Phox2b to breathing regulation. In other words, the knockdown of Phox2b neurons did not add much to the knowledge of the field.

      We respectfully disagree with the reviewer. With the exception of very few target genes, the transcriptional role of PHOX2B beyond the embryonic development is poorly understood. No mechanistic connection has been made before between the transcriptional activity of PHOX2B with the expression of proton sensors in the RTN. Other groups have investigated the role of stimulating or depressing the neuronal activity of PHOX2B/NMB neurons in the RTN showing a key role of RTN on respiratory control, but these prior studies did not test whether changing the expression of the PHOX2B protein in these neurons had a role on respiratory control and the central chemoreflex. No other study has investigated the role of the PHOX2B protein within the RTN cells, with the exception of PHOX2B knockout mice or transgenic expression of the mutated PHOX2B that are relevant for CCHS. Again, these previous studies were done on a background of developmental impairment and to the best of our knowledge did not seek to show any association between PHOX2B expression and expression of Gpr4 or Task2.

      (5) I recommend removing the entire section entitled "The role of Phox2b in development and in the adult brain." The authors merely describe Phox2b expression without contextualizing it within the obtained data.

      Because reviewers raised the issue about not including important information about the role of PHOX2B in development and respiratory control we prefer to keep the section.

      (6) Are the authors aware of whether the shRNA in Phox2b/Nmb neurons truly induced cell death or solely depleted the expression of the transcription factor protein? Do the chemosensitive neurons persist?

      This is an excellent question that we tried to address with our study. As we report in Figures 2 and 3, we propose that some cell death occurs within the first 2 weeks post-infection, likely due to off-target action of the shRNA approach rather than the reduction of PHOX2B expression (discussed in the manuscript). This is further evidenced by our Fig. S1 data, in which higher concentrations of shRNA led to more cell death, indicative of off-target effects. We do not believe it is a consequence of our surgical procedure, as we do not see similar cell loss when injecting vehicle or other control solutions (unpublished work; Janes et al., 2024).

      During the first 2 weeks post-surgery, the proportion of Nmb+/PHOX2B- cells does not change compared to control rats or non-target shRNA (knockdown is not yet visible at the protein level). Four weeks post-injection, there is no further cell death (assessed by the total number of Nmb cells), whereas the fraction of Nmb cells that express PHOX2B is reduced (and the fraction of Nmb cells not expressing PHOX2B is increased). This suggests that the reduction of PHOX2B protein in Nmb cells is not correlated with cell loss or survival, whereas the impairment that we observe in central chemoreception is possibly due to the progressive decrease of PHOX2B expression in these neurons.

      (7) In Figures 2 and 3, it is noteworthy that the authors observe peak expression at a very caudal level. In rats, the RTN initiates at the caudal end of the facial, approximately 11.6 mm, and should exhibit a rostral direction of about 2 mm.

      In our experience, the Nmb cells on the ventral surface of the medulla peak in number around the caudal tip of the facial nucleus in adult SD rats (Janes et al., 2024). To add clarity to the figure, we now report cell count distribution data in relation to the distance from the caudal tip of the facial nucleus.

      Minor

      (1) I would like to suggest that the authors correct the recurring statement throughout the manuscript that Phox2b is essential only for the development of the autonomic nervous system. In my view, it also plays a crucial role in certain sensory and respiratory systems.

      We have addressed this in the manuscript.

      (2) Page 4, lines 59-60: Out of curiosity, do the data include information from different countries?

      These data refer to information from France and Japan. Currently, it is estimated that there are 1000-2000 CCHS patients worldwide.

      (3) Page 7, lines 129-131: In my understanding, the sentence is quite clear; if we knock down the PHOX2B gene, we are expected to reduce or even eliminate the expression of Gpr4 or Task2. Am I right?

      This is what we propose from the results of this study. We would like to point out that the transcriptional activity of PHOX2B (i.e., what genes PHOX2B regulates) in adult neurons has not yet been fully investigated. With the exception of a few target genes (e.g., TH, DBH), the transcriptional activity of PHOX2B in neurons is not yet known. Here we report novel findings that suggest that Gpr4 and Task2 are potential target genes of PHOX2B in RTN neurons.

      (4) The authors mentioned that NT-shRNA also impacts CO2 chemosensitivity. Could this effect be attributed to mechanical damage of the tissue resulting from the injection?

      Just to clarify, we observed some impairment in chemosensitivity when NT-shRNA was injected in a “larger” (2x 200ul/side) volume. No impairment was observed with NT-shRNA when we injected smaller volumes (2x 100ul/side). Physical damage could be a possibility, although in our experience (unpublished work; Janes et al., 2024, Acta Physiologica) injections of a similar volume of solution performed by the same investigator in the same brain area and experimental settings did not produce a physical lesion associated with respiratory impairment. Hence we attribute the unexpected results with larger volumes to toxic effects associated with the shRNA viral constructs.

      (5) In the reference section, the authors should review and correct some entries. For instance, Janes, T. A., Cardani, S., Saini, J. K., & Pagliardini, S. (2024). Title: "Etonogestrel Promotes Respiratory Recovery in an In Vivo Rat Model of Central Chemoreflex Impairment." Running title: "Chemoreflex Recovery by Etonogestrel." Some references contain the journal, pages, and volume, while others lack this information entirely.

      We have updated references. Janes et al., 2024 has now been published in Acta Physiologica.

      (6) Why does the baseline have distribution points, whereas the other boxplots do not?

      We have clarified in the figure legend that, for an accurate presentation of our results, the data points shown in some of the boxplot graphs do not represent the entire baseline dataset but only the outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).
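
      The plotting convention described above can be illustrated with a short sketch. It assumes linearly interpolated percentiles (Python's "inclusive" quantile method); the authors' plotting software may interpolate differently. The function name and the sample values are hypothetical.

```python
from statistics import quantiles


def whisker_limits_and_outliers(values):
    """Return the 10th/90th percentile whisker limits and the points outside them."""
    # quantiles(..., n=10) returns the nine decile cut points;
    # the first is the 10th percentile and the last is the 90th.
    deciles = quantiles(values, n=10, method="inclusive")
    low, high = deciles[0], deciles[-1]
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


# Hypothetical data: only the two extreme values are drawn as single points.
low, high, outliers = whisker_limits_and_outliers(list(range(1, 12)))
print(low, high, outliers)  # 2.0 10.0 [1, 11]
```

      With this convention, a group whose values all lie within its own 10th to 90th percentile limits shows no individual points, which is why some boxes appear without dots. In matplotlib, a comparable plot can be requested by passing `whis=(10, 90)` to `boxplot`.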

      Reviewer #3 (Recommendations For The Authors):

      • What is the rationale behind dedicating the first paragraph of results to discussing an artifact?

      We think that it is important to report off-target effects of shRNA viral constructs, as the concentrations and volumes of virus injected vary considerably across studies. Other investigators may attempt to use larger volumes of virus to obtain a greater or faster knockdown, but would reach erroneous conclusions if appropriate tests are not performed.

      Furthermore, because some readers could question whether we injected enough virus to knock down the expression of PHOX2B, and may wonder if a larger amount of virus would increase knockdown efficiency, we wanted to show that, in our opinion, we used the maximum amount of virus to knock down PHOX2B without causing toxic effects or physiological changes that are not dependent on PHOX2B knockdown.

      • All individual data points should be visible in floating bar graphs in Figures 1 and 4. For example, I don't see any dots for naïve animals in any of the panels in Figure 1.

      We have clarified in the figure legend that, for an accurate presentation of our results, the data points shown in some of the boxplot graphs do not represent the entire baseline dataset but only the outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).

      • Please include specific F and T values along with DF.

      We have included a table with all the specific values in the supplementary section as Table 1.

      • The C1 and facial partly overlap with the RTN at this level of the medulla and these cells should appear as Phox2b+/Nmb- cells so it is not clear to me why these cells are not evident in the control tissue in Figures 2B and 3B. Also, some of the bregma levels shown in Figure 5A overlap with Figures 2-3 so again it is not clear to me how this non-cell type specific viral approach was targeted to Nmb cells but not nearby TH+ cells. Please clarify.

      In our experience, C1 TH cells are located slightly medial to the Nmb cells and they spread much more caudally than Nmb cells of the RTN. We focused our small volume injection in the core of the RTN to target Nmb cells but we also assessed PHOX2B knockdown in TH C1 cells by counting the PHOX2B/TH cells across treatment groups. Although we can’t exclude subtle changes in the C1 population, we did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells expressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated Figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      • To confirm, Nmb is not expressed in the NTS, and this region was chosen as a background, right?

      In order to systematically analyze Nmb mRNA expression, we decided to measure fluorescence relative to Nmb neurons present in the dorsal brainstem. Here cells are sparse, but we used them as a fluorescence reference since they would not be affected by the ventral shRNA injection. Similar cells are also present in the brains of mice and are reported in the Allen Brain Atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal regions in Figure 5.

      • How do you get a loss of Nmb+ neurons (Figs 2-3) with no change in Nmb fluorescence (Fig. 5B)? In the absence of representative images these results are not compelling and should be substantiated by more readily quantifiable approaches like qPCR.

      We have clarified in the methods and results sections our analytical procedure to assess PHOX2B and Nmb expression. Figures 2 and 3 display the results of counting the number of Nmb+ cells in the RTN. Figure 5B reports the average total cell fluorescence measured inside Nmb+ cells, not an average fluorescence measurement of the area of the ventral medulla. In short, our results show that we have fewer Nmb cells that express PHOX2B, but the overall Nmb mRNA fluorescence (expression) in Nmb cells relative to Nmb fluorescence in cells of the dorsal brainstem is the same.

      We have edited the methods as follows:

      “The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.”

      A single-cell qPCR analysis would definitely be ideal, but a qPCR from dissected tissue would not help us determine whether there was a reduction in Nmb mRNA levels within a cell.

      • The boxed RTN region in these examples is all over the place. The RTN should be consistently placed along the ventral surface under the facial and approx. equal distance from the trigeminal and pyramids.

      We have updated the figures to consistently present the areas of interest where Nmb cells are located and images are taken.

      • Fluorescent in situ typically appears as discrete puncta so it is not clear to me why that is not the case here.

      Our images are taken at low magnification (20X), where it is difficult to distinguish single mRNA molecules. However, it is possible to appreciate the differences between the grainy fluorescent signal in the in situ hybridization assay (RNAScope) and the smoother signal of protein detection in the immunofluorescence assay.

      • Can TUNEL staining be done to confirm loss of Nmb neurons is due to death and not re-localization?

      Does the reviewer mean “cell migration” by re-localization? We do not expect that this would occur in our experiments. Although TUNEL in the first week post-infection could be useful to determine cell death in our tissue, we do not expect migration of neurons within the brain, as our viral shRNA injections are performed in adult rats when developmental processes are already concluded.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Comment: The fact that there are Arid1a transcripts that escape the Cre system in the Arid1a KO mouse model might complicate the interpretation of the data. The phenotype of the Arid1a knockout is probably masked by the fact that many of the sequencing techniques used here are done on a heterogeneous population of knockout and wild type spermatocytes. In relation to this, I think that the use of the term "pachytene arrest" might be overstated, since this is not the phenotype truly observed. Knockout mice produce sperm, and probably litters, although a full description of the subfertility phenotype is lacking, along with identification of the stage at which cell death is happening by detection of apoptosis.

      Response: As the reviewer indicates, we did not observe a complete arrest at Pachynema. In fact, the histology shows the presence of spermatids and sperm in seminiferous tubules and epididymides (Fig. Sup. 3). However, our data argue that the wild-type haploid gametes produced were derived from spermatocyte precursors that have likely escaped Cre mediated activity (Fig. Sup. 4). Furthermore, diplotene and metaphase-I spermatocytes lacking ARID1A protein by IF were undetectable in the Arid1acKO testes (Fig. S4B). Therefore, although we do not demonstrate a strict pachytene arrest, it is reasonable to conclude that ARID1A is necessary to progress beyond pachynema. We have revised the manuscript to reflect this point (Abstract lines 17,18; Results lines 153,154)

      Comment: It is clear from this work that ARID1a is part of the protein network that contributes to silencing of the sex chromosomes. However, it is challenging to understand the timing of the role of ARID1a in the context of the well-known DDR pathways that have been described for MSCI.

      Response: With respect to the comment on the lack of clarity as to which stage of meiosis we observe cell death, our data do suggest that it is reasonable to conclude that mutant spermatocytes (ARID1A-) undergo cell death at pachynema given their inability to execute MSCI, which is a well-established phenotype.

      Comment: Staining of chromosome spreads with Arid1a antibody showed localization at the sex chromosomes by diplonema; however, analysis of gene expression in Arid1a KO was performed on pachytene spermatocytes. Therefore, it is not very clear how the chromatin remodeling activity of Arid1a in diplonema is affecting gene expression of a previous stage. CUTnRUN showed that ARID1a is present at the sex chromatin in earlier stages, leading us to hypothesize that immunofluorescence with ARID1a antibody might not reflect the real localization of ARID1a.

      Response: It is unclear what the reviewer means about not understanding how ARID1A activity at diplonema affects gene expression at earlier stages. Our interpretations were not based solely on the observation of ARID1A associations with the XY body at diplonema. In fact, mRNA expression and CUT&RUN analyses were performed on pachytene-enriched populations. ARID1A's association with the XY body is not exclusive to diplonema. Based on both CUT&RUN and IF data, ARID1A associates with XY chromatin as early as pachynema. Only at late diplonema did we observe ARID1A hyperaccumulation on the XY body by IF.

      Reviewer #2 (Public Review):

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: As explained in our response to these comments in the first revision, we respectfully disagree with this reviewer’s conclusions. We have been quantitative by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer #3 (Public Review):

      Comment: The data demonstrate that the mutant cells fail to progress past pachytene, although it is unclear whether this specifically reflects pachytene arrest, as accumulation in other stages of Prophase also is suggested by the data in Table 1. The western blot showing ARID1A expression in WT vs. cKO spermatocytes (Fig. S2) is supportive of the cKO model but raises some questions. The blot shows many bands that are at lower intensity in the cKO, at MWs from 100-250kDa. The text and accompanying figure legend have limited information. Are the various bands with reduced expression different isoforms of ARID1A, or something else? What is the loading control 'NCL'? How was quantification done given the variation in signal across a large range of MWs?

      Response: The loading control is Nucleolin. With respect to the other bands in the range of 100-250 kDa, it is difficult to say whether they represent ARID1A isoforms. The Uniprot entry for Mouse ARID1A only indicates a large mol. wt sequence of ~242 kDa; therefore, the band corresponding to that size was quantified. There is no evidence to suggest that lower molecular weight isoforms may be translated. Although speculative, it is possible that the lower molecular weight bands represent proteolytic/proteasomal degradation products or products of antibody non-specificity. These points are addressed in the revised manuscript (Legend to Fig S2, lines 926-931). Blots were scanned on a LI-COR Odyssey CLx imager and viewed and quantified using Image Studio Version 5.2.5 (Methods, lines 640-642).

      Comment: An additional weakness relates to how the authors describe the relationship between ARID1A and DNA damage response (DDR) signaling. The authors don't see defects in a few DDR markers in ARID1A CKO cells (including a low-resolution assessment of ATR), suggesting that ARID1A may not be required for meiotic DDR signaling. However, as previously noted the data do not rule out the possibility that ARID1A is downstream of DDR signaling and the authors even indicate that "it is reasonable to hypothesize that DDR signaling might recruit BAF-A to the sex chromosomes (lines 509-510)." It therefore is difficult to understand why the authors continue to state that "...the mechanisms underlying ARID1A-mediated repression of the sex-linked transcription are mutually exclusive to DDR pathways regulating sex body formation" (p. 8) and that "BAF-A-mediated transcriptional repression of the sex chromosomes occurs independently of DDR signaling" (p. 16). The data provided do not justify these conclusions, as a role for DDR signaling upstream of ARID1A would mean that these mechanisms are not mutually exclusive or independent of one another.

      Response: The reviewer’s argument is reasonable, and we have made the recommended changes (Results, lines 212-215; Discussion, lines 499-500).

      Comment: A final comment relates to the impacts of ARID1A loss on DMC1 focus formation and the interesting observation of reduced sex chromosome association by DMC1. The authors additionally assess the related recombinase RAD51 and suggest that it is unaffected by ARID1A loss. However, only a single image of RAD51 staining in the cKO is provided (Fig. S11) and there are no associated quantitative data provided. The data are suggestive but it would be appropriate to add a qualifier to the conclusion regarding RAD51 in the discussion which states that "...loss of ARID1a decreases DMC1 foci on the XY chromosomes without affecting RAD51" given that the provided RAD51 data are not rigorous. In the long-term it also would be interesting to quantitatively examine DMC1 and RAD51 focus formation on autosomes as well.

      Response: We agree with the reviewer’s comment and have made the recommended changes (Discussion, lines 518-519).

      Response to non-public recommendations

      Reviewer 2:

      Comment: Meiotic arrest is usually judged based on testicular phenotypes. If mutant testes do not have any haploid spermatids, we can conclude that meiotic arrest is a phenotype. In this case, mutant testes have haploid spermatids and are fertile. The authors cannot conclude meiotic arrest. The mutant cells appear to undergo cell death in the pachytene stage, but the authors cannot say "meiotic arrest."

      Response: We disagree with this comment. By IF, we see that ~70% of the spermatocytes have deleted ARID1A. Furthermore, we never observed diplotene spermatocytes that lacked ARID1A. The conclusion that the absence of ARID1A results in a pachynema arrest and that the escapers produce the haploid spermatids is firm.

      Comment: Fig. S2 and S3 have wrong figure legends.

      Response: The figure legends for Fig. S2 and S3 are correct.

      Comment: The authors do not appear to evaluate independent mice for scoring (the result is about 74% deletion above, Table S1). Sup S2: how many independent mice did the authors examine?

      Response: These were Sta-Put purified fractions obtained from 14-15 WT and mutant mice. It is difficult to isolate pachytene spermatocytes by Sta-Put at the required purity in sufficient yields using one mouse at a time. We used three technical replicates to quantify the band intensity, and the error bars represent the standard error of the mean (S.E.M) of the band intensity.

      Comment: "Comparison of cKO and wild-type littermate yielded nearly identical results (Avg total conc WT = 32.65 M/ml; Avg total conc cKO = 32.06 M/ml)". This sounds like a negative result (i.e., no difference between WT and cKO).

      Response: This is correct. There is no difference between Arid1aWT and Arid1aCKO sperm production. This is because wild-type haploid gametes produced were derived from spermatocyte precursors that have escaped Cre-mediated activity (Fig. S4). These data merely serve to highlight an inherent caveat of our conditional knockout model and are not intended to support the main conclusion that ARID1A is necessary for pachytene progression.

      Comment: The authors now admit ~ 70 % efficiency in deletion, and the authors did not show the purity of these samples. If the purity of pachytene spermatocytes is ~ 80%, the real proportion of mutant cells can be ~ 56%. It is very difficult to interpret the data.

      Response: The original submission did refer to inefficient Cre-induced recombination. The reviewer asked for the % efficiency, which was provided in the revised version. Also, please refer to Fig. S2, where Western blot analysis demonstrates a significant loss of ARID1A protein levels in CKO relative to WT pachytene spermatocyte populations that were used for CUT&RUN data generation.
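
The reviewer's estimate in the comment above can be reproduced with a short sketch. It assumes, as the reviewer implicitly does, that deletion efficiency and sample purity combine multiplicatively and that contaminating cells carry no deletion. The 80% purity is the reviewer's hypothetical value, not a measured one, and `mutant_fraction` is an illustrative name.

```python
def mutant_fraction(deletion_efficiency: float, purity: float) -> float:
    """Fraction of cells in a sample that are mutant pachytene spermatocytes.

    Assumes the two proportions are independent and that contaminating
    (non-pachytene) cells do not contribute mutant cells.
    """
    for name, value in (("deletion_efficiency", deletion_efficiency),
                        ("purity", purity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return deletion_efficiency * purity


# ~70% deletion efficiency, hypothetical ~80% purity (reviewer's figures).
estimate = mutant_fraction(0.70, 0.80)
print(f"{estimate:.0%}")
```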

      Comment: The authors should not use the other study to justify their own data. The H3.3 ChIP-seq data in the NAR paper detected clear peaks on autosomes. However, in this study, as shown in Fig. S7A, the authors detected only 4 peaks on autosomes based on MACS2 peak calling. This must be a failed experiment. Also, S7A appears to have labeling errors.

      Response: I believe the reviewer is referring to supplementary figure 8A. Here, it is not clear which labeling errors the reviewer is referring to. In the wild type, the identified peaks were overwhelmingly sex-linked intergenic sites. This is consistent with the fact that H3.3 is hyper-accumulated on the sex chromosomes at pachynema.

      The authors of the NAR paper did not perform a peak-calling analysis using MACS2 or any other peak-calling algorithm. They merely compared the coverage of H3.3 relative to input. Therefore, it is not clear on what basis the reviewer says that the NAR paper identified autosomal peaks. Their H3.3 signal appears widely distributed over a 6 kb window centered at the TSS of autosomal genes, which, compared to input, appears enriched. Our data clearly demonstrate a less noisy and narrower window of H3.3 enrichment at autosomal TSSs in WT pachytene spermatocytes, albeit at levels lower than those seen in CKO pachytene spermatocytes (Fig. S8B; see the data copied below for each individual replicate). Moreover, the lack of peaks does not mean that H3.3 was absent from these autosomal TSSs (Supp. Fig. S8B). Therefore, we disagree with the reviewer’s comment that the H3.3 CUT&RUN was a failed experiment.

      Author response image 1.

      H3.3 Occupancy at genes mis-regulated in the absence of ARID1A

      Comment: If the author wishes to study the function of ARID2 in spermatogenesis, they may need to try other cre-lines to have more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID2.

      Response: As noted, we chose Stra8-Cre to conditionally knockout Arid1a because ARID1A is haploinsufficient during embryonic development. The lack of Cre expression in the maternal germline allows for transmission of the floxed allele, allowing for the experiments to progress.

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: In many experiments, we have been quantitative when possible by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer 3:

      Comment: The Methods section refers to antibodies as being in Supplementary Table 3, but the table is labeled as Supplementary Table 2.

      Response: This has been corrected.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Summary: In this manuscript the authors address the largely unexplored role of microRNAs (miRNAs) in Drosophila melanogaster brain development, in particular in neural stem cell lineages. The authors for the first time adapt the Ago protein Affinity Purification by Peptides (AGO-APP) technology for Drosophila. They show that this technique works efficiently in neural stem cell lineages and identify several cell type specific active miRNAs. Through a series of bioinformatic analyses the authors identify candidate mRNA targets for these miRNAs. The authors then functionally analyse the role of some of the identified miRNAs, focusing on miRNAs significantly over-represented in neuroblasts.

      By overexpressing Mir-1, the authors demonstrate that this miRNA effectively targets the UTR of Prospero, resulting in the overproliferation of neuroblasts. In a parallel experiment, overexpression of Mir-9c causes neuroblast differentiation defects, similar to the phenotype caused by nerfin-1 mutants, a previously validated target. Loss of function analyses show that knockdown of single miRNAs has little functional effect on neuroblast size, showing that the effect of knocking down an individual miRNA is likely compensated. In contrast, a sponge against a selected group of miRNAs leads to a reduction in poxn positive neuroblasts. Overall these results validate the approach and support the theory that miRNAs cooperate in functional modules during stem cell differentiation.

      We thank Reviewer 1 for their overall positive review. We are grateful for the useful suggestions, and we believe the additional experiments we have performed and added strongly improve the quality of the study and will hopefully address the reviewer's concerns.

      Comments

      Title: As the authors do not really explore exit from neural stem cell state this should be altered. The authors do not assess for the levels of any temporal genes, nor other markers of neural stem cell state exit (e.g. nuclear Pros).

      We now have further evidence that the identified microRNA module preserves neuroblasts, in particular in the optic lobe. We have modified the title accordingly: "In vivo AGO-APP identifies a module of microRNAs cooperatively preserving neural progenitors"

      The observed effects, with the available experiments, rather say that neural stem cell state is not maintained in general, not being clear what mechanistically happens to these cells expressing Cluster 2 sponges. The described phenotype caused by the expression of sponges against individual miRNAs also rather shows a blockage in differentiation.

      -The miRNAs analysed were found in Ago-APP to be predominantly active in neuroblasts, but were there any phenotypes of OE or KD in neurons or glial cells?

      Since the analyzed miRNAs were either not or poorly expressed in neurons or glia overall, it seemed less essential to investigate potential phenotypes in these cells. However, we did mis-express miR-cluster1sponge and miR-cluster2sponge in neurons and in glial cells (using elav-GAL4 and Repo-GAL4, respectively) throughout development, and did not observe any major impact on viability. All pupae were able to hatch.

      In addition, we now show that mis-expression of the miR-cluster2sponge (which induces strong phenotypes in neuroblasts) specifically in the wing pouch throughout development did not lead to any phenotype in the adult (e.g. wing size (tissue growth), patterning defects (cell differentiation)) (Fig. 6K,L). Importantly, this experiment rules out unspecific effects of the sponge construct on cell fitness, and highlights the tissue-specificity of the phenotype.

      • The authors obtained a phenotype when using a sponge against Cluster 2 in poxn neuroblasts. Is this specific for these 6 neuroblasts? What happens if this sponge is expressed with a pan-neuroblast driver in central brain/VNC/optic lobe? These experiments should be included as they would show if these are conserved effects for all neuroblasts.

      We already showed in Fig.4B of the first version of the manuscript (using a flip-out approach in clones) that miR-cluster1sponge or miR-cluster2sponge expression leads to an overall reduction in the neuroblast size in the VNC and CB.

      We have now added four more experiments, all suggesting that these sponges specifically affect type I neuroblasts:

      • Using the pan-neuroblast driver nab-GAL4, we show that neuroblasts in the VNC and CB expressing these sponges are significantly smaller in late L3. Their number is also reduced, indicating that some neuroblasts are eliminated (Fig.4C-G).
      • Using poxn-GAL4 (already in the first version) and eagle-GAL4, we show that different subsets of type I neuroblasts in the VNC exhibit different sensitivities to the sponges (from light/medium, i.e. neuroblast shrinkage, to high, i.e. neuroblast elimination) (Fig.4H-J, S6C-E).
      • Using the dpnOL-GAL4 driver, which is specific to and strongly active in medulla neuroblasts in the optic lobe, we demonstrate that both miR-cluster1sponge and miR-cluster2sponge induce neuroblast shrinking. In addition, we find that the width of the medulla neuroblast stripe is strongly reduced when using the miR-cluster2sponge, providing further evidence for precocious neuroblast elimination (Fig. 6C,D). Importantly, this leads to a smaller medulla in late L3 (Fig 6F), implying that in these conditions, medulla neuroblasts produce fewer neuronal progeny. Because medulla neuroblasts generate GMCs that undergo a single division, they are also considered type I neuroblasts.
      • Using a worniu-GAL4, ase-GAL80 driver, which is specifically active in type II neuroblasts, we show that expression of miR-cluster1sponge and miR-cluster2sponge does not affect neuroblast size or the number of intermediate progenitors (Fig 6H-J).

      Together, these additional experiments in different types of neuroblasts and in non-neural tissue (the wing pouch, see above) demonstrate a type I neuroblast-specific effect. Our new results also imply that the microRNA module is active in most, if not all, type I neuroblasts. In contrast, it is either not present or does not affect differentiation genes in type II neuroblasts. Importantly, in type II lineages, intermediate progenitors produced by neuroblasts themselves undergo a few rounds of division before differentiating, unlike GMCs, which give rise to two differentiated progeny after a single division. Therefore, the dynamics of differentiation differ between the two lineages, involving a distinct sequential expression of differentiation factors, and possibly different miRNAs.

      The authors do different analyses in different brain regions, which also makes it hard to conclude whether all brain regions behave the same way. As the authors show that some miRNAs are only expressed in sub-sets of cells, this becomes particularly relevant.

      The new set of experiments in different types of type I neuroblasts and in type II neuroblasts, presented above, addresses the points on the specificity of the microRNA module.

      Could sponge of cluster 1 cause a phenotype if it had been expressed in other neuroblast lineages?

      Yes, it can. See our new experiments discussed above.

      In addition, a discussion of the results obtained from sponge 1 should be included and put in context with miRNA function, technical limitations, levels/cell, targets, pitfalls of analyses, sponges, etc.

      We have more carefully acknowledged that sponge-mediated knockdown is not very efficient and is dose-dependent. We also clarified that other approaches will be required in the future to rigorously assess the specificity of each miRNA/mRNA interaction as well as their cooperativity.

      For example: "In contrast to genetic miR-1KO (Fig. 3O), we found that sponge mediated knock-down of this miRNA, or of other individual miRNAs in the module, had never a significant effect on neuroblast size (Fig. 4B), likely because the inhibition induced by sponges is incomplete. However, expression of either multi-sponge 1 or multi-sponge 2 significantly reduced neuroblast size in a dose dependent manner - two copies of the transgene exacerbate the phenotype (Fig. 4B)."

      We also state at the end of the discussion: "In the future, the combination of Ago-APP with complementary genetic strategies will be required to rigorously assess the specificity of each miRNA/mRNA interaction as well as their cooperativity."

      It would also be interesting to further explore the phenotypes caused by Mir-1 sp expression - are there any milder lineage defects?

      We observed an increase in Prospero expression and a decrease in neuroblast size in miR-1null mutant neuroblast clones (Fig.3L-O). These phenotypes are not observed when miR-1sponge is mis-expressed, probably because miR-1sponge expression leads to only a partial knockdown of miR-1. Moreover, we have added data on the expression of miR-1sponge in medulla neuroblasts in the optic lobe, showing no obvious phenotype when assessing neuroblast size and neuroblast maintenance. This contrasts with expression of miR-cluster1sponge and miR-cluster2sponge (Fig. 4F,G). These new data are in line with our hypothesis that knockdown of the miRNAs of a common module synergizes to produce the phenotype expected from the deregulation of their common target mRNAs.

      Any defects in other brain regions/lineages, like in type 2 neuroblasts that usually do not express Pros?

      As suggested by the reviewer, and discussed above, we tested expression of miR-cluster1sponge and miR-cluster2sponge in type II neuroblasts using the worniu-GAL4, asense-GAL80 driver (Neumüller et al., 2011). Interestingly, in contrast to type I neuroblasts in VNC, CB and OL regions, we did not observe neuroblast shrinking or changes in INP numbers. This suggests either that the self-renewing state is more robust in type II than in type I neuroblasts, or that the uncovered miRNA module is more specific to type I neuroblasts than to type II. We have added and discussed these important data in Fig 6H-J in the revised version.

      Ago-APP identifies cell type specific miRNAs in larval neurogenesis section: - "...29oC... allows Gal4-dependent expression (Fig.1B,C)" - this description of how Gal80ts/Gal4 works is not correct; expression is not prevented.

      Gal80 directly binds to Gal4 carboxy terminus and prevents Gal4-mediated transcriptional activation.

      We have tried to clarify this point in the revised version.

      "Thus, when x-GAL4, tub-GAL80ts, UAS-T6B animals are maintained at 18°C (restrictive temperature), GAL80 binds to Gal4 and inhibits its activity. Switching to 29°C (permissive temperature) for 24 hours inactivates GAL80, allowing for GAL4-mediated transcriptional activation of UAS-T6B."

      • Fig S1 - nab-Gal4 also drives expression in GMCs and neurons, rephrase text. Is nab-Gal4 expressed in optic lobe, VNC and central brain neuroblasts?

      nab-GAL4 drives UAS-T6B expression in neuroblasts (in the VNC and in the CB), but also at lower levels in the medulla neuroblasts of the OL.

      We now describe this expression more precisely in the text and in Fig.S1C:

      "nab-GAL4 was used for T6B expression in all neuroblasts. However, because GAL4 is inherited by neuroblast progeny, T6B will also be present in GMCs and a few immature neurons (Fig.S1A,C)24. Of note, nab-GAL4 is highly expressed in the neuroblasts of the ventral nerve cord (VNC) and of the central brain (CB), and weaker in the neuroblasts of the optic lobe (OL) (Fig. S1C)".

      • "20 late larval CNS" - mention the exact stage

      We mention now the precise stage: the wandering stage.

      • Providing a more detailed and interpretive description of Figures 1D and 1E would greatly enhance their clarity. Currently, the descriptions of these panels resemble typical figure legends.

      We now provide a more detailed description of the data, emphasizing that they are consistent with previous studies on specific miRNAs.

      • Fig. 1F,G,H - It is not clear why the authors sometimes use the optic lobe, other ventral nerve cord as both regions have both neuroblasts, neurons and glia. Are the drivers used for Ago-APP not expressed in all brain regions?

      We now document the activity of the GAL4 drivers used for AGO-APP throughout the entire larval central nervous system in Fig.S1B-D. We also show images of the entire larval central nervous system for the different reporter lines (Fig S1E-K) and focus on regions of interest in the main Fig 1F-M with quantitative measurement of reporter gene expression.

      • Show "data not shown" for 1H.

      It is now shown in Fig. 1M'.

      • Fig. 1F, G, H - Please quantify intensity levels in the different cell types to facilitate comparison with Ago-APP graphs. Include in figure legend what is "cpm".

      Quantification of intensity levels is now shown in Fig. 1F, I and L. Cpm means "counts per million". We added this to the figure legend.
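
As a reminder of what the "cpm" unit represents, the sketch below shows the usual counts-per-million normalization: each raw read count is scaled by the library total and multiplied by one million. The miRNA names and counts are hypothetical, and this is not necessarily the exact pipeline used for the AGO-APP graphs.

```python
def counts_per_million(counts):
    """Scale raw read counts so that the library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Hypothetical raw read counts for one small-RNA library.
raw = {"miR-a": 500, "miR-b": 1500, "miR-c": 8000}
cpm = counts_per_million(raw)
for name, value in cpm.items():
    print(f"{name}: {value:.0f} cpm")
```

Normalizing this way makes libraries of different sequencing depth comparable, which is what allows reporter intensities to be set against the AGO-APP profiles.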

      A regulatory module controlling neuroblast-to-neuron transition section: - Fig. 2C - A more detailed explanation in text is required in addition to what is mentioned in the figure legend. Including a brief summary/conclusion of the results would be helpful. If possible, add in X-axis 1, 2, 3.

      We clarified this point in the text:

      "We used the Targetscan algorithm1 to determine the predicted target genes of each neuroblast-enriched miRNA. Next, we investigated the correlation between the identified miRNAs and the presence of their targets, based on independently generated mRNA expression data44.

      This analysis showed that neuroblast-enriched miRNAs predominantly target mRNAs that are normally highly expressed in neurons (Fig. 2C), consistent with a differentiation inhibiting function."

      • Figure S2B - as mentioned in the text elav is expressed from the neuroblast, although this is not represented in the figure.

      In this scheme, we depict the expression of proteins, not the presence of mRNAs. elav mRNA is indeed present at low levels in neuroblasts, but the protein is absent from both neuroblasts and GMCs (as shown by all our immunostainings against Elav). This strongly suggests post-transcriptional repression of elav mRNA in neuroblasts and GMCs (possibly by miRNAs). Transcription of elav in neuroblasts likely also explains why elav-GAL4 is active in these cells.

      It is hard to tell what are young vs maturing neurons in the cartoon, pls add a label/legend.

      We added new labels in Fig S2B to uncouple neuronal maturation from temporal identity. We hope it is clearer now.

      • Fig.3I - please show a control brain. The merge images are not easy to see. I think it would be nicer to change the figures to be color-blind friendly.

      We added the control brain in Fig 3I for VNC clones, and Fig S3A for OL clones.

      We also changed all the figures to be color-blind friendly.

      • Fig. 3K,L - why is this now done in the VNC?

      We now focus on the VNC in the main Figure 3 (Fig.3I,J,K,L,N), and show similar phenotypes in the OL in the Supplemental Figure S3 (Fig.S3A-C).

      • Are there any lineage defects when Mir-1 sp is expressed?

      See previous comment on miR-1sponge.

      • Based on which parameters/variables of the predicted targets was the Hierarchical clustering done? A brief explanation would help the interpretation of the results and of the choice of the clusters that were further analysed.

      Hierarchical clustering is now explained in the "Bioinformatics analysis" section of the Material & Methods section with an additional matrix available in Table S1.

      • "revealed the presence of three main groups" - this should be rephrased as this "grouping" was done arbitrarily by the authors and not by hclust. Hclust is set to merge individual clusters/sub-trees up to 1. Furthermore, a more detailed explanation supporting the decision to choose these 3 large clusters should be included.

      See previous question.
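
The reviewer's point, that a hierarchical clustering keeps merging until a single tree remains and that the number of groups is chosen by the analyst, can be illustrated with a small sketch. It uses a plain-Python average-linkage procedure as a stand-in for hclust; the two-dimensional points are hypothetical and do not represent the targeting matrix in Table S1.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(cluster) for cluster in clusters]


# Hypothetical profiles: three tight pairs, two of them closer to each other.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0), (9, 1)]
three_groups = agglomerate(points, 3)
two_groups = agglomerate(points, 2)
print(three_groups)
print(two_groups)
```

The same data yield two or three groups depending only on where the merging is stopped, which is why the criterion for choosing three clusters needs to be stated explicitly.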

      • Fig. 4B, S4B - please include in legend how were these clones generated. S4B - scale bars missing.

      We included the missing information and added the missing scale bars.

      • Fig. 4H - was the ratio of UAS/Gal4 kept in both experimental conditions? Increasing the number of UAS/Gal4 leads to weaker expression of UAS and thus could lead to a weaker phenotype. Including in legends genotype details would help.

      This is a very good point as the number of copies of the UAS and/or GAL4 can influence transgene expression and consequently the phenotype observed. We indeed kept the ratio of UAS/GAL4 in both experimental conditions. The exact genotypes for the experiments are:

      Hs-FLP/+; act>stop>Gal4, UAS-GFP/+; UAS-RFP/UAS-miR-1

      Hs-FLP/+; act>stop>Gal4, UAS-GFP/UAS-cluster2sp; UAS-miR-1/+.

      To address this important issue in the manuscript, we added a table (Table S3) listing the precise genotypes for each experiment.

      Minor - Abstract: "a defined group of miRNAs that are predicted to redundantly target all..." This is only predicted, not experimentally shown, this should be modified accordingly.

      Although the request here is not clear to us, we made a few minor changes to the abstract that we hope will satisfy the reviewer.

      • Intro: "Elav, an RNA binding protein, is expressed as soon as post-mitotic neurons..." - Elav is expressed already in neuroblasts, as also mentioned by the authors in the result section. Correct, add references.

      elav is indeed already transcribed in neuroblasts and GMCs. However, the protein is absent from these two cell types (as shown by all our immunostainings) and is only present in neurons. Thus, there is a level of post-transcriptional regulation that prevents elav mRNA translation in neuroblasts and GMCs (likely at least partly mediated by miRNAs). This also explains why, in elav-GAL4; UAS-T6B brains, T6B is expressed in neuroblasts and GMCs, as the GAL4 transgene mRNA is not subject to the same post-transcriptional regulation.

      • Last paragraph of Intro (Bioinformatic analyses...) - it is not easy to understand the content of this paragraph. Rewrite to improve clarity.

      The paragraph has been rewritten for clarity, with the addition of Table S1.

      • All legends: Please mention which developmental stage is being analysed in each panel (i.e. wandering 3IL, hours After Larval Hatching, hours After Puparium Formation, or other), in which brain region the analyses/images are being done.

      The CNS regions are now systematically annotated in the figures. All experiments have been done in wandering L3 (except for the new Fig.6 K,L, where the experiment is done in the adult wing). We now systematically mention in the text and legend the developmental stage at which the experiment is performed.

      Please include more detailed information about the genetics in figure legends.

      We added Table S3 that describes the exact genotype of all crosses done in this study.

      • Please include brief explanation of the genetics of miR-10KOGal4 line.

      This is now also explained in the new Table S3.

      • Why are miRNAs sometimes referred as (e.g.) "miR-1" and others "miR-1-3p"?

      The miRNA found enriched (and thus active) in the neuroblast is the miR-1-3p strand. The UAS-miR-1-sponge has been designed to be complementary to the miR-1-3p strand, and is therefore referred to as miR-1-3psp in the text and figure legend. The miR-1 null clones have been made using the miR-1KO allele, which inactivates the entire locus and therefore both the miR-1-3p and miR-1-5p strands. This is referred to as miR-1KO or miR-1 in the text. Finally, constructs used to mis-express miR-1 and other miRNAs are made with the pre-miRNA, meaning that both strands of the miRNA are mis-expressed. This is referred to as miR-1 in the text.

      • Fig. 3I-M - stage of the animal? 3M - in which brain region is this?

      We have systematically mentioned the brain region on panels on all figures.

      • Fig. 3N - can actual sizes be additionally shown, or at least averages mentioned in text?

      Average sizes are indicated in the legend of new Fig. 4F.

      • If non differentially expressed miRNAs, or miRNA with other expression patterns, had been analysed to determine their targets in the sub-set of genes expressed in neuroblasts (from the transcriptome) would different targets been found? Meaning, how specific are these binding patterns for the selected miRNA?

      This is an interesting and important point. To answer it, we added a new analysis (Fig.S2C), in which the total number of target sites in the 3'UTR of the pro-differentiation/temporal network genes is shown for different categories of miRNAs: neuroblast-enriched miRNAs (analysed in this study), neuron-enriched miRNAs, glia-enriched miRNAs, and random miRNAs not expressed in the brain. This analysis shows that neuroblast-enriched miRNAs exhibit a higher level of promiscuity with the iconic pro-differentiation/temporal genes than other identified or random miRNAs, arguing for functional relevance.
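
To make the idea of counting target sites in a 3'UTR concrete, here is a simplified sketch that counts perfect matches to the miRNA seed (nucleotides 2-8). This is an assumption-laden illustration: TargetScan's actual site classification and scoring are more elaborate, and the miRNA and UTR sequences below are toy examples, not sequences taken from the manuscript.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the 7-nt UTR motif complementary to miRNA positions 2-8."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def count_sites(utr: str, mirna: str) -> int:
    """Count (possibly overlapping) perfect seed matches in a UTR."""
    site = seed_site(mirna)
    return sum(
        1
        for start in range(len(utr) - len(site) + 1)
        if utr[start:start + len(site)] == site
    )


# Toy sequences (RNA alphabet), for illustration only.
toy_mirnas = {
    "mirna_x": "UGGAAUGUAAAGAAGUAUGGAG",
    "mirna_y": "UAAGGCACGCGGUGAAUGCC",
}
toy_utr = "GGACAUUCCAAAUUACAUUCCGG"

site_counts = {name: count_sites(toy_utr, seq) for name, seq in toy_mirnas.items()}
print(site_counts)
```

Summing such counts over a gene set for each miRNA category gives the kind of comparison described in the response, although real predictions would also weigh site type and context.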

      **Referees cross-commenting**

      I think this study is very interesting as it optimizes a novel technique in Drosophila for the investigation of cell-specific active miRNAs, and it globally addresses the role of miRNAs in neural stem cell lineages. Although the authors do not explore deeply the biological effect of these miRNAs in neural lineages, I think that the technical contribution and the identification of some miRNA targets is relevant on its own. The authors use Prospero as an example, which is very interesting, as this gene is required to be lowly expressed in neuroblasts and then upregulated during differentiation. The authors propose that this can be regulated by miRNAs, identifying a novel player in this differentiation mechanism. I do not feel the authors need to perform additional experiments to corroborate their findings, as they are well supported by the experiments presented. I do agree that the authors did not explore deeply the biological effect in neural lineages, and the claims regarding premature terminal differentiation, nerfin, etc. need to be toned down accordingly.

      Reviewer #1 (Significance (Required)):

      This study is both a technical and conceptual advance. It is very interesting as it optimizes a novel technique in Drosophila for the investigation of cell-specific active miRNAs, and it globally addresses the role of miRNAs in neural stem cell lineages. However, the text, especially in the results section, could benefit from increased detail to enhance the comprehension of the experiments, results, and conclusions. Given that the functional analyses were not conducted at a very detailed level, there exist certain instances of over-interpretation, which could be easily addressed either by revising the text or by incorporating additional experiments, as elaborated upon below. This manuscript will be interesting for research fields interested in stem cell differentiation, brain development, micro RNAs, both for Drosophilists and scientists working with other animal models. I am an expert in Drosophila brain development.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Summary: MicroRNAs (miRNAs) have a well-established role in fine-tuning gene expression. Because the mechanisms by which miRNAs recognize specific target transcripts are poorly understood, their functionally relevant targets in the physiological context are mostly poorly defined. Studies in vertebrates have suggested that miRNAs play a prominent role in regulating cell type specification during brain development. Insight into miRNA regulation of target selection will improve our understanding of neural development. Cell type-specific gene expression patterns and functions in the neural stem cell (neuroblast) lineage in the fly larval brain are well characterized. The fly genome is compact, and gene redundancy including miRNAs is significantly less than in vertebrates. For these reasons, the authors chose to investigate how miRNAs regulate cell-state transitions by first establishing a comprehensive miRNA expression profile for major cell types in the fly larval brain. They combined the AGO-APP strategy and the GAL4-UAS inducible expression system to pull-down cell type-specific miRNAs from fly larval brain. The authors focused on miRNAs that are enriched in neuroblasts and examine how multi-miRNA modules regulate the maintenance of an undifferentiated state in neuroblasts. The cell type-specific inducible AGO-APP system introduced in this study is innovative and allows for systematic identification of miRNAs that most standard RNA-sequencing techniques missed in previously published datasets. The technological note sets high promise for this study, but the findings appear tame. It is my opinion that there are a number of shortcomings that can improve the rigor of this study. For example, strategies used to determine spatial expression patterns of miRNAs as well as to validate miRNA target genes are indirect with high likelihood of caveats. The choices of candidate target genes to assess the function of miRNAs in the cell state transition appear counterintuitive.

      We thank the reviewer for qualifying our study as "technologically excellent" and for emphasizing the "innovative character of AGO-APP" and the potential of such studies to "be hugely significant to the general audience".

      We are aware that there could be ways to more rigorously and systematically investigate the interactions between miRNAs and their targets and assess their cooperativity. Beyond in vitro luciferase assays (an approach we have used in this study), this would ideally involve multiple new transgenic assays, with point mutations in various miRNA sites in the 3'UTR of predicted target genes as proposed by Reviewer 2. Also, measuring the direct effect of miRNA knockdown on its target is notoriously difficult as it can be modest (and only be revealed through the cooperative action with other miRNAs, as proposed in this study), and sometimes not detected by measuring mRNA levels (e.g. by transcriptomic approaches or FISH).

      One of our aims in the future is to develop such non-trivial approaches, which will take a considerable amount of time and work. At this stage we believe that this would go beyond the scope of the present study, which aims to illustrate how introducing a new technology for miRNA isolation (AGO-APP) can help to reassess important questions on miRNA biology and function (e.g. miRNA cooperation within the context of developmental transitions). We now discuss this point in the last paragraph of the discussion in the revised version.

      Our unbiased AGO-APP results reveal a group of neuroblast-enriched miRNAs that are predicted to target pro-differentiation genes (prospero, elav, nerfin-1, brat) multiple times while not targeting stemness genes such as miranda, worniu, inscuteable, deadpan, grainyhead. Mutations in pro-differentiation genes are known to either promote neuroblast tumors (prospero, nerfin-1, brat) (https://doi.org/10.1016/j.cell.2006.01.03; 10.1101/gad.250282.114) or perturb neuronal differentiation (elav) (https://doi.org/10.1002/neu.480240604). On the other hand, mis-expression of these genes in neuroblasts often promotes shrinkage, precocious differentiation and/or cell cycle exit (10.1016/j.cell.2008.03.034; 10.7554/eLife.03363; 10.1101/gad.250282.114). Therefore, bioinformatic prediction and previous studies made it likely that GOF of the neuroblast-enriched miRNAs would lead to neuroblast expansion or differentiation defects, and that LOF would lead to neuroblast shrinkage, cell cycle exit or differentiation. All these predictions are experimentally validated in our study. To reinforce our data, we have performed a number of additional experiments that are described below.

      Furthermore, the authors provided no rationale as to why they chose cell types that are not in the brain (such as wing cells and cells in the optic lobe) to assess the phenotypic effect of manipulating miRNAs.

      All our analyses were done in the different types of neuroblasts found in the central nervous system (CNS), which is composed of the ventral nerve cord (VNC, equivalent to the vertebrate spinal cord) and the brain (comprising the central brain (CB) and the optic lobes (OL)) (10.1016/j.neuron.2013.12.017). The optic lobes are not to be confused with the eye imaginal discs, which produce the retina but do not contain neuroblasts. We tested the role of the neuroblast-enriched miRNAs in all neuroblasts of the CNS based on the pan-neuroblast activity of the nab-GAL4 driver used for the AGO-APP experiment. We then focused on different types of neuroblasts using lineage-specific GAL4 drivers (poxn-GAL4, eagle-GAL4, dpnOL-GAL4, type II-GAL4). This is shown in the entirely revised last paragraph of the results (Fig 4, 5, 6, S6 and S7). These experiments demonstrate that sponges simultaneously targeting several miRNAs of the module only affect type I neuroblasts but not type II neuroblasts.

      To investigate whether miR-1 directly regulates prospero mRNA in vivo, we used a tissue where prospero is not normally expressed (the wing pouch of the wing imaginal disc in late L3 larvae), allowing us to test how over-expressing miR-1 post-transcriptionally affects versions of prospero mRNA that either possess or lack its endogenous 3'UTR. The obtained results are consistent with the in vitro luciferase assays and with miR-1 gain-of-function in neuroblasts and GMCs, supporting the hypothesis that prospero mRNA is a direct target of miR-1 via its 3'UTR. We have clarified these points in the revised version of the manuscript.

      Using solely a reduced cell size as the functional readout for "precocious differentiation" is not rigorous and should be complemented with additional measures.

      Reduced neuroblast size always precedes neuroblast differentiation and has been widely used as a functional readout of precocious differentiation (this is more clearly emphasized and referenced in the revised version). We have now also observed this phenotype in the neuroblasts of the optic lobe (Fig 6), together with precocious "plunging" of old neuroblasts in the deep layer of the medulla (Fig S7G), another sign of differentiation. These experiments show that the shrinkage phenotype is robust across all type I neuroblasts (medulla neuroblasts of the optic lobe can also be considered as type I neuroblasts because they generate GMCs that undergo a single division).

      Moreover, opposite to precocious differentiation induced by the simultaneous knockdown of multiple miRNAs of the neuroblast module, we now show that mis-expression of many of the miRNAs of the module prevents proper neuronal differentiation (miR-1, miR-9, miR-92a, miR-8) (Fig S5). Taken together, these experiments strongly suggest that the miRNAs of the module have the ability to block neuronal differentiation and that they represent a functional module in type I neuroblasts.

      Major concern: 1. The authors should use a direct method to confirm the expression pattern of identified miRNAs such as miRNA scope (ACD) in the whole mount brain instead of indirect methods such as reporters.

      Such techniques are not trivial and do not represent a standard in Drosophila. Instead, the reporter genes we have used in our study have already been validated in other studies to reflect the expression of particular miRNAs in different tissues. We have thus taken advantage of these available lines to correlate expression patterns as reflected by transgenics with our AGO-APP experiment. All reporter lines tested quantitatively support the AGO-APP data as now shown in the revised Fig 1F,I,L.

      The entire figure 3 aims to provide evidence to support that prospero mRNA is a direct target of miR-1-3p. These convoluted experiments with significant caveats should be replaced with mutating the endogenous miR-1-3p binding sites in the 3'UTR of the prospero reading frame, and demonstrate that the endogenous prospero transcript level is increased by sm-FISH. The authors could also use this novel allele to assess the phenotypic effect of "unregulated prospero" in the larval brain.

      It would indeed be an interesting experiment to perform to show that miR-1 directly regulates pros RNA in vivo. However, our miR-1 mutant clones suggest that miR-1 on its own has only a small contribution to prospero mRNA regulation during the neuroblast-to-neuron transition. This could be due to the low physiological levels of miR-1-3p in neuroblasts and to the fact that several miRNAs of the module may act partly redundantly and collaboratively to maintain the correct level of prospero mRNA. Thus, in this case, it is quite possible that changes in the endogenous prospero mRNA transcript would not be significant or detectable by smFISH unless more miRNA sites are mutated. Such an experiment would involve the generation of several new transgenic lines using the CRISPR technology, which represents a long-term project.

      Again, these approaches are powerful and we agree that they would represent a more rigorous assessment of miRNA cooperation. However, we feel that they go beyond the scope of this article, as mentioned above.

      The effect of overexpressing mir-1 on the prospero transgene with its 3'UTR vs without 3'UTR cannot easily be compared since the UTR might be regulated by other regulatory mechanisms in addition to mir-1.

      To minimize the potential effect of other regulators, we only compare conditions where the only difference is the presence or absence of miR-1. We do not directly compare levels of Prospero with its 3'UTR vs without 3'UTR. However, there is indeed still the possibility that miR-1 overexpression would change the expression of a protein that regulates prospero mRNA via its 3'UTR.

      Considering this, we have toned down our conclusion concerning this part in the revised version of the manuscript and now use the sentence:

      "These experiments performed in two different cellular contexts strongly suggest that prospero mRNA is a direct target of miR-1-3p."

      How could the authors use an evidence-based strategy to demonstrate that massive amplification of Mira-expressing cells induced by overexpressing mir-1 in the optic lobe is indeed due to mis-regulation of prospero instead of mimicking the prospero-mutant phenotype?

      First, we noted that miR-1 overexpression in neuroblast clones causes neuroblast amplification in all regions of the CNS (not only in the optic lobe) at the expense of neuronal differentiation. This is now shown in Fig 3 and S3.

      Second, multiple chemical or genome-wide RNAi screens have been performed (Gould lab, Chia lab, Knoblich lab, etc) to identify genes whose downregulation causes efficient neuroblast amplification (10.1186/1471-2156-7-33 ; 10.1016/j.stem.2011.02.022). In VNC type I neuroblasts, only inactivation of prospero or miranda can lead to efficient neuroblast amplification in late larvae, generating tumour-like structures devoid of neurons. We find that while Miranda is highly expressed in neuroblast clones overexpressing miR-1 (Fig 3J), Prospero is completely absent, suggesting that it is efficiently silenced by miR-1 overexpression, and therefore responsible for the observed phenotype. This new result is now added in Fig.S3D. It is very unlikely that the down-regulation of another gene is responsible for this phenotype. However, we cannot exclude that other genes are deregulated that contribute to this phenotype in addition to prospero knockdown.


      Similarly, what is the evidence that the phenotype associated with mir-9a knockout is due to mis-regulation of nerfin-1?

      In contrast to prosperoKD clones that are devoid of neurons, nerfin-1 mutant clones are known to be composed of a mix of neuroblasts and neurons (Fig S4E,G) (10.1101/gad.250282.114 ). When over-expressing miR-9 in neuroblast clones in the VNC, we observed a strong downregulation of nerfin-1 (Fig S4A, C) showing that nerfin-1 is a likely target of miR-9. However, downregulation is not complete which could explain why we do not see neuroblast amplification in the VNC (Fig 4F). Together with the significant up-regulation of nerfin-1 upon miR-9sponge expression, and the results of our luciferase assays, these data are consistent with nerfin-1 being a direct target of miR-9. Finally, the fact that overexpression of miR-9 in the optic lobes triggers phenotypes very similar to loss of function of nerfin-1 (but different from loss of function of prospero which is upstream of Nerfin-1 in epistatic tests) suggests that down-regulation of nerfin-1 is at least partially responsible for the phenotype (Fig S4D,E).

      Again, we cannot exclude that other deregulated targets contribute to the phenotype.

      Most of the look-alike mutant phenotypes presented by the authors appear to occur in the OL. Is there any reason why cells in the visual center, which is not a part of the brain, appear to be more susceptible to loss of function of miRNAs? This is particularly important when manipulating the same miRNAs appears to have very subtle effects on VNC neuroblasts.

      Optic lobes (OL) are a part of the brain (10.1016/j.neuron.2013.12.017). Indeed, each OL constitutes a large region located on either side of the central brain that integrates signals from the photoreceptors of the retina. Moreover, medulla neuroblasts in the OL can be considered as type I neuroblasts because they generate GMCs that undergo a single division, in contrast to intermediate progenitors (INPs) produced by type II neuroblasts.

      In the original version of our manuscript, we mainly showed gain-of-function in the OL, as for some of the miRNAs the phenotypes were more striking there than elsewhere. We have now more systematically tested our gain-of-function and loss-of-function conditions in both the VNC (type I neuroblasts) (Fig 3, 4, 5, S3, S4, S6) and the OL (medulla neuroblasts) (Fig 6, S4, S5, S7).

      Results in the VNC are presented generally in the main figures, while results in the OL are presented mainly in supplemental figures; but phenotypes obtained in both parts are now clearly described in the text of the revised version.

      How do the authors know that multi-sponge 2 expression leads to loss of stemness potential in neuroblasts? Any additional evidence that supports precocious differentiation but not death or cell cycle exit?

      This is indeed an important point which we have investigated further in the new version. We now show that inhibiting apoptosis partially rescues neuroblast elimination but not shrinkage when miR-cluster2sponge is expressed in the poxn lineage in the VNC (Fig. 4L,M). This shows that VNC neuroblasts can disappear by apoptosis upon miR-cluster2sponge expression, but that shrinkage precedes apoptosis. We also show that optic lobe neuroblasts shrink upon miR-cluster2sponge expression and are precociously eliminated, as indicated by the thinner neuroblast stripe, by a mechanism independent of apoptosis (Fig 6C,D, S7F). Indeed, the neuroblast stripe in the optic lobe remains free of anti-activated caspase 1 (Dcp1), a widely used label of apoptotic cells, upon miR-cluster2sponge expression (Fig S7F). Finally, we also show precocious "plunging" of the old OL neuroblasts deep in the medulla, another sign of precocious differentiation (Fig S7G).

      Therefore, these experiments reinforce the conclusion that the neuroblast-enriched miRNA module is involved in neuroblast maintenance and that down-regulation of this module leads to the progressive loss of the neuroblast state.

      Lastly, we show that miR-cluster2sponge has no effect on type II neuroblasts or wing imaginal discs arguing for a specific type I neuroblast effect (including VNC, CB and medulla neuroblasts).

      Again, how do the authors know that mir-1 overexpression efficiently silenced prospero mRNA in neuroblasts and GMCs in Fig. 4F?

      This relevant question is addressed in our response to questions 2 and 3.

      Have the authors considered other targets to better assess the function of these miRNAs enriched in neuroblasts? For example, could these miRNAs function to dampen the expression of genes that are required for maintaining these cells in an undifferentiated state? Several studies using the neuroblast model suggest that the expression of these genes needs to be downregulated at the transcriptional and post-transcriptional levels. Perhaps, these miRNAs might target these "stemness" transcripts instead of "differentiation" transcripts. Is there evidence for or against this possibility?

      This is definitely a good point that we have now discussed in the revised version. We found that neuroblast identity genes (e.g. Mira, Dpn, Insc, etc) are not targeted by the miRNA module. However, the miRNA module in late L3 neuroblasts also appears to target the early temporal genes (Chinmo, Imp), which are strongly oncogenic and stemness-promoting. These need to be silenced in late L3 to ensure that neuroblasts stop dividing during metamorphosis (10.7554/eLife.13463). Therefore, there is indeed a strong possibility that the miRNA module we have identified in late L3 both maintains stemness by inhibiting differentiation genes and dampens stemness by silencing early temporal genes, ensuring timely elimination in the pupal stage. We are actively working on the regulation of temporal genes by microRNAs during development and will describe this in detail in another study.

      This point was clarified in the discussion as follows:

      "In this context it is interesting to note that, in addition to differentiation factors, the early temporal factors Chinmo and Imp are predicted to be highly targeted by the neuroblast-enriched miRNA module. Given the strong oncogenic potential of these genes30, it is possible that the microRNA module not only protects neuroblasts against precocious differentiation but also protects against uncontrolled self-renewal. Therefore, in principle the same miRNA module could control neuroblast activity through the control of both self-renewal and differentiation, two seemingly opposing biological activities."

      Minor point 1. There are a number of misleading statements throughout the manuscript. -In the abstract, the authors indicated "isolate actively inhibiting miRNAs from different neural cell populations in the larval Drosophila central nervous system". For example, the expression patterns of Nub-Gal4 and Elav-Gal4 drivers appear to be partially overlapping in multiple cell types and might be active in the visual center (optic lobe). If true, it was unclear to me what neural cell types were actually used in their analyses and how they could confidently indicate that cell types in the central nervous system were used in their study. Aren't there more specific Gal4 drivers or more sophisticated genetic tools available to increase the purity of cell types? If not, the alternative could be a much more precise secondary screening step to directly determine where these miRNAs are actually detected instead of relying on indirect readouts of where they might be expressed.

      The expression patterns with additional figures are now more clearly described in the main text and in Fig.S1C,D.

      We are in the process of using other GAL4 drivers that target more specific populations of neurons. But this is beyond the scope of this first study and will be published later.

      -The statement "GMCs lacking Prospero, Nerfin-1 or Brat fail to differentiate and reacquire a neuroblast identity" is very problematic. Nerfin-1 does not appear to be expressed in GMCs according to Fig. S2B. Furthermore, Froldi et al., 2015 suggested that Nerfin-1 appears to prevent activated Notch from reverting neurons to ectopic neuroblasts.

      Indeed, Nerfin-1 is not expressed in GMCs but in immature neurons to stabilize neuronal identity and prevent reversion as shown by Froldi et al. and other studies (DOI: 10.1101/gad.250282.114 ; https://doi.org/10.1242/dev.141341). We have now clarified this point in the introduction: "This process involves the sequential activity of key cell fate determinants such as the transcription factor Prospero and the RNA-binding protein Brat in the GMCs followed by the transcription factor Nerfin-1 and the RNA-binding protein Elav in the maturing neurons20-23. GMCs lacking Prospero, or immature post-mitotic progeny lacking Nerfin-1, fail to initiate or maintain differentiation respectively, and progressively reacquire a neuroblast identity, leading to neuroblast amplification 21,23-25."

      -The statement on page 6 "Strikingly, the group of genes ... contained all iconic genes known to induce neuron differentiation after neuroblast asymmetric division, including nerfin-1, prospero, elav and brat" is problematic. Again, Nerfin-1 probably functions to maintain a neuronal state rather than to induce differentiation. Is there evidence that Elav induces neuron differentiation after neuroblast asymmetric division? Brat seems to downregulate Notch signaling in neuroblast progeny rather than instructing neuron differentiation. Furthermore, previous studies suggested that loss of brat function does not affect identity of GMCs and their symmetric division to generate neurons. A similar statement is used at the end of this same paragraph to reiterate misleading messages.

      Prospero and Nerfin-1 are sequentially expressed in maturing neurons. Nerfin-1 shares many targets with Prospero. It has been proposed that Nerfin-1 prolongs the action of Prospero, allowing stabilisation/maintenance of the differentiated neuronal state (10.1101/gad.250282.114; 10.1016/j.celrep.2018.10.038).

      Brat is also involved in the sequence of events needed to produce neurons upon neuroblast asymmetric division. However, the mode of action of Brat in GMCs from type I neuroblasts and in INPs from type II neuroblasts is unclear. It was shown that Brat is an RNA-binding protein that has multiple targets. For example, it can bind and silence Myc, Zelda and Deadpan, and promote neuroblast-to-INP differentiation. It may also inhibit Notch signaling which is required for neuroblast-to-INP differentiation (https://doi.org/10.1016/j.devcel.2006.01.017; 10.1016/j.devcel.2008.03.004; https://doi.org/10.15252/embr.201744188; https://doi.org/10.1158/0008-5472.CAN-15-2299).

      We have clarified the difference between Type I and Type II neuroblasts in the introduction: "A sparse subset of neuroblasts (Type II) generate intermediate progenitors (INPs) that can undergo a few more asymmetric divisions, allowing for larger lineages to be produced. The neuroblast-to-neuron process in Type II lineages involves a slightly different sequential expression of differentiation factors21,24."

      We have also added a new reference describing that neuronal differentiation and maintenance are severely affected upon elav loss of function:

      Yao, K.-M., Samson, M.-L., Reeves, R. & White, K. Gene elav of Drosophila melanogaster: A prototype for neuronal-specific RNA binding protein gene family that is conserved in flies and humans. J. Neurobiol. 24, 723-739 (1993).

      **Referees cross-commenting**

      My main concern about data in this study remains direct vs. indirect effects of manipulating miRNA functions and the corresponding phenotype in various cell types in flies. The authors focused most of their effort on using genes that promote GMC differentiation in order to establish the role of neuroblast-specific miRNAs. Most of the experiments were not rigorously performed to the level that eliminates obvious caveats and suggests their interpretation is the most likely possibility. It is a technologically excellent study but lacks in-depth analyses in biological effects.

      Reviewer #2 (Significance (Required)):

      I believe there is a strong general interest in better appreciating how miRNAs regulate precise gene expression. Deriving some sort of rules such as the specificity of target selection or the efficiency of downregulating gene expression will be hugely significant to the general audience

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.

      Points that could be addressed or discussed:

      (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance for entering REMs. The time course for how long it takes till the POA excitability resettles to its baseline consequently sets a permissive time window for increased amounts of REMs to recover its lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.

      (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2 →TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or brain regions may also contribute to the REMs rebound, we have revised the Results section.

      We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.

      We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, and progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build up of REM pressure (Fig. S2F).

      (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep-inhibiting histamine neurons (Chung et al)? And some of these POA cells will also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019). Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2→TMN neurons in our manuscript.

      Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022), indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs→REMs transitions, indicating these neurons also are active during NREMs.

      Using our REMs restriction protocol, we selectively restricted REMs, leading to the subsequent rebound of REMs without affecting NREMs. Consequently, we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999). However, during total sleep deprivation when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.

      (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      The POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus-expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.

      Author response image 1.

      Location of virus expression (A) and optic fiber placement (B) within subregions of POA.

      (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

      Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2→TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as markers of subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin, these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while these molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2→TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.

      Reviewer #2 (Public Review):

      Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that the cell type studied becomes active shortly (≈40 s) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed the reviewer's concerns in the revised manuscript, and our point-by-point response is provided below.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.

      Author response image 2.

      Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.
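
The cumulative curve described in this response can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names, the binning, and all numbers are hypothetical placeholders, not data from the study.

```python
from itertools import accumulate
from statistics import mean


def cumulative_rem(rem_per_bin):
    """Running total of REMs amount across consecutive time bins for one mouse."""
    return list(accumulate(rem_per_bin))


def group_mean_curve(mice):
    """Average the per-mouse cumulative curves bin by bin."""
    curves = [cumulative_rem(m) for m in mice]
    return [mean(vals) for vals in zip(*curves)]


# Hypothetical REMs amounts per time bin (arbitrary units); NOT data from the study.
eyfp = [[1.0, 2.0, 4.0], [2.0, 2.0, 3.0]]
swichr = [[0.5, 1.0, 3.0], [0.5, 1.0, 2.0]]

eyfp_curve = group_mean_curve(eyfp)
swichr_curve = group_mean_curve(swichr)

# The gap at the last bin is the REMs deficit that has not yet been recovered.
remaining_deficit = eyfp_curve[-1] - swichr_curve[-1]
```

Comparing the two curves at their final bin shows whether the experimental group has caught up with the control group by the end of the recording.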

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation hold at other circadian times of the day.

      Author response image 3.

      Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.

      REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).

      The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

      The effect size for REM sleep deprivation is now added in the text.

      It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.

      We apologize for omitting this important paper. In the revised manuscript, we added this citation.

      (2) Materials and Methods.

      Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.

      GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).

      The construction of the HDC-ires-CRE line is described in Zecharia AY et al J Neurosci et al 2012 (PMID: 22993424).

      We have now added these important citations in the revised manuscript.

      (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.

      We have now added these citations in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors rely heavily on their conclusions by using an optogenetic tool that inhibits the activity of GAD2+ neurons, however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a likewise construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre dependent expression combined with electrophysiological recordings).

      This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and our prior results that recapitulate similar findings as inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.

      Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.

      We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A.

      Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.

      Author response image 4.

      ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice. In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
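
The two analysis choices mentioned here, collapsing recordings to one value per mouse and applying a Holm-Bonferroni correction to the pairwise tests, can be sketched as below. This is a generic illustration of those standard procedures under stated assumptions: the function names and the input layout are hypothetical, and it is not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def average_per_mouse(recordings):
    """Collapse (mouse_id, value) recordings to one mean value per mouse."""
    by_mouse = defaultdict(list)
    for mouse_id, value in recordings:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(vals) for mouse_id, vals in by_mouse.items()}


def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier: m for the smallest p, m - 1 for the next, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

Averaging per mouse first makes the mouse, not the recording, the unit of observation, so the degrees of freedom reflect the number of animals.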

      There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.

      To properly control for laser and virus effects, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without the optogenetic construct, SwiChR++), and the data are provided in Fig. 2C.

      Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment on the text about what is happening here?

      We quantified the amount of wakefulness during the first hour of REMs rebound and found that there is no significant difference in wakefulness between REM restriction and baseline control conditions (Fig. S4H). Therefore, while the representative image in Fig. 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.

      Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).

      Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.

      REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.

      Thank you for this suggestion. We have added the description in the main text.

      Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).

      We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.

      References

      Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.

      Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.

      Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.

      Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.

      Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.

      Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.

      Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.

      Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.

      Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.

      Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.

      Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.

      Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.

      Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.

      Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.

      Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.

      Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.

      McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.

      Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.

      Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.

      Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.

      Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.

      Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.

      Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.

      Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB, S0960-9822(23)01585–3. doi: 10.1016/j.cub.2023.11.035.

      Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.

      Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.

      Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.

      Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.

      Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). RESPONSES TO AUDITORY STIMULATION, SLEEP LOSS AND THE EEG STAGES OF SLEEP. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.

      Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.

      Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Dormancy/diapause/hibernation (depending on how the terms are defined) is a key life history strategy that allows the temporal escape from unfavorable conditions. Although environmental conditions do play a major role in inducing and terminating dormancy (authors call this energy limitation hypothesis), the authors test a mutually non-exclusive hypothesis (life-history hypothesis) that sex-specific selection pressures, at least to some extent, would further shape the timing of these life-history events. Authors use a meta-analytic approach to collect data (mainly on rodents) on various life-history traits to test trade-offs among these traits between sexes and how they affect entry and termination of dormancy.

      Strengths:

      I found the theoretical background in the Introduction quite interesting, to the point and the arguments were well-placed. How sex-specific selection pressures would drive entry and termination of diapause in insects (e.g. protandry), especially in temperate butterflies, is very well investigated. Authors attempt to extend these ideas to endotherms and trying to find general patterns across ectotherms and endotherms is particularly exciting. This work and similar evidence could make a great contribution to the life-history theory, specifically understanding factors that drive the regulation of life cycle timing.

      Weaknesses:

      (1) I felt that including 'ectotherms' in the title is a bit misleading as there is hardly any (in fact any?) data presented on ectotherms. Also, most of the focus of the discussion is heavily mammal (rodent) focussed. I believe saying endotherms in the title as well is a bit misleading as the data is mammal-focused.

      We changed the title to "Evolutionary trade-offs in dormancy phenology". This is a hybrid article comprising both a meta-analysis and a literature review. Each of these parts brings new elements to the hypotheses presented. The statistical analyses concern only mammals, mainly rodent species, but the literature review highlighted links between the evolution of dormancy in ectotherms and endotherms that had not been made in previous studies. We feel it is important for readers to know that much of the discussion focuses on the comparison of these two groups. However, we understand that placing the term ectotherms in the title might suggest a meta-analysis including both groups.

      In addition, we indicated more specifically in the abstract and at the end of the introduction that the article includes two approaches associated with different groups of animals.

      We also specified in the "review criteria" section that:

      Only one bird species is considered to be a hibernator, and no information is available on sex differences in hibernation phenology (Woods and Brigham 2004, Woods et al. 2019).

      We have also added a "study limitations" section, which explains that although the meta-analysis is limited by the data available in the literature, the information available for the species groups not studied seems to support our results.

      (2) I think more information needs to be provided early on to make readers aware of the diversity of animals included in the study and their geographic distribution. Are they mostly temperate or tropical? What is the span of the latitude as day length can have a major influence on dormancy timings? I think it is important to point out that data is more rodent-centric. Along the line of this point, is there a reason why the extensively studied species like the Red Deer or Soay Sheep and other well-studied temperate mammals did not make it into the list?

      We specified in the abstract and at the end of the introduction that the species studied in the meta-analysis are mainly Holarctic species. We have also added a map showing all the study sites used in the meta-analysis. Finally, in the methods and in a new "study limitations" section at the end of the discussion, we explain why some species were not included in the meta-analysis and what this implies for the interpretation of the results.

      The hypotheses developed in this article are based on the survival benefits of seasonal dormancy thanks to a period of complete inactivity lasting several months. The Red Deer or Soay Sheep remain active above ground throughout the year.

      The effect of photoperiod on phenology is one of the mechanisms that has evolved to match an activity with favorable conditions. In this study, we are not interested in the mechanisms but in the evolutionary pressures that explain the observed phenology. Interspecific variation in the effect of photoperiod results from different evolutionary pressures, which we are trying to highlight. It is therefore not necessary to review the mechanisms and effects of photoperiod, which would themselves require a lengthy review.

      We also tested the “physiological constraint hypothesis” on several variables. Temperature and precipitation are factors correlated with sex differences in phenology of hibernation. These factors allow consideration of the geographical differences that influence hibernation phenology.

      (3) Isn't the term 'energy limitation hypothesis' which is used throughout the manuscript a bit endotherm-centric? Especially if the goal is to draw generalities across ectotherms and endotherms. Moreover, climate (e.g. interaction of photoperiod and temperature in temperatures) most often induces or terminates diapause/dormancy in ectotherms so I am not sure if saying 'energy limitation hypothesis' is general enough.

      We renamed this hypothesis the "physiological constraint hypothesis" and we have made appropriate changes in the text so as not to focus physiological constraints solely on energy aspects.

      (4) Since for some species, the data is averaged across studies to get species-level trait estimates, is there a scope to examine within population differences (e.g. across latitudes)? This may further strengthen the evidence and rule out the possibility of the environment, especially the length of the breeding season, affecting the timing of emergence and immergence.

      For a given species, data on hibernation phenology are averaged for different populations, but also for the same population when measurements are taken over several years. To test these hypotheses on a population scale, precise data on reproductive effort would be needed for each population tested, but this concerns very few species (less than 5).

      Testing the effects of temperature and precipitation allows us to take into account the effects of climate on phenology.

      (5) Although the authors are looking at the broader patterns, I felt like the overall ecology of the species (habitat, tropical or temperate, number of broods, etc.) is overlooked and could act as confounding factors.

      Yes, that's why we also tested the physiological constraints hypothesis, including the effect of temperature and precipitation. For the life-history hypothesis, we also tested reproductive effort, which takes into account the number of offspring per year.

      (6) I strongly think the data analysis part needs more clarity. As of now, it is difficult for me to visualize all the fitted models (despite Table 1), and the large number of life-history traits adds to this complexity. I would recommend explicitly writing down all the models in the text. Also, the Table doesn't make it clear whether interaction was allowed between the predictors or not. More information on how PGLS were fitted needs to be provided in the main text which is in the supplementary right now. I kept wondering if the authors have fit multiple models, for example, with different correlation structures or by choosing different values of lambda parameter. And, in addition to PGLS, authors are also fitting linear regressions. Can you explain clearly in the text why was this done?

      To simplify the results, we reduced the number of models to just three: one for emergence and two for immergence. In place of Table 1, we have written the structure of the models used. We have added a sentence to the statistics section: “each PGLS model produces a λ parameter representing the effect of phylogeny ranging between 0 (no phylogeny effect) and 1 (covariance entirely explained by co-ancestry)”. We have tested only three PGLS models and the estimated lambda value for these models is 0.
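
To make the meaning of the λ parameter concrete, the sketch below assumes that it is Pagel's λ, which matches the description quoted above although the response does not name it. In that formulation, λ rescales the off-diagonal entries (shared ancestry) of the phylogenetic covariance matrix while leaving the diagonal untouched. The function name and the three-species matrix are hypothetical and chosen only for illustration.

```python
def pagel_lambda_transform(cov, lam):
    """Scale off-diagonal (shared-ancestry) covariances by lambda; keep variances."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    n = len(cov)
    return [
        [cov[i][j] if i == j else lam * cov[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical phylogenetic covariance matrix for three species.
cov = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]

no_phylogeny = pagel_lambda_transform(cov, 0.0)    # species treated as independent
full_phylogeny = pagel_lambda_transform(cov, 1.0)  # covariance fully from co-ancestry
```

Under this assumption, an estimated λ of 0 leaves a diagonal covariance matrix, so the fitted model treats species as independent observations, as an ordinary regression would.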

      (7) Figure 2 is unclear, and I do not understand how these three regression lines were computed. Please provide more details.

      We tested new models and modified existing figures.

      Reviewer #2 (Public Review):

      Summary:

      An article with lots of interesting ideas and questions regarding the evolution of timing of dormancy, emphasizing mammalian hibernation but also including ectotherms. The authors compare selective forces of constraints due to energy availability versus predator avoidance and requirements and consequences of reproduction in a review of between and within species (sex) differences in the seasonal timing of entry and exit from dormancy.

      Strengths:

      The multispecies approach including endotherms and ectotherms is ambitious. This review is rich with ideas if not in convincing conclusions.

      Weaknesses:

      The differences between physiological requirements for gametogenesis between sexes that affect the timing of heterothermy and the need for euthermy during mammalian hibernation are significant issues that underlie, but are under-discussed in, this contrast of selective pressures that determine seasonal timing of dormancy. Some additional discussion of the effects of rapid climate change on between and within species phenologies of dormancy would have been interesting.

      Reviewer #2 (Recommendations For The Authors):

      This review provides a very interesting and ambitious among and within-species comparison of the seasonal timing of entry and exit from dormancy, emphasizing literature from hibernating mammals (sans bats and bears) and with attention to ectotherms. The authors test hypotheses related to the timing of food availability (energy) versus life history considerations (requirements for reproduction, avoiding predation) while acknowledging that these are not mutually exclusive. I offer advice for clarifications and description of the limitations of the data (accuracy of emergence and immergence times), but mainly seek more emphasis for small mammalian hibernators on the contrast for requirements for significant periods of euthermy prior to the emergence in males versus females, a contrast that has energetic and timing consequences in both the active and hibernation seasons.

      A consideration alluded to but not fully explained or discussed is the differences in mammals between species and sexes in the timing of what can be called ecological hibernation, which is the seasonal duration that an animal remains sequestered in its burrow or den, and heterothermic hibernation, between the beginning and end of the use of torpor. The two are not synonymous. When "emergence" is the first appearance above ground, there is a significant missing observation key to the energetic contrasts discussed in this review, that of this costly pre-emergence behavior.

      To explain the difference between heterothermic hibernation and ecological hibernation, we have added the following passage to the Review Criteria section of the Materials and Methods:

      “In this study, we addressed what can be called ecological hibernation, i.e. the seasonal duration that an animal remains sequestered in its burrow or den, which is assumed to be directly linked to the reduced risk of predation. In contrast, we did not consider heterothermic hibernation, which corresponds to the time between the beginning and end of the use of torpor. So when we mention hibernation, emergence or immergence, the specific reference is to ecological hibernation.”

      In arctic and other ground squirrel species, males remain at high body temperatures after immerging and remaining in their burrows in the fall for several days to a week, and more consistently and importantly, males that will attempt to breed in the spring end torpor but remain constantly in their burrows for as much as one month at great expense whilst undergoing testicular growth, spermatogenesis, spermiation, and sperm capacitation, processes that require continuous euthermy. Female arctic ground squirrels and non-breeding males do not and typically enter their first torpor bout 1-2 days after immergence and first appear above ground 1-3 days after their last arousal in spring.

      The weeks spent euthermic in a cold burrow in spring by males while undergoing reproductive maturation require a significant energetic investment (can equate to the cost of the previous heterothermic period) that contrasts profoundly with the pre-mating energetic investment by females.

      Males cache food in their hibernacula and extend their active season in late summer/fall in order to do so and feed from these caches in spring after resuming euthermy, often emerging at body weights similar to that at immergence. Similar between-sex differences in the timing of hibernation and heterothermy occur in golden-mantled and Columbian ground squirrels and likely most other Urocitellus spp., though less well described in other species. These differences are related to life histories and requirements for male vs. female gametogenesis and, at the same time, energetic considerations in the costs to males for remaining euthermic while undergoing spermatogenesis and the cost related to whether males undergo gonadal development being dependent on individual body mass and cache size. These issues should be better discussed in this review.

      It is the time required to complete spermatogenesis, spermiation, and maturation of sperm not the time for growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We took this comment into account. As we found little evidence of an increase in testicular maturation time with relative testicular size (apart from table 4 in Kenagy and Trombulak, 1986), we no longer tested the effect of relative testicular size on protandry.

      We examined whether the ability to store food before hibernation might reduce protandry. Although food storage in the burrow may be favored for overcoming harsh environments or predation, model selection did not retain the food-storing factor. Thus, the ability to accumulate food in the burrow was not by itself likely to keep males of some species from emerging earlier (e.g. Cricetus cricetus, protandry: 20 days, Siutz et al., 2016). Early emerging males may benefit from consuming higher quality food or in competition with other males (e.g., dominance assertion or territory establishment, Manno and Dobson 2008).

      We developed these aspects in the discussion.

      While it is admirable to include ectotherms in such a broad review and modelling, I can't tell what data from how many ectothermic species contributed to the models and summary data included in the figures.

      Too few data on ectotherms were available to include ectotherms in the meta-analysis.

      Some consideration should be made to the limitations of the data extracted from the literature of the accuracy of emergence and immergence dates when derived from only observations or trapping data. The most accurate results come from the use of telemetry for location and data logging reporting below vs. above ground positioning and body temperature.

      We added a "study limits" section to the discussion to address all the limits in this commentary.

      L64 "favor reproduction", better to say "allow reproduction", since there is strong evolutionary pressure to initiate reproduction early, often anticipating favorable conditions for reproduction, to maximize the time available for young to grow and prepare for overwintering themselves.

      Also, generally, it is not how "harsh" an environment is but rather how short the growing season is.

      We took this comment into account.

      L80 More simply, individuals that have amassed sufficient energy reserves as fat and caches to survive through winter may opt to initiate dormancy. This may decrease but not obviate predation, since hibernating animals are dug from their burrows and eaten by predators such as bears and ermine.

      In this sentence, we indicated a gap between dormancy phenology and the growing season, which suggests survival benefits of dormancy other than from a physiological point of view. We've changed the sentence to make it clearer: “However, some animals immerge in dormancy while environmental conditions would allow them (from a physiological point of view) to continue their activity, suggesting other survival benefits than coping with a short growing season”

      L88 other physiological or ecological factors.... (gametogenesis).

      In this study, we examine possible evolutionary pressures and therefore the environmental factors that may influence hibernation phenology. We focus on reproductive effort because, assuming predation pressure, we would expect a trade-off between survival and reproduction.

      L113 beginning early to afford long active seasons to offspring while not compromising the survival of parents.

      We added to the sentence:

      “For females, emergence phenology may promote breeding and/or care of offspring during the most favorable annual period (e.g., a match of the peak in lactational energy demand and maximum food availability, Fig. 1) or beginning early to afford long active seasons to offspring while not compromising the survival of parents.”

      L117 based on adequate preparation for overwintering and enter dormancy....

      We modified the sentence as follows :

      recovering from reproduction, and after acquiring adequate energy stores for overwintering”

      L123 given that males outwardly invest the least time in reproduction yet generally have shorter hibernation seasons would seem to reject this hypothesis. This changes if you overtly include the time and energy that males expend while remaining euthermic preparing for hibernation, a cost that can be similar to energy expended during heterothermy.

      Males invest a lot of time in reproduction before females emerge (whether for competition or physiological maturation) and some males seem to be subject to long-term negative effects linked to reproductive stress (see Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313). Both processes may contribute to reducing the duration of male hibernation.

      L125 again, costs to support euthermy in males undergoing reproductive development is an investment in reproduction.

      You're right, but it's difficult to quantify. We tested a model that takes into account the reproductive effort during reproduction and prior to reproduction. We also considered the hypothesis that species living in a cold climate might have a low protandry while having a high reproductive effort due to their ability to feed in the burrow (interaction effect between reproductive effort and temperature). We think these changes answer your comment.

      L134 It isn't growing large testes that takes time, but instead completing spermatogenesis and maturation of sperm in the epididymides.

      We removed this part.

      L140 Later immergence in male ground squirrels is related to accumulation and defense of cached food, activities that are related to reproduction the next spring. An experimental analysis that would be revealing is to compare immergence times in females that completed lactation to the independence of their litters vs. females that did not breed or lost their litters. Who immerges first?

      Body mass variation from emergence to the end of mating in males seems to explain the delayed immergence of males in species that don't hide food in their burrows for hibernation. For example, in Spermophilus citellus, males immerge on average more than 3 weeks after females, yet they do not hide food in their burrows for the winter.

      Such a study already exists and shows that non-breeding females immerge earlier than breeding females. We refer to it:

      L386: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      L164 So you examined literature from 152 species but included data from only 29 species? Did you include data from social hibernators (marmots) that mate before emergence?

      With the current models, we have 28 different species. The number is small because very few species have data on both sex differences in hibernation phenology and reproductive effort (especially for males).

      Data on sex differences in hibernation were not available for social hibernating species.

      L169 Were these data from trapping or observation results? How reliable are these versus the use of information from implanted data loggers or collars that definitively document when euthermy is resumed and/or when immergence and first emergence occurs (through light loggers)?

      We did not focus on heterothermic hibernation, but on ecological hibernation. We have no idea of the margin of error for these types of data, but we have discussed these limitations in the "Study limitations" section.

      L180, again, it is the time required to complete spermatogenesis and spermiation not the time for the growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We removed this part.

      L200 Males that accumulate caches in fall and then feed from those during the spring pre-emergence euthermic interval and after will often be at their seasonal maximum in body mass. Declining from that peak may not be stressful.

      It has been suggested that reproductive effort in Spermophilus citellus might induce long-term negative effects that delay male immergence.

      Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313.

      L210 How about altitude, which affects the length of the growing season at similar latitudes?

      We extracted the location of each study site to determine the temperature and precipitation at that precise location (based on interpolated climate surface). We therefore take into account altitude-related differences in growing season (based on temperature) between sites.

      L267 How did whether males cache food or not figure into these comparisons? Refeeding before mating occurs during the pre-emergence euthermic interval.

      We removed this part.

      L332, 344 not a "proxy" but functionally related to advantages in mating systems with multiple mating males.

      We removed this part.

      L353 The need for a pre-emergence euthermic interval in male ground squirrels requires costs in the previous active season in accumulating and defending a cache and the proximal costs in spring while remaining at high body temperatures prior to emergence with resulting loss in body mass or devouring of the cache.

      You're right, but in this section, we quickly explain the benefits of food caching compared with other species that don't do so.

      L385 This review should discuss why females are not known to cache and contrast as "income breeders" from "capital breeder" males. What advantages of caches are females indifferent to (no need for a prolonged pre-emergence period) and what costs of accumulating caches do they avoid (prolonged activity period and defense of caches).

      We clarified the case of female emergence.

      L321 : “Thus, an early emergence of males may have evolved in response to sexual selection to accumulate energy reserve in anticipation of reproductive effort. Females, on the contrary, are not subject to intraspecific competition for reproduction and may have sufficient time before (generally one week after emergence) and during the breeding period to improve their body condition.”

      L388 I don't understand the logic of the conclusion that "did not ...adequately explain the late male immergence" in this section. The greater mass loss in males over the mating period is afforded by the presence of a cache that requires later immergence.

      We removed this part.

      L412 Not just congeners that invest less in reproduction, but within species individuals that do not attempt to breed in one or more years and thus have no reproductive costs should be an interesting comparison for differences in phenology from individuals that do breed. Non-breeders are often yearlings but can be a significant overall proportion of males that fail to fatten or cache enough to afford a pre-emergence euthermic period.

      L385: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      The sentence refers to individuals who reproduce little or not at all.

      L445 Males that gain weight between emergence and mating may do so by feeding from a cache regardless of how "harsh" an environment is.

      We observe this phenomenon even in species that are not known to hoard food:

      “Gains in body mass observed for some individuals, even in species not known to hoard food, may indicate that the environment allows a positive energy balance for other individuals with comparable energy demands.”

      L492 Some insects retreat to refugia in mid-summer to avoid parasitism (Gynaephora).

      Escape from parasites is also a benefit of dormancy.

      Fig 1 - It is difficult to see the differences in black and green colors, esp if color blind. Maternal effort is front-loaded within the active season (line for "optimal period" shown in midseason).

      Add "energy" underneath c) Prediction (H1) and "reproduction" underneath d) Prediction (H2). Explain the orange vs black, green colors of triangles.

      We made the necessary changes.

      Fig 2 - I don't buy the regression lines as significant in this figure. The red line, cannot have a regression with two sample points and without the left-hand most dot, nothing is significant.

      We deleted this graph.

      Fig 3 - females only?

      We deleted this graph.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Below I summarize points that should be addressed in a revised version of the manuscript.

      • Page 6, first paragraph: I don't understand why the signals average out to a single state. If the distribution is indeed randomly distributed, a broad signal with low intensity should be present.

      We agree that this statement may cause confusion. We changed the text (marked in bold) to clarify the statement: The mobility of the undocked SBDs will be higher than the diffusion of the whole complex, allowing the sampling of varying interdomain distances within a single burst. However, these dynamic variations are subsequently averaged to a singular FRET value during FRET calculations for each burst, and may appear as a single low FRET state in the histograms.

      • Page 6, third paragraph: how can the donor only be detected in the acceptor channel? Is this tailing out?

      Donor-only signal is not detected in the acceptor channel. As described on page 5 and in the Materials & Methods section, the dye stoichiometry value is defined for each burst/dwell using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      When no acceptor fluorophore is present FAA=0 and S=1.

      Some donor photons bleed through into the acceptor channel, but we correct for this by calculating the leakage and crosstalk factors as described in the Materials and Methods (page 20).
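
      As an illustration of the stoichiometry logic described above, the following is a minimal sketch in Python. It assumes the stoichiometry definition commonly used in alternating-excitation smFRET, S = (FDD + FDA) / (FDD + FDA + FAA), and an uncorrected proximity ratio E = FDA / (FDD + FDA). The function names are hypothetical, the photon counts are invented for illustration, and leakage, direct-excitation and detection-efficiency corrections are deliberately left out, so this is not the analysis pipeline used in the manuscript.

```python
def stoichiometry(f_dd, f_da, f_aa):
    """Uncorrected dye stoichiometry S for one burst or dwell.

    f_dd: donor emission after donor excitation (FDD)
    f_da: acceptor emission after donor excitation (FDA)
    f_aa: acceptor emission after acceptor excitation (FAA)
    Assumed definition: S = (FDD + FDA) / (FDD + FDA + FAA).
    """
    total = f_dd + f_da + f_aa
    if total == 0:
        raise ValueError("burst contains no photons")
    return (f_dd + f_da) / total


def proximity_ratio(f_dd, f_da):
    """Uncorrected apparent FRET efficiency E = FDA / (FDD + FDA)."""
    donor_excited = f_dd + f_da
    if donor_excited == 0:
        raise ValueError("no photons detected after donor excitation")
    return f_da / donor_excited


# Invented photon counts, for illustration only.
donor_only = stoichiometry(80, 0, 0)      # no acceptor present: FAA = 0
acceptor_only = stoichiometry(0, 0, 50)   # no donor present: FDD = FDA = 0
double_labeled = stoichiometry(30, 20, 50)
```

      In this sketch a donor-only burst yields S = 1, an acceptor-only burst yields S = 0, and a burst carrying both dyes falls in between, which is the basis for filtering on S before any FRET states are assigned.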

      We changed the text (marked in bold) in the manuscript to address the question: The FRET data of both OpuA variants is best explained by a four-state model (Figure 2A,B; fourth and fifth panel) (Supplementary File 3). Two of the four states represent donor-only (S≈1) or acceptor-only (S≈0) dwells. The full bursts belonging to donor-only and acceptor-only molecules were excluded prior to mpH2MM. This means that some molecules transit to a donor-only or acceptor-only state within the burst period, which most likely reflects blinking or bleaching of one of the fluorophores. These donor-only and acceptor-only states were also excluded during further analysis. The other two states reflect genuine FRET dwells that were analyzed by mpH2MM. They represent different conformations of the SBDs.

      • Page 7, "SBD dynamics ..": why was the V149Q mutant only analyzed in the K521C background and not also in the N414C background?

      The two FRET states were best distinguished in OpuA-K521C. Therefore, we decided to focus on OpuA-K521C and not OpuA-N414C. OpuA-V149Q was used to show that reduced docking efficiency does not affect the transition rate constants and relative abundances of the two FRET states, and we regarded it sufficient to test the SBD dynamics in OpuA-K521C only.

      • Page 8, second paragraph: why was the N414C mutant analyzed only from 0 - 600 mM and not also up to 1000 mM?

      In line with the previous answer, our main focus was on OpuA-K521C, since the two FRET states were best distinguished in OpuA-K521C. OpuA-N414C was used to prove that similar states are observed when measuring with fluorophores on the opposite site of the SBD. We studied how the FRET states change in response to different conditions that correspond to different stages of the transport cycle and how it changes in response to different ionic strengths. Initially, 600 mM KCl was used to study the dynamics of the SBD at high ionic strength. Later in this study, we tested a very wide range of different salt concentrations for OpuA-K521C to get detailed insights into the dynamics of the SBDs over a wide ionic strength range. Note that 1 M KCl is a very high, non-physiological ionic strength for the typical habitat of L. lactis and was only used to show that the high FRET state occurs even under very extreme conditions.

      • Page 8, third paragraph: why was the dimer (if it is the source of the FRET signal) only partially disrupted?

      We acknowledge that this is a very good point. However, we purposely did not speculate on this point in the manuscript, because we have limited information on the molecular details of the interaction. As we highlight on page 8, the SBDs experience each other in a very high apparent concentration (millimolar range). This means that the interactions are most likely very weak (low affinity) and not very specific. Such interactions are in the literature referred to as the quinary structure of proteins and they occur at the high macromolecular crowding in the cell and in proteins with tethered domains, and thus at high local concentrations. Such interactions can be screened by high ionic strength. In the revised manuscript, we now present the partially disrupted dimer structure in the context of the quinary structure of a protein (page 11):

      In other words, the high FRET state may comprise an ensemble of weakly interacting states rather than a singular stable conformation, resembling the quinary structure of proteins. The quinary structure of proteins is typically revealed in highly crowded cellular environments and describes the weak interactions between protein surfaces that contribute to their stability, function, and spatial organization (Guin & Gruebele, 2019). Despite the current study being conducted under dilute conditions, the local concentration of SBDs (~4 mM) mimics a densely populated environment and reveals quinary structure.

      • Page 9, second paragraph: according to the EM data processing, only 20% of the particles were used for 3D reconstruction. Why? Does it mean that the remaining 80% were physiologically not relevant? If so, why were the 20% used relevant?

      We note that it is a fundamental part of image processing of single particle cryo-EM data to remove false positives or low-resolution particles throughout the processing workflow. In particular when using a very low and therefore generous threshold during automated particle picking, as we did (t=0.01 and t=0.05 for the 50 mM KCl and 100 mM KCl datasets, respectively), the initial set of particles includes a significant amount of false positives, a tradeoff to avoid excluding particles belonging to low populated classes/orientations. It is thus common that more than 50% of ‘particles’ are excluded in the first rounds of 2D classification. In our case, only 30% and 52% of particles were retained after such first clean-up steps. Subsequently, the particle set is further refined, and additional false positives and low-resolution particles are excluded during extensive rounds of 3D classification. We also note that during the final steps, most of the data excluded represents particles of lower quality that do not contribute to a high-resolution reconstruction, or belong to low population protein conformations. This does not mean that such a population is not physiologically relevant. In conclusion, having only 5-20% of the initial automated picked particles contributing to the reconstruction of the final cryo-EM map is common, with the vast majority of excluded particles being false positives.

      • Page 11, third paragraph: the way the proposed model is selected is also my main criticism. All alternative models do not fit the data. Therefore, the proposed model is suggested. However, I do not grasp any direct support for this model. Either I missed it or it is not presented.

      Concerning the specific model in Figure 5, the reviewer is correct. We do not provide direct evidence for a side-ways interaction. However, we have evidence of transient interactions and our data rule out several scenarios of interaction, leaving 5C as the most likely model. This is also the main conclusion of this paper: In conclusion, the SBDs of OpuA transiently interact in a docking competent conformation, explaining the cooperativity between the SBDs during transport. The conformation of this interaction is not fixed but differs substantially between different conditions.

      Because the interaction is very short-lived it was not possible to visualize molecular details of this interaction. We present Figure 5 to hypothesize the most likely type of interaction, since many possibilities can be excluded with the vast amount of presented data. To make it clearer that we discuss models and rule out several possibilities but do not demonstrate a specific interaction between the SBDs, we now write on page 10 (changes marked in bold): We have shown that the SBDs of OpuA come close together in a short-lived state, which is responsive to the addition of glycine betaine (Figure 4A). Although the occurrence of the state varies between different conditions, it was not possible to negate the high-FRET state completely, not even under very high or low KCl concentrations, or in the presence of 50 mM arginine plus 50 mM glutamate (Figure 4A,B). To evaluate possible interdomain interaction scenarios we consider the following: (1) The SBDs of OpuA are connected to the TMDs with very short linkers of approximately 4 nm, which limit their movement and allow the receptor to sample a relatively small volume near its docking site. (2) In low ionic strength conditions, OpuA-K521C displays a high FRET state with mean FRET values of 0.7-0.8, which correspond to inter-dye distances of approximately 4 nm. (3) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (4) The distance between the density centers of the SBDs in the cryo-EM reconstructions (based on particles with a low and high FRET state) is 6 nm, which aligns with the dimensions of an SBD (length: ~6 nm, maximal width: ~4 nm). These findings collectively indicate that two SBDs interact but not necessarily in a singular conformation but possibly as an ensemble of weakly interacting states. Hence, we discuss three possible SBD-SBD interaction models to explain the high-FRET state:

      Reviewer #2 (Recommendations For The Authors):

      In the abstract and elsewhere the authors suggest that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism, specifically for its cooperativity.

      I feel that this main claim is not well established. The authors convincingly demonstrate that the SBDs largely occupy two states relative to one another and that in one of these states, they are closer than in the other. Unless I have missed (or failed to understand) some major details of the results, I did not find any evidence of a physical interaction. Have the authors established that the high FRET state indeed corresponds to the physical engagement of the SBDs? I feel that a direct demonstration of an interaction is much missing.

      Along the same lines, in the low-salt cryo-EM structure, where the SBDs are relatively closer together, the SBDs are still separated and do not interact.

      See also our response to the final comment of reviewer 1. Furthermore, please carefully consider the following: (1) FRET values of 0.7-0.8 correspond to inter-dye distances of approximately 4 nm. (2) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (3) The cryo-EM reconstruction is the average of all the particles in the final dataset, including both the particles with a low and high FRET state. Further, the local resolution of the SBDs in the cryo-EM map is low, indicative of high degree of flexibility. Thus, a potential interaction is possible within the observed range of flexibility. (4) The distance between the density centers is 6 nm, aligning with the dimensions of an SBD (length: 6 nm, maximal width: 4 nm). These factors collectively indicate SBD interactions, and we present these points now more explicitly in Figure 4 and the last part of the results section (page 9).
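
      The link between FRET efficiency and inter-dye distance in point (1) can be sketched with the Förster relation, E = 1 / (1 + (r/R0)^6), rearranged to r = R0 (1/E - 1)^(1/6). The Förster radius R0 depends on the dye pair and is not stated in this response; the value of 5.0 nm below is an assumption chosen only for illustration, and the function name is hypothetical.

```python
def distance_from_fret(efficiency, r0_nm):
    """Inter-dye distance (nm) from FRET efficiency via the Förster relation.

    E = 1 / (1 + (r / R0)**6)  =>  r = R0 * (1 / E - 1) ** (1 / 6)
    """
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


ASSUMED_R0_NM = 5.0  # assumption for illustration; not a value from the manuscript

for e in (0.7, 0.75, 0.8):
    r = distance_from_fret(e, ASSUMED_R0_NM)
    print(f"E = {e:.2f} -> r = {r:.2f} nm")
```

      With the assumed R0, efficiencies of 0.7-0.8 map to distances close to 4 nm. Because r scales linearly with R0, a different Förster radius shifts the estimate proportionally.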

      Once the authors successfully demonstrate that direct physical interaction indeed occurs, they will need to provide data that places it in the context of the transport cycle. Do the SBDs swap ligand molecules between them? Do they bind the ligand and/or the transporter cooperatively? What is the role of this interaction?

      We acknowledge the intriguing nature of the posed questions, but they extend beyond the scope of this study. It is extremely challenging to obtain high-resolution structures of highly dynamic multidomain proteins, like OpuA, and to probe transient interactions as we do here for the SBDs of OpuA. We therefore combined cryo-TEM with smFRET studies and applied the most advanced, state-of-the-art analysis tools, as acknowledged by reviewer 1. We link our observations on the structural dynamics and interactions of the SBDs to a previous study, where we showed that the two SBDs of OpuA interact cooperatively. We do not have further evidence that connects the physical interactions to the transport cycle. In our view, the collective datasets indicate that the physical interactions between the SBDs reported here increase the transport efficiency.

      As far as I understand, the smFRET data have been interpreted on the basis of a negative observation, i.e., that it is "likely" that none of the FRET states corresponds to a docked SBD. To convincingly show this, a positive observation is required, i.e., observation of a docked state.

      The aim of this study was to study interdomain dynamics and not specifically docking. We have previously shown that docking can be visualized via cryo-EM (Sikkema et al., 2020), however the SBDs of OpuA appear to only dock in specific turnover conditions. We now show that the high FRET state of OpuA cannot represent a docked state, but that the SBDs transiently interact (see our response to the first comment). Importantly, a docked state was also not found in the cryo-EM reconstructions at low ionic strength, representing the smFRET conditions where we observe the interactions between the SBDs. The high FRET state occupies 30% of the dwells in this condition, and such a high percentage of molecules would have become apparent during cryo-EM 3D classification in case they would form a docked state. Therefore, we conclude that docking does not occur in low ionic strength apo condition. We discuss this point and our reasoning on page 11 of the revised manuscript.

      In this respect, I find it troubling that in none of the tested conditions, the authors observed a FRET state which corresponds to the docked state. Such a state, which must exist for transport to occur (as mentioned in the authors' previous publications), needs to be demonstrated. This brings me to my next question: why have the authors not measured FRET between the SBDs and the transporter? Isn't this a very important piece that is missing from their puzzle?

      We agree that investigating docking behavior under varied turnover conditions requires focused experiments on FRET dynamics between the SBDs and the transporter. As noted on page 5, OpuA exists as a homodimer, implying that a single cysteine mutation introduces two cysteines in a single functional transporter. To specifically implement a cysteine mutation in only one SBD and one transmembrane domain, it is necessary to artificially construct a heterodimer. We recently published initial attempts in this direction, and this will be a subject for future research but still requires years of work.

      Additionally, I feel that important controls are missing. For example, how will the data presented in Fig1 look if the transporter is labeled with acceptor or donor only? How do soluble SBDs behave?

      In the employed labeling method, donor and acceptor dyes are mixed in a 1:1 ratio and randomly attached to the two cysteines in the transporter. This automatically yields significant fractions of donor-only and acceptor-only transporters, which are always present during the smFRET recordings. We can visualize those molecules on the basis of the dye stoichiometry, which we calculate by using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      Unfiltered plots look as follows (a dataset of OpuA-K521C at 600 mM KCl):

      Author response image 1.

      Donor only and acceptor only molecules have a very well discernible stoichiometry of 1 and 0, respectively. The filtering procedure is described in the materials and methods section, and these plots can be found in the supplementary database. We did not add them to the main text or supplementary materials of the original manuscript, as this is a very common procedure in the field of smFRET. We now include such a dataset in the revised manuscript.

      Soluble SBDs of OpuA have been studied previously (e.g. Wolters et al., 2010 & De Boer et al. 2019). For example, we have shown by SEC-MALLS that soluble SBDs do not form dimers, which is consistent with our notion that the SBDs interact with low affinity. It is not possible to study interdomain dynamics between soluble SBDs by smFRET, because the measurements are carried out at picomolar concentrations (monomeric conditions). We emphasize that smFRET measurements with native complexes, with SBDs near each other at apparent millimolar concentrations, are physiologically more relevant.

      Additional comments:

      (1) "It could well be that cooperativity and transient interactions between SBDs is more common than previously anticipated" and a similar statement in the abstract. What evidence is there to suggest that the transient interactions between SBDs are a common phenomenon?

      On page 11, we write: Dimer formation of SBPs has been described for a variety of proteins from different structural clusters of substrate-binding proteins [33–38,51–53]. We cite 9 papers that report SBD/SBP dimers. This suggests to us that the phenomenon of interacting substrate-binding proteins could be more common. Moreover, the concentration of maltose-binding protein and other SBPs in the periplasm of Gram-negative bacteria can reach (sub)millimolar concentrations, and low-affinity interactions may play a role not only in membrane protein-tethered SBDs (like in OpuA) but may also be important in soluble substrate-receptors. Such low-affinity interactions are rarely studied in biochemical experiments.

      (2) I think that the data presented in 1B-C better suits the supplementary information.

      Figure 1B-D is already a summary of the supplementary information that describes the optimization of OpuA purification. We think it is valuable to show this part of the figure in the main text. A very clean and highly pure OpuA sample is essential for smFRET experiments. Quality of protein preparations and data analysis are key for the type of measurements we report in this paper.

      (3) "the first peak in the SEC profile corresponds...." The peaks should be numbered in the figure to facilitate their identification.

      We have changed the figure as suggested.

      (4) "smFRET is a powerful tool for studying protein dynamics, but it has only been used for a handful of membrane proteins". With the growing list of membrane proteins studied by smFRET I find this an overstatement.

      We removed this sentence in the new version of the manuscript.

      (5) "We rationalized that docking of one SBD could induce a distance shift between the two SBDs in the FRET range of 3-10 nm (Figure 1E)" How and why was this assumed?

      We realize that this is one of the sentences that caused confusion about the aim of this study. In this part of the manuscript, we should not have used docking as an example and we apologize for that. We replaced the sentence by: These variants are used to study inter-SBD dynamics in the FRET range of 3-10 nm (Figure 1E).

      Also Figure 1E was adjusted to prevent confusion:

      Author response image 2.

      In addition, to avoid any confusion we changed the following sentence on page 4 (changes marked in bold): We designed cysteine mutations in the SBD of OpuA to study interdomain dynamics in the full length transporter.

      (6) "However, the FRET distributions are broader than would be expected from a single FRET state, especially for OpuA-K521C" Have the authors established how a single state FRET of OpuA looks? Is there a control that supports this claim?

      Below we compare two datasets from OpuA-K521C in 600 mM KCl with a typical smFRET dataset from the well-studied substrate-binding protein MBP from E. coli, which resides in a single state. Left: OpuA-K521C; Right: MBP

      Author response image 3.

      We agree that this cannot be assumed from the presented data. Therefore we rewrote this sentence: However, the FRET distributions tail towards higher FRET values, especially OpuA-K521C.

      (7) "V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the intrinsic transport and ATP hydrolysis efficiency intact." I find this statement confusing: How can a mutation reduce docking efficiency yet leave the transport activity unchanged?

      We rewrote the sentences (changes marked in bold): V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the ionic strength sensing in the NBD and the binding of glycine betaine and ATP intact. Accordingly, a reduced docking efficiency should result in a lower absolute glycine betaine-dependent ATPase activity. At the same time the responsiveness of the system to varying KCl, glycine betaine, or Mg-ATP concentrations should not change.

      (8) Along the same lines: "whereas the glycine betaine-, Mg-ATP-, or KCl-dependent activity profiles remain unchanged" vs. "OpuA-V149Q-K521C exhibited a 2- to 3-fold reduction in glycine betaine-dependent ATPase activity".

      See comment at point 7.

      (9) In general, I find the writing wanting at places, not on par with the high standards set by previous publications of this group.

      We recognize the potential ambiguity in our phrasing. We hope that after incorporating the feedback provided by the reviewers our manuscript will convey our findings in a clearer manner.

      Extra changes to the text:

      (1) Title changed: The substrate-binding domains of the osmoregulatory ABC importer OpuA physically transiently interact

      (2) Second part of the abstract changed: We now show, by means of solution-based single-molecule FRET and analysis with multi-parameter photon-by-photon hidden Markov modeling, that the SBDs transiently interact in an ionic strength-dependent manner. The smFRET data are in accordance with the apparent cooperativity in transport and supported by new cryo-EM data of OpuA. We propose that the physical interactions between SBDs and cooperativity in substrate delivery are part of the transport mechanism.

      (3) Page 6, third paragraph and Figure 2B: the wrong rate number was extracted from Table 1. Changed this in the text and figure: 112 s⁻¹ → 173 s⁻¹. It did not affect any of the interpretations or conclusions.

      (4) Page 8, last paragraph, changed: smFRET was also performed in the absence of KCl and with a saturating concentration of glycine betaine (100 µM). The mean FRET efficiency of the high-FRET state of OpuA-K521C increased to 0.78, which corresponds to an inter-dye distance of about 4 nm. This indicates that the dyes at the two SBDs move very close towards each other (Figure 4A) (Table 1) (Supplementary File 34).

      (5) Page 9, second paragraph changed: Due to the inherent flexibility of the SBDs, with respect to both the MSP protein of the nanodisc and the TMDs of OpuA, their resolution is limited. Furthermore, the cryo-EM reconstructions average all the particles in the final dataset, including those with a low and high FRET state. Nevertheless, in both conditions, the densities that correspond to the SBDs can be observed in close proximity (Figure 4D). The distance between the density centers is 6 nm and aligns with the dimensions of an SBD, providing further evidence for physical interactions between the SBDs.
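
      The efficiency-to-distance conversion quoted in item (4) follows from the Förster relation E = 1 / (1 + (r/R0)^6). The sketch below inverts that relation. The Förster radius of 5 nm is an assumed, illustrative value that is not stated in this response; the real value depends on the dye pair and conditions.

```python
def distance_from_efficiency(efficiency, r0_nm):
    """Invert E = 1 / (1 + (r/R0)**6) to obtain the inter-dye distance r in nm."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0_NM = 5.0  # assumed Förster radius (illustrative, not taken from the manuscript)

r = distance_from_efficiency(0.78, R0_NM)
print(f"{r:.1f} nm")  # about 4.0 nm under the assumed R0
```

      Under this assumption a mean efficiency of 0.78 maps to roughly 0.81 × R0, which is in line with the "about 4 nm" stated above.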

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful to the Editors for overseeing the review of our manuscript, and to the two reviewers for their thoughtful comments and suggestions for how it can be improved.

      I submit at this time a revision, as well as a detailed response (below) to each of the points raised in the first round of review.

      We feel the manuscript has been significantly improved by taking the reviewers' comments to heart. In a nutshell, we added new key pieces of data (impact of WIN site inhibition on global translation, rRNA production, as well as the requested cell biology analyses showing nucleolar stress), new analyses of the proteomics to counter potential concerns with normalization, and expanded/revised verbiage in key areas to clarify parts of the text that were confusing or problematic. The main figures have not changed; all new material is included in supplements to figures 2 and 3.

      Public Reviews

      Reviewer #1 (Public Review):

      Building on previous work from the Tansey lab, here Howard et al. characterize transcriptional and translational changes upon WIN site inhibition of WDR5 in MLL-rearranged cancer cells. They first analyze whether C16, a newer generation compound, has the same cellular effects as C6, an early generation compound. Both compounds reduce the expression of WDR5-bound RPGs in addition to the unbound RPG RPL22L1. They then investigate differential translation by ribo-seq and observe that WIN site inhibition reduces the translation of RPGs and other proteins related to biomass accumulation (spliceosome, proteasome, mitochondrial ribosome). Interestingly, this reduction adds to the transcriptional changes and is not limited to RPGs whose promoters are bound by WDR5. Quantitative proteomics at two time points confirmed the downregulation of RPGs. Interestingly, the overall effects are modest, but RPL22L1 is strongly affected. Unexpectedly, most differentially abundant proteins seem to be upregulated 24 h after C6 (see below). A genetic screen showed that loss of p53 rescues the effect of C6 and C16 and helped the authors to identify pathways that can be targeted by compounds together with WIN site inhibitors in a synergistic way. Finally, the authors elucidated the underlying mechanisms and analyzed the functional relevance of the RPL22, RPL22L1, p53, and MDM4 axis.

      While this work is not conceptually new, it is an important extension of the observations of Aho et al. The results are clearly described and, in my view, very meaningful overall.

      Major points:

      (1) The authors make statements about the globality/selectivity of the responses in RNA-seq, ribo-seq, and quantitative proteomics. However, as far as I can see, none of these analyses have spike-in controls. I recommend either repeating the experiments with a spike-in control or carefully measuring transcription and translation rates upon WIN site inhibition and normalizing the omics experiments with this factor.

      The reviewer is correct that we did not include spike-in controls in our omics experiments. We would like to emphasize that none of the omics data in this manuscript have been processed in unorthodox ways, and that the major conclusions each have independent corroborating data.

      The selectivity in RPG suppression observed in RNA-Seq, for example, is supported by results from our target engagement (QuantiGene) assays; suppression of RPL22L1 mRNA levels is supported by quantitative and semi-quantitative RT-PCR, by western blotting, and by the results of our proteomic profiling; alternative splicing (and expression) of MDM4—and its dependency on RPL22—is also backed up by similar RT-PCR and western blotting data. The same applies for alternative splicing of RPL22L1.

      That said, we do appreciate the point the reviewer is making here, and have done our best to respond. We do not think it is a prudent investment in resources to repeat the numerous omics assays in the manuscript. We also considered normalizing for bulk transcription and translation rates as suggested, but it is not clear in practice how this would be done, and it could introduce additional variables and uncertainties that may skew the interpretation of results. Instead, to respond to this comment, we made the following changes to the manuscript:

      (1) We now explicitly state, for all omics assays, that spike-in controls were not included. These statements will prompt the reader to make their own assessment of the robustness of each of our findings and interpretations.

      (2) We have added new data to the manuscript (Figure 2—figure supplement 1A–B) measuring the impact of C6 and C16 on bulk translation using the OPP labeling method. These new data demonstrate that WIN site inhibitors induce a progressive yet modest decline in protein synthesis capacity. At 24 hours, there is no significant effect of either agent on protein synthesis levels. By 48 hours, a small but significant effect is observed, and by 96 hours translation levels are ~60% of what they are in vehicle-treated control cells. These new data are important because they support the idea that normalization has not blunted the responses we observe: the magnitude of the effects is consistent between the different assays and tends to cap out at two-fold in terms of RPG suppression, translation efficiency, ribosomal protein levels, and protein synthesis capacity.

      (3) We have included additional analysis regarding the LFQMS, as described below, that specifically addresses the issue of normalization in our proteomics experiments.

      (2) Why are the majority of proteins upregulated in the proteomics experiment after 24 h in C6 (if really true after normalization with general protein amount per cell)? This is surprising and needs further explanation.

      The reviewer is correct in noting that (by LFQMS) ~700 proteins are induced after 24 hours of treatment of MV4;11 cells with C16 (not C6, as stated). The reviewer would like us to examine whether this apparent increase in proteins is a normalization artifact. In response to this comment, we have made the following changes to the manuscript:

      (1) Our new OPP labeling experiments (Figure 2—figure supplement 1A–B) show that there is no significant reduction in overall protein synthesis following 24 hours of C16 treatment. In light of this finding, it is unlikely that normalization artifacts, resulting from diminution of the pool of highly abundant proteins, create the appearance of these 700 proteins being induced. We now explicitly make this point in the text.

      (2) We now clarify in the methods how we seeded identical numbers of cells for DMSO and C16-treated cultures in these experiments and, consistent with our finding that WIN site inhibitors have little if any effect on protein synthesis or proliferation at the 24 hour timepoint, extracted comparable amounts of proteins from these two treatment conditions (DMSO: 344.75 ± 21.7 µg; C16: 366.50 ± 15.8 µg; [Mean ± SEM]).

      (3) We now include in Figure 3—figure supplement 1A a plot showing the distribution of peptide intensities for each protein detected in each run of LFQMS before and after equal median normalization. This new analysis reveals that the distribution of intensities is not appreciably changed via normalization. Specifically, there is not a reduction in peptide intensities in the unnormalized data from 24 hours of C16 treatment that is reversed or tempered by normalization. This analysis provides further support for the notion that the increase we observe is not a normalization artifact.

      (4) We now include in Figure 3—figure supplement 1B–D a set of new analyses examining the relationship between the initial intensity of proteins in DMSO control samples (a crude proxy for abundance) versus the fold change in response to WIN site inhibitor. This analysis shows that we have as many "highly abundant" (10th decile) proteins increasing as we do decreasing in response to WINi. Thus, it appears as though the wholesale clearance of highly abundant proteins from the cell is not occurring at this early treatment timepoint. In addition, this analysis also shows that ribosomal proteins (RP) are generally the most abundant, most suppressed, proteins and that their fold-change at the protein level at 24 hours is less than two-fold, consistent again with the magnitude of transcriptional effects of C16, as measured by RNA-Seq and QuantiGene. The fact that the drop in RP levels is consistent with expectations based on other analyses provides further empirical support for the notion that protein levels inferred from LFQMS are authentic and not skewed by global changes in the proteome.

      The increase in proteins at this time point, we argue, is thus most likely genuine. It is not surprising that—at a timepoint at which protein synthesis is unaffected—several hundred proteins are induced by a factor of two. How this occurs, we do not know. It may be a transient compensatory mechanism, or it may be an early part of the active response to WIN site inhibitors. Lest the reader be confused by this finding, we have now added text to this section of the manuscript discussing and explaining the phenomenon in more detail.
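
      The normalization concern discussed in this exchange can be illustrated with a toy calculation. The sketch below uses invented abundances (not data from the manuscript) in which two abundant "ribosomal" proteins halve and everything else is unchanged. It compares scaling each sample to an equal total signal with scaling to an equal median. The protein names and both helper functions are hypothetical and are not the authors' LFQMS pipeline.

```python
from statistics import median

# Hypothetical absolute abundances per cell (arbitrary units).
control = {"RP_a": 400.0, "RP_b": 400.0, "X": 50.0, "Y": 50.0, "Z": 100.0}
# Treated sample: the two abundant RP_* proteins halve, all others are unchanged.
treated = {k: (v / 2 if k.startswith("RP_") else v) for k, v in control.items()}


def scale_to_total(sample, target_total):
    """Rescale so that the summed intensity equals target_total."""
    factor = target_total / sum(sample.values())
    return {k: v * factor for k, v in sample.items()}


def scale_to_median(sample, target_median):
    """Rescale so that the median intensity equals target_median."""
    factor = target_median / median(sample.values())
    return {k: v * factor for k, v in sample.items()}


total_norm = scale_to_total(treated, sum(control.values()))
median_norm = scale_to_median(treated, median(control.values()))

fold_total = {k: total_norm[k] / control[k] for k in control}
fold_median = {k: median_norm[k] / control[k] for k in control}

for name in control:
    print(name, round(fold_total[name], 2), round(fold_median[name], 2))
```

      In this toy case, equal-total scaling makes the unchanged proteins appear induced (about 1.67-fold) and understates the loss of the abundant ones, whereas equal-median scaling recovers the true fold changes because the median protein is unaffected. Real datasets need not behave this cleanly, which is why the checks on extracted protein amount and on pre-normalization intensity distributions described above matter.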

      (3) The description of the two CRISPR screens (GECKO and targeted) is a bit confusing. Do I understand correctly that in the GECKO screen, the treated cells are not compared with nontreated cells of the same time point, but with a time point 0? If so, this screen is not very meaningful and perhaps should be omitted. Also, it is unclear to me what the advantages of the targeted screen are since the targets were not covered with more sgRNAs (data contradictory: 4 or 10 sgRNAs per target?) than in Gecko. Also, genome-wide screens are feasible in culture for multiple conditions. Overall, I find the presentation of the screening results not favorable.

      In essence, this is a single screen performed in two tiers. In Tier 1, we screened a complete GECKO library (six sgRNA/gene) with the earliest generation (less potent) inhibitor C6, and compared sgRNA representation against the time zero population. This screen would reveal sgRNAs that are specifically associated with response to C6, as well as those that are associated with general cell fitness and viability. We then identified genes connected to these sgRNAs, removed those that are pan essential, and built a custom library for the second tier using sgRNAs from the Brunello library (four sgRNA/gene). We then screened this custom library with both C6 and the more potent inhibitor C16, this time against DMSO-treated cells from the same timepoint.
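
      The difference between the two tiers comes down to which reference population the sgRNA counts are compared against. A common way to express such a comparison is a log2 fold change of depth-normalized counts. The sketch below is a generic illustration with invented guide names and read counts; it is not the analysis pipeline used in the manuscript, and the pseudocount is an arbitrary choice.

```python
import math


def reads_per_million(counts):
    """Normalize raw sgRNA read counts to reads per million."""
    total = sum(counts.values())
    return {guide: n * 1e6 / total for guide, n in counts.items()}


def log2_fold_change(sample_counts, reference_counts, pseudocount=1.0):
    """log2 ratio of normalized sgRNA abundance, sample versus reference."""
    sample = reads_per_million(sample_counts)
    reference = reads_per_million(reference_counts)
    return {
        guide: math.log2(
            (sample.get(guide, 0.0) + pseudocount) / (reference[guide] + pseudocount)
        )
        for guide in reference
    }


# Hypothetical counts. The reference is time zero in Tier 1 and
# DMSO-treated cells from the same timepoint in Tier 2.
reference = {"sgA": 100, "sgB": 100, "sgC": 100, "sgD": 100}
inhibitor = {"sgA": 400, "sgB": 100, "sgC": 50, "sgD": 50}

lfc = log2_fold_change(inhibitor, reference)
enriched = sorted(guide for guide, value in lfc.items() if value > 1.0)
print(enriched)
```

      With a time-zero reference, enrichment or depletion mixes inhibitor-specific effects with general fitness effects; with a time-matched DMSO reference, the fitness component largely cancels, which matches the rationale given for the second tier.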

      We acknowledge that this is not the most streamlined setup for a screen. But our intention was to compare two inhibitors (C6 and C16) and identify high confidence 'hits' that are disconnected from general cell viability, rather than generate an exhaustive list of all genes that, when disrupted, skew the response to WIN site inhibitor. The final result of this screen (Figure 4E) is a gene list that has been validated with two chemically distinct WIN site inhibitors and up to 10 unique sgRNAs per gene. We may not have captured every gene that can modulate response to WIN site inhibitor, but those appearing in Figure 4E are highly validated.

      To answer the reviewer's specific questions: (i) we cannot omit the Tier 1 screen because then there would be no rationale for what was screened in the second Tier; and (ii) the advantage of the custom Tier 2 library is that it allowed us to screen hits from the Tier 1 screen with four completely independent sgRNAs. Although there are not more sgRNAs for each gene in the Tier 2 versus the Tier 1 library, these sgRNAs are different and thus, for C6 at least, hits surviving both screens were validated with up to 10 unique sgRNAs.

      We apologize that the description of the CRISPR screens was not clearer, and have reworked this section of the manuscript to make our intent and our actions clearer.

      (4) Can re-expression of RPL22 rescue the growth arrest of C6?

      We have not attempted to complement the RPL22 knock out. But we do note that evidence supporting the idea that loss of RPL22 confers resistance to WIN site inhibitor is strong—six (out of six) sgRNAs against RPL22 were significantly enriched in the Tier 1 screen, and independent knock out of RPL22 with the Synthego multi-guide system in MV4;11 and MOLM13 cells increases the GI50 for C16.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Howard et al reports the development of high-affinity WDR5-interaction site inhibitors (WINi) that engage the protein to block the arginine-dependent engagement with its partners. Treatment of MLL-rearranged leukemia cells with high-affinity WINi (C16) decreases the expression of genes encoding most ribosomal proteins and other proteins required for translation. Notably, although these targets are enriched for WDR5-ChIP-seq peaks, such peaks are not universally present in the target genes. High concordance was found between the alterations in gene expression due to C16 treatment and the changes resulting from treatment with an earlier, lower affinity WINi (C6). Besides protein synthesis, genes involved in DNA replication or MYC responses are downregulated, while p53 targets and apoptosis genes are upregulated. Ribosome profiling reveals a global decrease in translational efficiency due to WINi with overall ribosome occupancies of mRNAs ~50% of control samples. The magnitude of the decrements of translation for most individual mRNAs exceeds the respective changes in mRNA levels genome-wide. From these results and other considerations, the authors hypothesize that WINi results in ribosome depletion. Quantitative mass spec documents the decrement in ribosomal proteins following WINi treatment along with increases in p53 targets and proteins involved in apoptosis occurring over 3 days. Notably, RPL22L1 is essentially completely lost upon WINi treatment. The investigators next conduct a CRISPR screen to find moderators and cooperators with WINi. They identify components of p53 and DNA repair pathways as mediators of WINi-inflicted cell death (so gRNAs against these genes permit cell survival). Next, WINi are tested in combination with a variety of other agents to explore synergistic killing to improve their expected therapeutic efficacy. The authors document the loss of the p53 antagonist MDM4 (in combination with splicing alterations of RPL22L1), an observation that supports the notion that WINi killing is p53-mediated.

      Strengths:

      This is a scientifically very strong and well-written manuscript that applies a variety of state-of-the-art molecular approaches to interrogate the role of the WDR5 interaction site and WINi. They reveal that the effects of WINi seem to be focused on the overall synthesis of protein components of the translation apparatus, especially ribosomal proteins, even those that do not bind WDR5 by ChIP (a question left unanswered is how much the WDR5-less genes are nevertheless WINi targeted). They convincingly show that disruption of the synthesis of these proteins is accompanied by DNA damage inferred by H2AX-activation, activation of the p53 pathway, and apoptosis. Pathways of possible WINi resistance and synergies with other antineoplastic approaches are explored. These experiments are all well-executed and strongly invite more extensive pre-clinical and translational studies of WINi in animal studies. The studies also may anticipate the use of WINi as probes of nucleolar function and ribosome synthesis though this was not really explored in the current manuscript.

      Weaknesses:

      A mild deficiency in the current manuscript is the absence of cell biological methods to complement the molecular biological and biochemical approaches so ably employed. Some microscopic observations and confirmation of nucleolar dysfunction and DNA damage would be reassuring.

      We thank the reviewer for their comments. We agree that an absence of cell biological methods was a deficiency in the original manuscript. In response to this comment, we have now added immunofluorescence (IF) analyses, examining the impact of C16 on nucleolar integrity and nucleophosmin (NPM1) distribution (Figure 3—figure supplement 4). These new data clearly show that C16 induces nucleolar stress at 72 hours—as measured by the redistribution of NPM1 from the nucleolus to the nucleoplasm. These new data fill an important gap in the story, and we are grateful to the reviewer for prompting us to perform these experiments.

      As part of the above study, we also probed for gamma-H2AX, expecting that we may see some signs of accumulation in the nucleoli (see comment #4 from Reviewer #2, below). We did not observe this response. Importantly, however, we did see that gamma-H2AX staining occurs only in what are overtly apoptotic cells. This is an important finding, because we had previously speculated that the induction of gamma-H2AX observed by Western blotting reflected part of a bona fide response to DNA damage elicited by WIN site inhibitors. Instead, the IF data now lead us to conclude that this signal simply reflects the established fact that WIN site inhibitors induce apoptosis in this cell line (Aho et al., 2019). In response to this new finding, we have added additional discussion to the text and have removed or de-emphasized the potential contribution of DNA damage to the mechanism of action of WDR5 WIN site inhibitors. Again, we are grateful for this comment as it has prevented us from continuing to report/pursue erroneous observations.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      There is a typo in "but are are linked to mRNA instability when translation is inhibited".

      Thank you for catching this typo. It has now been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors report that WINi initially (at 24 hrs) increases the expression of most proteins while decreasing ribosomal proteins, but at 72 hours all proteins are depressed. The transient bump-up of non-translation-related proteins seems odd. A simple resolution to this somewhat strange observation is that there is no real increase in the other proteins, but because of the loss of a large fraction of the most abundant cellular proteins (the ribosomal proteins), the relative fraction of all other proteins is increased; that is, the increase of non-ribosomal proteins may be an artifact of normalization to a lower total protein content. Can this be explored?

      We are grateful to the reviewer for this comment. We have tried our best to respond, as detailed above in response to Reviewer #1 Public Comment #2.

      (2) It would be really nice to assess nucleolar status microscopically. Do nucleoli get bigger? Smaller? Do they have abnormal morphology? Is there nucleolar stress? What happens to rRNA synthesis and processing?

      We agree and thank the reviewer for raising this point. As noted in our response to Reviewer #2, above, we have included new IF that shows: (i) no obvious effect on nucleolar integrity, (ii) redistribution of NPM1 to the nucleoplasm (indicative of nucleolar stress), and (iii) induction of gamma-H2AX staining in apoptotic cells (indicative of apoptosis).

      Additionally, in response to this comment, we also looked at the impact of WIN site inhibitors on rRNA synthesis, using AzCyd labeling. These new data appear in Figure 3—figure supplement 3. Interestingly, these new data show that there is a progressive decline in rRNA synthesis, and that by 96 hours of treatment levels of both 18S and 28S rRNAs are reduced, again by about a factor of two. Our interpretation of this finding is that in response to the progressive decline in RPG transcription there is a secondary decrease in rRNA synthesis. This result is perhaps not surprising, but it does again add an important missing piece to our characterization of WIN site inhibitors and is further support for the concept that inhibition of ribosome production is a dominant part of the response to these agents.

      (3) The WINi elicited DNA damage is incompletely characterized, rather it is inferred from H2AX activation. Comet assays would help to confirm such damage.

      As noted in our response to Reviewer #2, our original inference of DNA damage, prompted by gamma-H2AX activation, is erroneous, and due instead to the ability of WIN site inhibitors to induce apoptosis. We thus did not pursue comet assays, etc., and removed discussion of potential DNA damage from the manuscript.

      (4) Staining and microscopic observation of H2AX would be very useful. Is the WINi provoked DNA damage nucleolar-localized? Does the deficiency of ribosomal proteins lead to localized genotoxic nucleolar stress - or alternatively does the paucity of ribosomes and decreased translation lead to imbalances in other cellular pathways, perhaps including some involved in overall genome maintenance which would provoke more global DNA damage and H2AX staining, not limited to the nucleolus.

      Again, please see our response to the Public Comment from Reviewer #2.

      (5) It would be important to assess the influence and effects of WINi on some p53 mutant, p53-/- and p53 wild-type cell lines. Given their prevalence, p53 status may be expected to alter WINi efficacy.

      The issue of how p53 status impacts the response to WINi is interesting and important, but we feel this is beyond the scope of the current manuscript. It is likely that many factors contribute to the response of cancer cells to these agents, and thus simply surveying some cancer lines for their response and linking this to their p53 status is unlikely to be very informative. Making definitive statements about the contribution of p53, and the differences between wild-type, loss-of-function mutants, gain-of-function mutants, and null mutants will require more extensive analyses and is fertile territory for future studies, in our opinion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a useful study examining the determinants and mechanisms of LRMP inhibition of cAMP regulation of HCN4 channel gating. The evidence provided to support the main conclusions is unfortunately incomplete, with discrepancies in the work that reduce the strength of mechanistic insights.

      Thank you for the reviews of our manuscript. We have made a number of changes to clarify our hypotheses in the manuscript and addressed all of the potential discrepancies by revising some of our interpretation. In addition, we have provided additional experimental evidence to support our conclusions. Please see below for a detailed response to each reviewer comment.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.

      Strengths:

      The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.

      Thank you.

      Reviewer #2 (Public Review):

      Summary:

      HCN-4 isoform is found primarily in the sino-atrial node where it contributes to the pacemaking activity. LRMP is an accessory subunit that prevents cAMP-dependent potentiation of HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology, FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of N- and C-termini along with specific residues in C-linker and S5 of HCN4 are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7 which recapitulates LRMP modulation on HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.

      Weaknesses:

      (1) Figure 5A- I am a bit confused with this figure and perhaps it needs better labeling. When it states Citrine, does it mean just free Citrine, and "LRMP 1-230" means LRMP fused to Citrine which is an "LF" construct? Why not simply call it "LF"? If there is no Citrine fused to "LRMP 1-230", this figure would not make sense to me.

      We have clarified the labelling of this figure and specifically defined all abbreviations used for HCN4 and LRMP fragments in the results section on page 14.

      (2) Related to the above point- Why is there very little FRET between NF and LRMP 1-230? The FRET distance range is 2-8 nm which is quite large. To observe baseline FRET for this construct more explanation is required. Even if one assumes that about 100 amino are completely disordered (not extended) polymers, I think you would still expect significant FRET.

      FRET is extremely sensitive to distance: efficiency falls off with the sixth power of the donor-acceptor separation. The difference in contour length (the maximum length of a fully extended peptide) between our ~260 aa fragment and our ~130 aa fragments is on the order of 450 Å (45 nm). So, even if the fragments are not extended, it is not hard to imagine that the larger fragments show a weaker FRET signal. In fact, we do see slightly larger FRET than in the control (not significant), which is consistent with the idea that the larger fragments simply do not produce a large FRET signal.
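
      The scale of this argument can be checked with a few lines of arithmetic. The sketch below assumes a contour length of about 0.35 nm per residue (which reproduces the ~45 nm figure above for ~130 extra residues) and a Förster radius of 5 nm. Both values are illustrative assumptions, not numbers taken from the manuscript.

```python
NM_PER_RESIDUE = 0.35  # assumed contour length per residue of an extended chain
R0_NM = 5.0            # assumed Förster radius for the fluorophore pair


def contour_length_nm(n_residues):
    """Maximum end-to-end length of a fully extended peptide, in nm."""
    return n_residues * NM_PER_RESIDUE


def fret_efficiency(distance_nm, r0_nm=R0_NM):
    """Förster relation: E = 1 / (1 + (r/R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


extra_nm = contour_length_nm(260) - contour_length_nm(130)
print(f"extra contour length: {extra_nm:.1f} nm")

for distance in (4.0, 5.0, 8.0, 10.0):
    print(f"{distance:4.1f} nm -> E = {fret_efficiency(distance):.3f}")
```

      Because of the sixth-power dependence, efficiency is 0.5 at R0 and drops to 1/65 (about 1.5%) at twice R0. A fluorophore separated from its partner by even a modest fraction of the extra ~45 nm of chain would therefore be expected to give little signal, whatever the exact Förster radius.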

      Moreover, this hybridization assay is sensitive to a number of other factors including the affinity between the two fragments, the expression of each fragment, and the orientation of the fluorophores. Any of these factors could also result in reduced FRET.

      We have added a section on the limitations of the FRET 2-hybrid assay in the discussion section on page 20. Our goal with the FRET assay was to provide complementary evidence that shows some of the regions that are important for direct association, and we have edited the text to make sure we are not over-interpreting our results.
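
      The sixth-power distance dependence invoked in this response can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The sketch below is only an illustration: the Förster radius `R0` of 5 nm is an assumed round value, not a number taken from the manuscript, and the function name is hypothetical.

```python
# Illustrative sketch of the Förster relation E = 1 / (1 + (r / R0)**6).
# ASSUMPTION: r0_nm = 5.0 is a round, illustrative Förster radius; it is not
# a value reported in the manuscript or the response.


def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Return the FRET efficiency for a donor-acceptor separation r_nm (nm)."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r_nm must be >= 0 and r0_nm must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    # Efficiency is 0.5 at r = R0 and drops steeply beyond it.
    for r in (2.0, 5.0, 8.0, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

      Under this assumed R0, doubling the separation from R0 to 2·R0 lowers the efficiency from 0.5 to 1/65, which is why a modest increase in average fluorophore separation could plausibly suppress the signal.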

      (3) Unless I missed this, have all the Cerulean and Citrine constructs been tested for functional activity?

      All Citrine-tagged LRMP constructs (or close derivatives) were tested functionally by coexpression with HCN (see Table 1 and pages 10-11). Cerulean-tagged HCN4 fragments are of course intrinsically non-functional as they do not include the ion-conducting pore.

      Reviewer #3 (Public Review):

      Summary:

      Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-terminus of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (elbow Clinker, S4/S5-linker, HCND) and account for the LRMP effect.

      Strengths:

      Based on these data, the authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the elbow C-linker, the S4-S5 linker, and the HCND.

      Weaknesses:

      Although the work is interesting, there are some discrepancies between data that need to be addressed.

      (1) I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.

      Thank you, Δ shift values have been added to Tables 1 and 2 as suggested.

      (2) Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from - 85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not).

      With a large N, this is an expected distribution. In 5 previous reports from 4 different groups of HCN4 with cAMP in HEK 293 (Fenske et al., 2020; Liao et al., 2012; Peters et al., 2020; Saponaro et al., 2021; Schweizer et al., 2010), the average expected range of the data is 26.6 mV and 39.9 mV for 95% (mean ± 2SD) and 99% (mean ± 3SD) of the data, respectively. As the reviewer mentions, the expected range from these papers is slightly larger in the absence of cAMP. The average SDs of HCN4 (with/without cAMP) in these papers are 9.9 mV (Schweizer et al., 2010), 4.4 mV (Saponaro et al., 2021), 7.6 mV (Fenske et al., 2020), 10.0 mV (Liao et al., 2012), and 5.9 mV (Peters et al., 2020). Our SD in this paper is roughly in the middle at 7.6 mV. This is likely because we used an inclusive approach to data so as not to bias our results (see the statistics section of the revised manuscript on page 9). We have removed 2 data points that meet the statistical classification of outliers; no measures of statistical significance were altered by this.
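
      The link between a standard deviation and the spread of individual data points can be made explicit with a short calculation. The sketch below assumes roughly normally distributed midpoints and uses the SDs quoted in the response; the function name is hypothetical. It does not attempt to reproduce the 26.6 mV and 39.9 mV figures, which come from the authors' own calculation on the with-cAMP data.

```python
# Width of the interval mean ± k·SD, i.e. the span expected to contain most
# individual V1/2 values if they are approximately normally distributed.
# The SDs below are the "with/without cAMP" values quoted in the response.

reported_sd_mv = {
    "Schweizer et al., 2010": 9.9,
    "Saponaro et al., 2021": 4.4,
    "Fenske et al., 2020": 7.6,
    "Liao et al., 2012": 10.0,
    "Peters et al., 2020": 5.9,
}


def expected_span_mv(sd_mv: float, k: float) -> float:
    """Return the full width (mV) of the interval mean ± k·SD."""
    if sd_mv < 0 or k < 0:
        raise ValueError("sd_mv and k must be non-negative")
    return 2.0 * k * sd_mv


if __name__ == "__main__":
    this_study_sd = 7.6  # SD stated for the present paper (mV)
    print(f"this study, ±2SD: {expected_span_mv(this_study_sd, 2):.1f} mV")
    print(f"this study, ±3SD: {expected_span_mv(this_study_sd, 3):.1f} mV")
    for paper, sd in reported_sd_mv.items():
        print(f"{paper}: ±2SD span = {expected_span_mv(sd, 2):.1f} mV")
```

      With an SD of 7.6 mV, the ±2SD interval alone is about 30 mV wide, which is the same order as the spread the reviewer describes.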

      This problem is spread throughout the manuscript, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings?

      The exact P-values are NOT typically at the limit of statistical significance, about 2/3rds would meet the stringent P < 0.0001 cut-off. We have clarified in the statistics section (page 10) that any comparison meeting our significance threshold (P < 0.05) or a stricter criterion is treated equally in the figure labelling. Exact P-values are provided in Tables 1-3.

      There are several other problems with Figure 1 and in all figures of the manuscript: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Figure 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1.

      Y-axis values for mean plots were picked such that all data points are included and are consistent across all figures. They have been expanded slightly (-75 to -145 mV for all HCN4 channels and -65 to -135 mV for all HCN2 channels). The size of the mean value marker has been reduced slightly. Exact midpoints for all data are also found in Tables 1-3.

      The GV curves in Figure 1B (previously Fig. 1A) are averages with the ±SEM error bars smaller than the symbols in many cases owing to relatively high n’s for these datasets. These curves match the midpoints in panel 1C (previously 1B). E.g., the midpoint of the average curve for HCN4 control in panel A is -117.9 mV, essentially the same as the -117.8 mV average for the individual fits in panel B.

      We made an error in the text based on a previous manuscript version about the ordering of the tables that has now been fixed so these values should now be aligned.

      On this ground, it is difficult to judge the conclusions and it would also greatly help if exemplary current traces would be also shown.

      Exemplary current traces have been added to all figures in the revised manuscript.

      (3) "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."

      Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (fig 6A) are crucial for LRMP effect, but these two residues are not involved in the cAMP transduction centre (interface between HCND, S4S5 linker, and Clinker elbow), at least for the data accumulated till now in the literature. Indeed, the hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4 given the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.

      Moreover, I suggest analysing the putative role of P545 and T547 in light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will be also interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.

      We agree that testing this hypothesis would be very interesting. However, it is challenging. Any mutation we make that is involved in cAMP transduction makes measuring the LRMP effect on cAMP shifts difficult or impossible.

      Our simple idea, now clarified in the discussion, is that if you look at the regions involved in cAMP transduction (HCND, C-linker, S4-S5), there are very few residues that differ between HCN4 and HCN2. When we mutate the 5 non-conserved residues in the S5 segment and the C-linker, along with the NT, we are able to render HCN2 sensitive to LRMP. Therefore, something about the small sequence differences in this region confers isoform specificity to LRMP. We speculate that this happens because of small structural differences that result from those 5 mutations. If you compare the solved structures of HCN1 and HCN4 (there is no HCN2 structure available), you can see small differences in the distances between key interacting residues in the transduction centre. Also, there is a kink at the bottom of the S4 helix in HCN4 but not HCN1. This points a putatively important residue for cAMP dependence in a different direction in HCN4. We hypothesize in the discussion that this may be how LRMP is isoform specific.

      Moreover, previous work has shown that the HCN4 C-linker is uniquely sensitive to di-cyclic nucleotides and magnesium ions. We are hypothesizing that it is the subtle change in structure that makes this region more prone to regulation in HCN4.

      Reviewing Editor (recommendations for the Authors):

      (1) Exemplar recordings need to be shown and some explanation for the wide variability in the V-half of activation.

      Exemplar currents are now shown for each channel. See the response to Reviewer 3’s public comment 2.

      (2) The rationale for cut sites in LRMP for the investigation of which parts of the protein are important for blocking the effect of cAMP is not logically presented in light of the modular schematics of domains in the protein (N-term, CCD, post-CCD, etc).

      There is limited structural data on LRMP and the HCN4 N-terminus. The cut sites in this paper were determined empirically. We made fragments that were small enough to work for our FRET hybridization approach and that expressed well in our HEK cell system. The residue numbering of the LRMP modules is based on updated structural predictions using Alphafold, which was released after our fragments were designed. This has been clarified in the methods section on pages 5-6 and the Figure 2 legend of the revised manuscript.

      (3) Role of the HCN4 C-terminus. Truncation of the HCN4 C-terminus unstructured Cterminus distal to the CNBD (Fig. 4 A, B) partially reverses the impact of LRMP (i.e. there is now a significant increase in cAMP effect compared to full-length HCN4). The manuscript is written in a manner that minimizes the potential role of the C-terminus and it is, therefore, eliminated from consideration in subsequent experiments (e.g. FRET) and the discussion. The model is incomplete without considering the impact of the C-terminus.

      We thank the reviewer for this comment as it was a result that we too readily dismissed. We have added discussion around this point and revised our model to suggest that not only can we not eliminate a role for the distal C-terminus, our data is consistent with it having a modest role. Our HCN4-2 chimera and HCN4-S719x data both suggest the possibility that the distal C-terminus might be having some effect on LRMP regulation. We have clarified this in the results (pages 12-13) and discussion (page 19).

      (4) For FRET experiments, it is not clear why LF should show an interaction with N2 (residues 125-160) but not NF (residues 1-160). N2 is contained within NF, and given that Citrine and Cerulean are present on the C-terminus of LF and N2/NF, respectively, residues 1-124 in NF should not impact the detection of FRET because of greater separation between the fluorophores as suggested by the authors.

      This is a fair point but FRET is somewhat more complicated. We do not know the structure of these fragments and it’s hard to speculate where the fluorophores are oriented in this type of assay. Moreover, this hybridization assay is sensitive to affinity and expression as well. There are a number of reasons why the larger 1-260 fragment might show reduced FRET compared to 125-260. As mentioned in our response to reviewer 2’s public comment 2, we have added a limitation section that outlines the various caveats of FRET that could explain this.

      (5) For FRET experiments, the choice of using pieces of the channel that do not correlate with the truncations studied in functional electrophysiological experiments limits the holistic interpretation of the data. Also, no explanation or discussion is provided for why LRMP fragments that are capable of binding to the HCN4 N-terminus as determined by FRET (e.g. residues 1-108 and 110-230, respectively) do not have a functional impact on the channel.

      As mentioned in the response to comment 2, the exact fragment design is a function of which fragments expressed well in HEK cells. Importantly, because FRET experiments do not provide atomic resolution for the caveats listed in the revised limitations section on page 20-21, small differences in the cut sites do not change the interpretation of these results. For example, the N-terminal 1-125 construct is analogous to experiments with the Δ1-130 HCN4 channel.

      We suspect that residues in both fragments are required and that the interaction involves multiple parts. This is stated in the results “Thus, the first 227 residues of LRMP are sufficient to regulate HCN4, with residues in both halves of the LRMP N-terminus necessary for the regulation” (page 11). We have also added discussion on this on page 21.

      (6) A striking result was that mutating two residues in the C-linker of HCN4 to amino acids found in HCN channels not affected by LRMP (P545A, T547F), completely eliminated the impact of LRMP on preventing cAMP regulation of channel activation. However, a chimeric channel, (HCN4-2) in which the C-linker, the CNBD, and the C-terminus of HCN4 were replaced by that of HCN2 was found to be partially responsive to LRMP. These two results appear inconsistent and not reconciled in the model proposed by the authors for how LRMP may be working.

      As stated in our answer to your question #3, we have revised our interpretation of these data. If the more distal C-terminus plays some role in the orientation of the C-linker and the transduction centre as a whole, these data can still be viewed consistent with our model. We have added some discussion of this idea in our discussion section.

      (7) Replacing the HCN2 N-terminus with that from HCN4, along with mutations in the S5 (MCS/VVG) and C-linker (AF/PT) recapitulated LRMP regulation on the HCN2 background. The functional importance of the S5 mutations is not clear as no other experiments are shown to indicate whether they are necessary for the observed effect.

      We have added our experiments on a midpoint HCN2 clone that includes the S5 mutants and the C-linker mutants in the absence of the HCN4 N-terminus (ie HCN2 MCSAF/VVGPT) (Fig. 7). And we have discussed our rationale for the S5 mutations as we believe they may be responsible for the different orientations of the S4-S5 linker in HCN1 and HCN4 structures that are known to impact cAMP regulation.

      Reviewer #1 (Recommendations For The Authors):

      A) Comments:

      (1) Figure 1: Please show some representative current traces.

      Exemplar currents are now shown for each channel in the manuscript.

      (2) Figure 1: There appears to be a huge number of recordings for HCN4 +/- cAMP as compared to those with LRMP 1-479Cit. How was the number of recordings needed for sufficient statistical power decided? This is particularly important because the observed slowing of deactivation by cAMP in Fig. 1C seems like it may be fairly subtle. Perhaps a swarm plot would make the shift more apparent? Also, LRMP 1-479Cit distributions in Fig. 1B-C look like they are more uniform than normal, so please double-check the appropriateness of the statistical test employed.

      We have revised the methods section (page 7) to discuss this. Briefly, we performed regular control experiments throughout this project to ensure that a normal cAMP response was occurring. Our minimum target for sufficient power was 8-10 recordings. We have expanded the statistics section (page 9) to discuss tests of normality and the use of a log scale for deactivation time constants, which is why the shifts in Fig. 1D (revised) are less apparent.

      (3) It would be helpful if the authors could better introduce their logic for the M338V/C341V/S345G mutations in the HCN4-2 VVGPT mutant.

      See response to the reviewing editor’s comment 7.

      B) Minor Comments:

      (1) pg. 9: "We found that LRMP 1-479Cit inhibited HCN4 to an even greater degree than the full-length LRMP, likely because expression of this tagged construct was improved compared to the untagged full-length LRMP, which was detected by co-transfection with GFP." Co-transfection with GFP seems like an extremely poor and a risky measure for LRMP expression.

      We agree that the exact efficiency of co-transfection is contentious although some papers and manufacturer protocols indicate high co-transfection efficiency (Xie et al., 2011). In this paper we used both co-transfection and tagged proteins with similar results.

      (2) pg 9: "LRMP 1-227 construct contains the N-terminus of LRMP with a cut-site near the N-terminus of the predicted coiled-coil sequence". In Figure 2 the graphic shows the coiled-coil domain starting at 191. What was the logic for splitting at 227 which appears to be the middle of the coiled-coil?

      See response to the reviewing editor’s comment 2.

      (3) Figure 5C: Please align the various schematics for HCN4 as was done for LRMP. It makes it much easier to decipher what is what.

      Fig. 5 has been revised as suggested.

      (4) pg 12: I assume that the HCN2 fragment chosen aligns with the HCN4 N2 fragment which shows binding, but this logic should be stated if that is the case. If not, then how was the HCN2 fragment chosen?

      This is correct. This has been explicitly stated in the revised manuscript (page 14).

      (5) Figure 7: Add legend indicating black/gray = HCN4 and blue = HCN2.

      This has been stated in the revised figure legend.

      (6) pg 17: Conservation of P545 and T547 across mammalian species is not shown or cited.

      This sentence is not included in the revised manuscript, however, for the interest of the reviewer we have provided an alignment of this region across species here.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is not clear whether in the absence of cAMP, LRMP also modestly shifts the voltage-dependent activity of the channels. Please clarify.

      We have clarified that LRMP does not shift the voltage-dependence in the absence of cAMP (page 10). In the absence of cAMP, LRMP does not significantly shift the voltage-dependence of activation in any of the channels we have tested in this paper (or in our prior 2020 paper).

      (2) Resolution of Fig. 8b is low.

      We ultimately decided that the cartoon did not provide any important information for understanding our model and it was removed.

      (3) Please add a supplementary figure showing the amino acid sequence of LRMP to show where the demarcations are made for each fragment as well as where the truncations were made as noted in Fig 3 and Fig 4.

      A new supplementary figure showing the LRMP sequence has been added and cited in the methods section (page 5). Truncation sites have been added to the schematic in Fig. 2A.

      (4) In the cartoon schematic illustration for Fig. 3 and Fig.4, the legend should include that the thick bold lines in the C-Terminal domain represent the CNBD, while the thick bold lines in the N-Terminal domain represent the HCN domain. This was mentioned in Liao 2012, as you referenced when you defined the construct S719X, but it would be nice for the reader to know that the thick bold lines you have drawn in your cartoon indicate that it also highlights the CNBD or the HCN domain.

      This has been added to figure legends for the relevant figures in the revised manuscript.

      (5) On page 12, missing a space between "residues" and "1" in the parenthesis "...LRMP L1 (residues1-108)...".

      Fixed. Thank you.

      (6) Which isoform of LRMP was used? What is the NCBI accession number? Is it the same one from Peters 2020 ("MC228229")?

      This information has been added to the methods (page 5). It is the same as Peters 2020.

      Reviewer #3 (Recommendations For The Authors):

      (1) "Truncation of residues 1-62 led to a partial LRMP effect where cAMP caused a significant depolarizing shift in the presence of LRMP, but the activation in the presence of LRMP and cAMP was hyperpolarized compared to cAMP alone (Fig. 3B, C and 3E; Table 1). In the HCN4Δ1-130 construct, cAMP caused a significant depolarizing shift in the presence of LRMP; however, the midpoint of activation in the presence of LRMP and cAMP showed a non-significant trend towards hyperpolarization compared to cAMP alone (Fig. 3C and 3E; Table 1)".

      This means that sequence 62-185 is necessary and sufficient for the LRMP effect. I suggest a competition assay with this peptide (synthetic, or co-expressed with HCN4 full-length and LRMP to see whether the peptide inhibits the LRMP effect).

      We respectfully disagree with the reviewer’s interpretation. Our results strongly suggest that other regions such as residues 25-65 (Fig. 3C) and C-terminal residues (Fig. 6) are also necessary. The use of a peptide could be an interesting future experiment; however, it would be very difficult to control relative expression of a co-expressed peptide. We think that our results in Fig. 7E-F, where this fragment is added to HCN2, are a better controlled way of validating the importance of this region.

      (2) "Truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation. In the presence of both LRMP and cAMP the activation of HCN4-S719X was still significantly hyperpolarized compared to the presence of cAMP alone (Figs. 4A and 4B; Table 1). And the cAMP-induced shift in HCN4-S719X in the presence of LRMP (~7mV) was less than half the shift in the absence of LRMP (~18 mV)."

      On the basis of the partial effects reported for the truncations of the N-terminus of HCN4 1-62 and 1-130 (Fig 3B and C), I do not think it is possible to conclude that "truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation". Indeed, cAMP-induced shift in HCN4 Δ1-62 and Δ1-130 in the presence of LRMP were 10.9 and 10.5 mV, respectively, way more than the ~7mV measured for the HCN4-S719X mutant.

      As you rightly stated at the end of the paragraph:" Together, these results show significant LRMP regulation of HCN4 even when the distal C-terminus is truncated, consistent with a minimal role for the C-terminus in the regulatory pathway". I would better discuss this minimal role of the C-terminus. It is true that deletion of the first 185 aa of HCN4 Nterminus abolishes the LRMP effect, but it is also true that removal of the very Cterm of HCN4 does affect LRMP. This unstructured C-terminal region of HCN4 contains isotype-specific sequences. Maybe they also play a role in recognizing LRMP. Thus, I would suggest further investigation via truncations, even internal deletions of HCN4-specific sequences.

      Please see the response to the reviewing editor’s comment 3.

      (3) Figure 5: The N-terminus of LRMP FRETs with the N-terminus of HCN4.

      Why didn't you test the same truncations used in Fig. 3? Indeed, based on Fig 3, sequences 1-25 can be removed. I would have considered peptides 26-62 and 63-130 and 131-185 and a fourth (26-185). This set of peptides will help you connect binding with the functional effects of the truncations tested in Fig 3.

      Please see the response to the reviewing editor’s comment 2 and 5.

      Why didn't you test the C-terminus (from 719 till the end) of HCN4? This can help with understanding why truncation of HCN4 C-terminus does affect LRMP, though partially (Fig. 4A).

      Please see the response to the reviewing editor’s comment 3.

      (4) "We found that a previously described HCN4-2 chimera containing the HCN4 N-terminus and transmembrane domains (residues 1-518) with the HCN2 C-terminus (442-863) (Liao et al., 2012) was partially regulated by LRMP (Fig. 7A and 7B)".

      I do not understand this partial LRMP effect on the HCN4-2 chimera. In Fig. 6 you have shown that the "HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP". How can this be reconciled with the HCN4-2 chimera? HCN4-2, "containing" P545A/T547F mutations, should not perceive LRMP.

      Please see the response to the reviewing editor’s comment 6.

      (5) "we next made a targeted chimera of HCN2 that contains the distal HCN4 N-terminus (residues 1-212) and the HCN2 transmembrane and C-terminal domains with 5 point mutants in non-conserved residues of the S5 segment and C-linker elbow (M338V/C341V/S345G/A467P/F469T)......Importantly, the HCN4-2 VVGPT channel is insensitive to cAMP in the presence of LRMP (Fig. 7C and 7D), indicating that the HCN4 Nterminus and cAMP-transduction centre residues are sufficient to confer LRMP regulation to HCN2".

      Why did you insert also the 3 mutations of S5? Are these mutations somehow involved in the cAMP transduction mechanism?

      You have already shown that in HCN4 only P545 and T547 (Clinker) are necessary for LRMP effect. I suggest to try, at least, the chimera of HCN2 with only A467P/F469T. They should work without the 3 mutations in S5.

      Please see the response to the reviewing editor’s comment 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Line 144, after eq. (1). Vectors d_i need to be defined. Are these the mapping of vectors e_i due to the active deformation? It would be useful to state then that d_3 is aligned with r'.

      Thank you for your suggestion, and the definition has been added to lines 146-149 for a better understanding of the model.

      • Line 144.Authors state a_i(0,0,Z)=0. Shouldn't this be true also for any angle, i.e., a_i(0,Theta, Z)=0?

      Thank you, we have revised it in line 144.

      • Line 156. G_0 is defined as Diag(1,g_0(t), 1), which seems to be using cylindrical coordinates. Previously, in line 147, vector argument X of \chi is defined with Cartesian coordinates (X,Y,Z). Shouldn't these be also cylindrical?

      We are very sorry for this error, our initial configuration is defined with cylindrical coordinates, we have revised it in the manuscript line 151.

      • Line 162. "where alpha and beta lie in the range [-pi/2, pi/2]" has already been indicated.

      Thank you for pointing this out; we have deleted the duplicate information in line 166.

      • Line 171. W is defined as the strain energy density, while in equation (2), symbol W is the total energy (which depends on the previous W). Letters for total elastic and strain energy must be distinguished.

      Thank you, we have changed the letter for total energy in Eq.(2).

      • Line 176. "we take advantage of the weakness of" -> "we take advantage of the small value of".

      We have revised it in line 179.

      • Line 177. Why is there a subscript i in p_i? If these do not correspond to penalty p, but to parameters in eqn (3), the latter should have been introduced before this line.

      We have revised this error in line 180.

      • Line 186. "as the overall elongation \zeta". This parameter, axial extension, has not been defined yet.

      Thank you for pointing this out; the definition of \zeta is now given in line 146.

      • Figure 4. Why are the values of g_0 from the elastic model and equations (30)-(32) so non-smooth? Clarify what is being fit and what is the input in the latter equations. Final external radius R_3? Final internal radius R_1'?

      (1) To mimic the embryo, we consider a multi-layered cylindrical body so that the shear modulus of each layer is different. The continuity of both deformations and stresses is imposed (see Eq.(26)-Eq.(30)). This is the usual treatment for complex morpho-elastic systems. Obviously, $g_0$ originates from the actomyosin cortex so it appears only in the corresponding layer. Finally, all physical quantities such as deformations and stresses must be continuous.

      (2) The final outer radius is R_3, which represents the outer radius of C. elegans embryos. In addition to R_3, what we need to consider in this model are R_1’=0.7, R_1’=0.768, R_2=0.8 and R_2’=0.96, these definitions have been added in the caption of Appendix 2—figure 1.

      • Line 663, equation (19). Parameter mu is multiplying penalisation term with p, while in equation (2) mu is only affecting the elastic part.

      These two different ways of expressing the energy function will ultimately affect the value of p, but the two p are not the same quantities, so they will not affect our results. To avoid misunderstandings, we will replace p in equation (19) with q.
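
      A brief sketch of why the two multipliers are different quantities. The forms below are assumptions made for illustration only (the equations themselves are not reproduced in this response): the constraint is taken to be incompressibility, J = 1, W_el denotes the dimensionless strain-energy density, and \mathcal{E} denotes the total energy.

```latex
\begin{align}
\mathcal{E}_{(2)}  &= \int \bigl[\, \mu\, W_{el} - p\,(J-1) \,\bigr]\, dV, \\
\mathcal{E}_{(19)} &= \int \mu\, \bigl[\, W_{el} - q\,(J-1) \,\bigr]\, dV
\quad\Longrightarrow\quad p = \mu\, q .
\end{align}
```

      Under these assumed forms the two expressions describe the same constrained problem, and the multipliers differ only by the factor \mu, so the choice of convention rescales the multiplier without changing the resulting deformation.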

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my public summary, I find the writing really not adequate. I provide here a list of specific points that the authors should in my opinion address. As a general comment, I would delete many instances of 'the'.

      First, there are figures and whole paragraphs that do not seem to bring anything to the understanding of the phenomenon of C. elegans elongation, notably, Figs. 2, 3C-H, 5m, and 6. Figures 6G and 7 are the only figures containing results it seems. Some elements of the figures are repeated, for example, the illustration of the system's cross-section in Figs 3 and 5.

      Thank you for your suggestion, we have made some adjustments to our images to remove some of the duplicate information.

      Second, and this is my most important criticism: the mechanism of elongation by releasing elastic stress introduced by muscle contraction is not explained in clear terms anywhere in the text. At least, I was unable to understand it. On p 10 you write "This energy exchange causes the torsion-bending energy to convert into elongation energy, (...)" How this is done is not explained. I assume that the reference state is somehow changed through muscle contraction. The new reference state probably has a longer axis than the one before, but this would then be a plastic deformation and not purely elastic as claimed by the authors (ll 76: "This work aims to answer this paradox within the framework of finite elasticity without invoking cell plasticity (...)"). Is torsion important for this process or is it 'just' another way to store elastic energy in the system?

      We perfectly explain most of the exchange of energy between bending, torsion and elongation: indeed, we quantify all aspects of this transformation, including the elastic elongation energy and the dissipation processes that cost energy. The dissipation evaluated here concerns the rotation of the worm due to the muscle geometry and the viscous friction at the inner surface of the egg. Torsion seems to appear in the late stages and only in some cases. As we show, it comes from a torque induced by the muscles, which are not vertical. Finally, the quantitative predictions of our model recover most of the published experimental results.

      Third, there are a number of strange phrasings and the notation is not helpful in places.

      We apologize for this; the manuscript is now more precise.

      Fourth, the title promises to explain how cyclic muscle contractions reinforce acto-myosin motors. I can't see this done in this work.

      The fact that the acto-myosin is reorganized between two sequences of contraction justifies the title. The complete reorganization of the actomyosin network would require a chemico-mechanical model that is not achieved here, perhaps in future work as data become available.

      In addition:

      We have chosen to respond globally rather than point by point to the referee’s recommendations.

      Typographic errors and vocabulary

      All English corrections and typos are now included in the main text.

      Figures and captions:

      Figures and captions have been improved.

      • Figure 1: Make the caption and the illustration more coherent. For example, only two cell types are distinguished; in the caption, you mention lateral cells, in the sketch seam cells. What is the difference between acto-myosin and muscle contraction? Muscle contraction is also acto-myosin-based.

      (1) The caption for Fig.1 is revised.

      (2) From a mechanical point of view, actomyosin bundles in C. elegans are orthoradial, whereas muscles are essentially parallel to the main axis of the body, so the geometry is completely different and of extreme importance for deformation. Muscle contractions are quasi-periodic; we do not know the dynamics of the attached myosin molecular motors. So of course, both contain actin and myosin (not exactly the same proteins), but our model is sensitive to more macroscopic properties.

      • Figure 2: I do not find this figure helpful. I might expect such a figure in a grant proposal, but much less in an article.

      Figure 2 shows the strategy of our work. We hope that readers can see at a glance what kind of analysis has been done: since our work is divided into several parts, readers can also use this scheme to unravel the logic after reading the whole manuscript. This diagram is therefore a guide, and we consider it helpful and necessary.

      • Figure 3: Figure 3 A, right: What is the dashed line? B You indicate fibers, but your model does not contain fibers, does it? How do I get from the cube to the deformed object? What is the relation of C-H with the rest of the work? Furthermore, you mention seam cells in Fig. 1, but they are absent here. Why can you neglect them? Why introduce them in the first place? E What is a plant vine? F-H What rods are you referring to? Plants do not have muscles, right?

      We have modified this figure, and the original Figure 3 now corresponds to Figures 3 and 4.

      (1) The dashed line is the centerline after deformation.

      (2) The referee is wrong: our model represents the fibers by a higher shear modulus for the actomyosin cortex and for the muscles (see Table Appendix 1) and G_1 reflects the activities of the muscle and actin fibers.

      (3) The cube in Figure 3 is a mathematical 3D volume element that is subjected to stresses. Hyperelasticity modelling is based on such a representation.

      (4) C-H (new version: Fig. 4 A-F): These images show deformations similar to those in our C. elegans study, namely bending and torsion. These figures indicate that such deformations are quite common in nature, even if the underlying mechanism is different.

      (5) This is a point we have already mentioned: we ignore the difference between the different types of epidermal cells and average their role in the early and second stages of elongation.

      (6) The plant vine is the 'botanical vine', see Goriely's article and book.

      (7) F-H (new version: Fig. 4 D-F) do not have fixed rods; we set a curvature and a torsion to fit the actual biological behavior.

      (8) Plants do not have muscles, but they grow, and our formalism for growth, pre-strain and material plasticity is very similar to the hyper-elasticity formalism.

      • Figure 4: Fig. 4 A: "The central or inner part (0 < 𝑅 < 𝑅2, shear modulus 𝜇𝑖) except the muscles which are stiffer." I do not understand.

      In the new version, this figure corresponds to Fig. 5. The shear modulus of the inner part is very small, but the muscles are stiffer, so we have to consider them separately. We have revised this sentence to avoid misunderstanding.

      • Figure 5: Fig 5 A and D: The schematic of the cross-section has appeared already in the previous figure. No need to repeat it here. The same holds for the schematic of the cylindrical embryo. Caption: "But, the yellow region is not an actual tissue layer and it is simply to define the position of muscles." Why do you introduce the yellow region at all? I do not think that it clarifies anything. "Deformation diagram, when left side muscles M_1 and M_2." Something seems to be missing here. Similarly in the next sentence. "the actin fiber orientation changes from the 'loop' to the 'slope'" Do the rings break up and form a helix?

      In the new version, this figure corresponds to Fig.6.

      (1) We have made revisions to these figures.

      (2) The yellow region shows the precise location of the four muscles, which is important for our model and further calculations.

      (3) We have revised this sentence in the caption of Fig. 6.

      (4) The actin rings do not change to a helix pattern; they only become sloped.

      • Figure 6: Fig 6 A-C These panels do not go beyond Fig 5B. Fig 6D: what are these images supposed to show? They are not really graphs, but microscopy images. The caption is not helpful to understand, what the reader is supposed to see here. Fig 6F: do you really want to plot a linear curve?

      In the new version, Fig.5 and Fig.6 respectively correspond to Fig.6 and Fig.7.

      (1) Fig. 6 shows the simulated images, whereas Fig. 7 A-C shows the actual calculation results, so they are different.

      (2) Fig. 7 D shows the real situation during C. elegans late elongation; here, we would like to show the torsion of the C. elegans embryo.

      (3) Yes, it is our result.

      Discussions concerning the biological referee questions:

      Ll 75: “how the muscle contractions couple to the acto-myosin activity" Again I find this misleading because muscle contraction relies on acto-myosin activity. Probably, you can find a better expression to refer to the activity of the actomyosin network in the epidermis. Do you propose any mechanism for how muscle contraction increases epidermal contractility? This does not seem to be the mechanism that you propose for elongation, is it?

      The actomyosin activity will not stop because of the muscle contraction. Obviously, these two processes cannot be independent. The energy released by a muscle contraction event can and must contribute to the reorganization of the actomyosin network that occurs during the elongation process. Indeed, despite the fact that the embryo elongates, the density of actin cables appears to be maintained, which automatically requires a redistribution of actin monomers. We propose a scenario in which muscle contraction increases actomyosin contractility via energy conversion. We show that after unilateral contraction there is an energy release for this once all dissipation factors are eliminated. We invite the reviewer to re-examine Figure 2 and invite biologists to seriously evaluate the density of molecular motors attached to the circumferential actin cable throughout the stretch process.

      Ll 133: "we decide to simplify the geometrical aspect because of the mechanical complexity" This is hardly a justification. Why is it appropriate?

      Yes, we would like to offer the reader the simplest model, with limited technical complexity and a limited number of unknown parameters.

      L 135: "active strains" Why not active stress?

      The two are equivalent; the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      L 170: "hyperelastic" Please, explain this term.

      It is the elasticity of very soft samples subjected to large deformations. For classic references, see the books of Ogden, Holzapfel and Goriely, all of which are mentioned in our paper.
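      As a minimal illustration of what a hyperelastic law looks like, the simplest textbook example is the incompressible neo-Hookean energy density. This is a generic sketch and is not claimed to be the specific energy used in the manuscript, which is not reproduced in this response; the symbols $\mathbf{F}$ (deformation gradient), $\mu$ (shear modulus) and $\lambda$ (axial stretch) are used here in their standard meaning.

```latex
W(\mathbf{F}) = \frac{\mu}{2}\left(I_1 - 3\right),
\qquad I_1 = \operatorname{tr}\left(\mathbf{F}^{T}\mathbf{F}\right),
\qquad \det\mathbf{F} = 1 .
% Example: uniaxial stretch \lambda of an incompressible sample,
% \mathbf{F} = \operatorname{diag}\left(\lambda^{-1/2}, \lambda^{-1/2}, \lambda\right)
% gives I_1 = \lambda^{2} + 2/\lambda, so W = 0 in the undeformed state \lambda = 1.
```

      The stress then follows from derivatives of $W$ with respect to the deformation, which is what allows large deformations to be treated without linearization.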

      Major criticism

      Eq. 3 and Ll 227: "𝑝1 is the ratio between the free available myosin population and the attached ones divided by the time of recruitment" Why is the time of recruitment the same for all motors? "inverse of the debonding time" Is it the same as the unbinding rate? Why use the symbol p_2 for it? What is p_3?

      The model proposed to justify the increase in the activity of the actomyosin motors during the first phase is a mean-field model, so all quantities are averaged: we are not considering the theory of a single molecular motor but a collection of motors in a dynamic environment, so we do not need stochasticity here. Equation (3) concerns the compressive pre-strain, which by definition is a quantity varying between $0$ and $1$, and $X_g=1-G$. ... The debonding time is not the same as the debonding rate. The term $p_3$ indicates saturation and is derived from the law of mass action. The good agreement with the experimental data is shown in Fig.5 (A) and (B). An equivalent model has been developed by (M. Serra et al.).

      Serra M, Serrano Nájera G, Chuai M, et al. A mechanochemical model recapitulates distinct vertebrate gastrulation modes. Science Advances, 2023, 9(49).

      Ll 275: "This energy exchange causes the torsion-bending energy to convert into elongation energy, leading to a length increase during the relaxation phase, as shown in Fig.1 of Appendix 5." You have posed the puzzle of how contraction leads to elongation, and now that you resolve the puzzle, you simply say that torsion and bending energy are converted into elongation. How? Usually, if I deform an elastic object, it will return to its original configuration after releasing the external forces. Why is this not the case here?

      Furthermore, the central result of your work is presented in an Appendix!?

      We agree with the referee that an elastic object will return to its initial configuration by releasing stress, i.e. by giving up its accumulated elastic energy to the environment. But the elastic energy has to go somewhere, such as heat. We do not dare to say that the temperature of the worm increases during the muscle contractions.

      In fact, the referee's comment also assumes that full relaxation of the stresses is possible, so the object is not a multi-layered specimen and/or it is not enclosed in a box. Most living species are under stress, usually called residual stress. Our skin is under stress. Our fingerprints result from an elastic instability of the epidermis that occurs during foetal life, as do our brain convolutions and our villi. So, it is obvious that stresses are maintained in multilayered living systems. Closer to the case of C. elegans, the existence of stresses has been demonstrated by laser ablation fracture experiments in the first stage. The fact that the fractures open proves the existence of stress: without stress, there would be no opening and only a straight line.

      Ll 379: "Although a special focus is made on late elongation, its quantitative treatment cannot avoid the influence of the first stage of elongation due to the acto-myosin network, which is responsible for a prestrain of the embryo." This statement is made repeatedly through the manuscript, but I do not understand, why you could not use an initial state without pre-strain.

      This is the basic concept of hyperelasticity. The reference state must be free of stress, so we cannot evaluate the first muscle contraction without treating the first elongation stage.

      Grammar, vocabulary and writing errors

      ll 31: "the influence of mechanical stresses (...) becomes more complex to be identified and quantified" Is the influence of mechanical stress too complex or too difficult to be identified/quantified?

      We have revised it in line 31, “The superposition of mechanical stresses, cellular processes (e.g., division, migration), and tissue organization is often too complex to identify and quantify.”

      Ll 41: "The embryonic elongation of C. elegans represents an attractive model of matter reorganization without a mass increase before hatching." Maybe "Embryonic elongation of C. elegans before hatching represents an attractive model of matter reorganization in the absence of growth.".

      We have revised it in line 41.

      L 42: "It happens after the ventral enclosure (...)" Maybe "It happens after ventral enclosure (...)".

      We have revised it in line 42.

      Ll 52: "The transition is well defined since the muscle participation makes the embryo rather motile impeding any physical experiments such as laser ablation (...)" Ablation of what?

      We have revised it in line 53: The transition is well defined, because the muscle involvement makes the embryo rather motile, and any physical experiments such as laser fracture ablation of the epidermis, which could be performed and achieved in the first period (\cite{vuong2017interplay}), become difficult.

      Ll 59: "a hollow cylinder composed of four parts (seam and dorso-ventral cells)" It is not clear, what the four parts are - in the parenthesis, two are mentioned.

      We have revised it in line 59. Fig. 1 shows the whole structure: the dorsal, ventral and seam cells form the four parts of the epidermis.

      L 78: "several important issues at this stage remain unsettled" At which stage?

      It means the late elongation stage; we have added this information in line 78.

      Ll 85: "but how it works at small scales remains a challenge." Maybe "but how it works at small scales remains to be understood.".

      We have revised it in line 86.

      Ll 99: "the osmolarity of the interstitial fluid" This comes out of the blue. Before you only talked about mechanics, why now osmolarity? Also, the interstitial fluid is only mentioned now. It is important for the dissipative effects that you discuss later, right? If yes, then you should probably introduce it earlier.

      For a better understanding, we have changed "osmolarity" to "viscosity" in line 99.

      l 120: "The cortex is composed of three distinct cells" Maybe "distinct cell types".

      Thank you; we have revised it in line 120.

      L 121: "cytoskeleton organization and actin network configurations" What is the difference between cytoskeleton organization and actin network configuration? Also, either both should be plural or both singular, I guess.

      (1) Cytoskeleton (which involves microtubules) forms the epidermis of C. elegans embryos, and the actin network surrounds the epidermis.

      (2) Thank you for your suggestion; we have revised it in line 121.

      L 130: "which will be introduced hereafter" Maybe "which will be used hereafter".

      We have revised it in line 130.

      Ll 148: "The geometric deformation gradient" You usually denote vectors in bold face, so \chi should be bold, right? Define d_i in Eq.(1).

      Yes, we have added this information in line 147.

      L 172: "auxiliary energy density" Please, explain this term.

      We have changed "auxiliary energy density" into "associated energy density" in line 175. Energy density is the amount of energy stored in a given system or region of space per unit volume; the associated energy density in our manuscript is a quantity that helps us to carry out some of the calculations.

      Ll 188: "Similar active matter can be found in biological systems, from animals to plants as illustrated in Fig.3(C)-(E), they have a structure that generates internal stress/strain when growing or activity. (...)" Why such a general statement during the presentation of the results? The second part of the sentence seems to be incomplete.

      We would like to show that our method is general and can be used in many situations. We have corrected the incomplete sentence in line 192.

      Ll 243: "a bending deformation occurs on the left for active muscles localized on left" Maybe "bending to the left occurs if muscles on the left are activated".

      Thank you; we have revised it in line 247.

      L 250: "we assume them are perfectly synchronous" Maybe "we assume them to contract simultaneously".

      We have revised it in line 252.

      L 258: "the muscle and acto-myosin activities are assumed to work almost simultaneously." Before it was simultaneously, now only almost!? What does almost mean?

      We apologize; we intended the same meaning in these two sentences, and we have deleted the word ‘almost’ in line 261.

      Ll 294: "one can hypothesize several scenarios" After that, only one scenario is described it seems.

      Thank you; we have revised this sentence in line 299.

      L 341: "and then is more viscous than water" Maybe "and that is more viscous than water".

      We have revised it in line 345.

      L 373: "before the egg hatch" Maybe "before the embryo (or larva) hatches"?

      We have revised the sentence in line 367.

      L 409: "elephant trunk elongated" maybe "elephant trunk elongation".

      We have revised it in line 412.

      Ll 417: "As one imagines, it is far from triviality (...)" Does this remark help in any way to understand better C. elegans elongation? Also maybe "it is far from trivial".

      We have revised it in line 423.

      Ll 428: "can map the initial stress-free state B_0 to a state B_1, which reflects early elongation process" Maybe: "maps the initial stress-free state B_0 to a state B_1, which describes early elongation".

      We have revised it in line 428.

      L 429: "After in the residually stressed (...)" Maybe "Subsequently, we impose an incremental strain field G_1 that maps the state B_1 to the state B_2, which represents late elongation".

      We have revised it in line 429.

      l 763: "Modelling details of without pre-strain case" Maybe "Case without pre-strain" or "Modelling in the absence of pre-strain" Similarly for l 784.

      We have revised them in line 763 and line 784.

      Some questions of definition and understanding

      Ll 71: "We can imagine that once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side." I do not understand the sentence. Muscle activation leads to contraction, there is nothing to imagine here. Maybe you hypothesize that the muscles are attached to the epidermis such that muscle contraction leads to epidermis deformation?

      Yes, four muscle bands are attached to the epidermis, as shown in Fig.1. The deformation does not concern only the epidermis but the whole embryo during the bending events. We have modified the sentence to avoid misunderstanding; it now reads “Once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side.” in line 71.

      Ll 110: "However, it is less widely known that its internal striated muscles share similarities with skeletal muscles found in vertebrates in terms of both function and structure" Is it important for what you report, whether this fact is widely known?

      Yes, it is our opinion.

      Ll 112: "the role of the four axial muscles (...) is nearly contra-intuitive" Is it or is it not? If yes, why?

      Yes, it is. Muscles exert contractions, and hence compressive deformations. They are located along the axis of symmetry (up to a small deviation), so they cannot mechanically produce the expected elongation, contrary to the orthoradial actomyosin network.

      However, elongation of C. elegans is observed experimentally, so yes, we consider the result counterintuitive.

      L 116: "fully heterogeneous cylinder" What is this?

      It means that the C. elegans embryo does not have the same elastic properties in different parts (or layers).

      L 129: "will collaborate to facilitate further elongation" To facilitate or to drive? If the former, what drives elongation?

      The contractions of the muscles and of the actin bundles together drive elongation.

      Ll 141: "the deformation in each section can be quantified since the circular geometry is lost with the contractions" The deformation could also be quantified if the sections remained circular, right?

      Yes. However, circularity is lost during each bending event.

      Ll 151: "we need to evaluate the influence of the C. elegans actin network during the early elongation before studying the deformation at the late stage. So, the deformation gradient can be decomposed into: (...) where (...) is the muscle-actomyosin supplementary active strain in the late period" I thought you were now studying the early stage?

      In this part, we are outlining how we can study the whole elongation (early and late), not just the early elongation stage. To evaluate the deformation induced by the first contraction of the muscles, we need to know the state of stress of the worm prior to this event, so we also need to recover the early period using the same formalism for the same structure.

      L 160: "When considering a filamentary structure with different fiber directions" Which filamentary structure are you talking about?

      Fig.3 B shows this model and the filamentary structure, which contains the actin and muscle fibers.

      Ll 174: "When the cylinder involves several layers with different shear modulus 𝜇 and different active strains, the integral over 𝑆 covers each layer" I do not understand this sentence. Also, you should probably write 'moduli' instead of modulus.

      This implies that when integrating over the whole cross-section S, we need to take into account each layer independently with its own shear modulus and sum the results.
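      The layer-by-layer integration described above can be sketched as follows. This is a minimal illustration under strong simplifying assumptions that are ours and not the manuscript's: concentric annular layers, a uniform axial stretch, no active strain, and an incompressible neo-Hookean energy. The function names and the layer values in the usage note are hypothetical.

```python
import math


def neo_hookean_density(mu, stretch):
    """Energy per unit volume of an incompressible neo-Hookean solid
    under a uniform uniaxial stretch (assumed law, for illustration)."""
    i1 = stretch**2 + 2.0 / stretch  # first invariant for uniaxial stretch
    return 0.5 * mu * (i1 - 3.0)


def cross_section_energy(layers, stretch):
    """Energy per unit reference length of a layered cylinder.

    layers: iterable of (r_inner, r_outer, mu), one tuple per annular layer.
    The integral over the cross-section S is split into one integral per
    layer; with a uniform stretch each reduces to density * layer area.
    """
    total = 0.0
    for r_inner, r_outer, mu in layers:
        area = math.pi * (r_outer**2 - r_inner**2)
        total += neo_hookean_density(mu, stretch) * area
    return total
```

      For example, `cross_section_energy([(0.0, 1.0, 0.1), (1.0, 1.2, 5.0)], 1.1)` sums a soft inner layer and a stiff outer layer; in the actual model each layer would also carry its own active strain and a non-uniform deformation, so the per-layer term would be a genuine integral.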

      L 176: "weakness of 𝜀" Do you mean \epsilon << 1?

      Yes

      Ll 178: "Given that the Euler-Lagrange equations and the boundary conditions are satisfied at each order, we can obtain solutions for the elastic strains at zero order 𝐚(𝟎) and at first order 𝐚(𝟏)." Are you thinking about different orders in an \epsilon expansion or the early and the late stages of elongation?

      Different orders are considered only for the late elongation study; the early elongation is treated exactly, so it does not need a correction in \epsilon.

      L 197: "fracture ablation" Please, define.

      This is an experiment in which a laser is used to make a cut in a small-scale object of study, and the internal stresses are then inferred from the morphology of the cut; please see the Ref ‘Assessing the contribution of active and passive stresses in C. elegans elongation’. We have added this definition in line 200.

      Ll 203: What motivated your choice of notations for the radii R_2'? The inner part of the cylinder is fluid? But above you wrote about a solid cylinder. Why should the inner part be compressible?

      (1) We need to define the location of actin cables, which concentrate at the outer periphery.

      (2) Our model is a hollow cylinder, and the inner part of the cylinder contains internal organs, tissues, fluids, and so on, so we consider it to be a compressible extremely soft material (Line 213).

      Ll 212: "𝑟(𝑅) is the radius after early elongation." And during?

      R is a variable; r(R) depends on R but also on time t. It represents the radius of C. elegans embryos after the onset of elongation, i.e., after acto-myosin and muscle activities begin.

      L 232: \tau_p is probably t_p?

      Yes.

      L 240: "quite simultaneously" Please, be precise.

      In practice, it is difficult to define the concept of simultaneous occurrence unless there is rigorous experimental data to show it, but all we can get in the Ref ‘Remodelage des jonctions sous stress mécanique’, is that it occurs almost simultaneously, which we define as quite simultaneously.

      Ll 246: "a short period" What does short mean? Why is it relevant?

      From the experimental observations and data, we know that each contraction occurs very rapidly, within a few seconds, so we describe one contraction as lasting a short period.

      L 263: "the bending of the model will be increased" Is it really the model that is bent?

      We mean the bending deformation predicted by the model; we have revised the sentence in line 266.

      Ll 265: "we observed a consistent torsional deformation (Fig.6(E)) that agrees with the patterns seen in the video" In which sense do these configurations agree? I do not see any similarity between panels D and E.

      Both show a torsion deformation.

      L 267: "torsion as the default of symmetry of the muscle axis" I do not understand.

      We discuss two cases in this research: one where the muscle follows the axis of the C. elegans in the initial configuration, and the other where the muscle has a slight angle of deflection. We have added more information in the manuscript (line 270).

      Ll 274: "Each contraction of a pair increases the energy of the system under investigation, which is then rapidly released to the body." Do you mean the elastic energy stored in the epidermis and central part of the embryo?

      Yes, the whole body.

      Ll 284: "The activation of actin fibers 𝑔𝑎1 after muscle relaxation can be calculated and determined by our model." Have you done it?

      Yes, we can obtain the value of g_a1, and then calculate the elongation.

      Ll 286 I do not understand, why you write about mutants at this place. Am I supposed to have already understood the basic mechanism of elongation? Why do you now write about the first stage?

      We would like to show that our formalism can model both wild-type and mutant C. elegans, and the results of the comparison are good.

      L 302: "The result is significantly higher than our actual size 210𝜇𝑚." How was significance assessed? Your actual size is probably more than 210µm.

      Here, we have considered two situations. In the first, the accumulated energy is entirely applied to the elongation, so the calculated length is much larger than the experimental result of 210 µm. In the second, we have considered the energy dissipation, which leads to 210 µm.

      L 433: "where 𝜆 is the axial extension due to the pre-strained" Maybe "where 𝜆 is the axial extension due to the pre-stress".

      In our manuscript, we define the pre-strain, not the pre-stress.

      L 438: "active filamentary tensor" Please, define.

      The active filamentary tensor is the tensor representing the activities of a cylindrical model composed of fibers with different orientations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious than that of ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
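      To make the affinity comparison concrete, the following minimal sketch evaluates the standard Michaelis-Menten saturation term S / (Km + S) for the two Km values quoted above. It assumes simple Michaelis-Menten kinetics and reports only the fraction of each transporter's own maximal rate; the function name and the substrate concentrations in the usage note are illustrative assumptions, and no Vmax, abundance or turnover values are implied.

```python
# Km values quoted in the text, expressed in micromolar.
KM_ASCT2_UM = 167.0    # ASCT2, high affinity
KM_SMCT1_UM = 3390.0   # SMCT1, low affinity (3.39 mM)


def fractional_rate(substrate_um, km_um):
    """Michaelis-Menten rate as a fraction of Vmax: v / Vmax = S / (Km + S)."""
    return substrate_um / (km_um + substrate_um)


def compare(substrate_um):
    """Return (ASCT2 fraction, SMCT1 fraction) at a given D-serine level."""
    return (
        fractional_rate(substrate_um, KM_ASCT2_UM),
        fractional_rate(substrate_um, KM_SMCT1_UM),
    )
```

      For instance, `compare(167.0)` places ASCT2 at half of its maximal rate while SMCT1 remains far from saturation. The absolute flux through each transporter additionally scales with its abundance and turnover rate, which is why a low-affinity but abundant transporter can still contribute substantially.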

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in eLife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comment on the manuscript. The alteration of D-/L-serine ratios depends on several factors, including protein expression levels at both apical and basolateral sides, properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to complete the mechanism. However, we would like to explore the mechanism based on the current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer: Km of 18.4 µM for L-serine (Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine (Foster et al. PLOS ONE 2016), both in oocyte expression systems, and IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine in HEK293 expression systems (Foster et al. PLOS ONE 2016). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2 supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8 fold and B0AT1 decreased 1.1 fold. From these results, we included the contribution of B0ATs in L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are a combined result of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction and 2) decrease of L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”
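
      As a quick arithmetic cross-check of the fold changes quoted above, the Python sketch below converts an IRI/sham abundance ratio into a fold decrease. The function name `fold_decrease` is a hypothetical helper introduced only for illustration; the only inputs are the B0AT3 ratios cited from Table S1.

```python
def fold_decrease(ratio: float) -> float:
    """Convert an IRI/sham abundance ratio (below 1) into a fold decrease."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# B0AT3 IRI/sham ratios as quoted from Table S1 in the text above
b0at3_ratios = {"4h": 0.56, "8h": 0.65}

for timepoint, ratio in b0at3_ratios.items():
    print(f"B0AT3 {timepoint}: ratio {ratio:.2f} = "
          f"{fold_decrease(ratio):.1f}-fold decrease")
```

      Under this reading, the 4h ratio of 0.56 corresponds to roughly a 1.8-fold decrease, which agrees with the value given for B0AT3 in the response.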

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We appreciate reviewer 1 for his/her effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      More than weakness I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescent data of the ASCT2 staining in the whole kidney at the low magnification and another region of Figure 3 (below) as well as immunohistochemistry from Human Protein Atlas (update: Jun 9th, 2023) did not show a strong signal of ASCT2 expression in other regions besides the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules, but not in other nephron regions.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: why was D-serine transport enhanced by ASCT2 knockdown in FlpInTRSMCT1 or 2 cells?

      We thank the reviewer for pointing out this result and apologize for the confusion in the text. The total amount of D-serine uptake in the cells was not enhanced, but the net uptake (total uptake minus background) was increased. This increase results from the lower background caused by ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters, such as the proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), the cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and the large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels between different genes within the same cells. From Figure supplement 7, expression of SMCT1 is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the 90% of cells that do not express ASCT2 use SMCT1 to reabsorb D-serine.

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10⁹ AU) and Smct2 (1.6 × 10⁸ AU) were significantly higher than that of Asct2 (1.5 × 10⁷ AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
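
      To make the affinity argument above concrete, the following Python sketch evaluates the Michaelis-Menten fractional saturation, v/Vmax = [S] / (Km + [S]), at the urinary D-serine concentrations and Km values quoted in the response. It is an illustration only: it assumes simple single-substrate Michaelis-Menten kinetics with no competition from canonical substrates, and the function name `fractional_saturation` is hypothetical. It does not estimate Vmax, which depends on transporter abundance and turnover rate.

```python
def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Michaelis-Menten v/Vmax = [S] / (Km + [S]); both arguments in µM."""
    return substrate_um / (km_um + substrate_um)


# Km values for D-serine as quoted in the text (converted to µM)
KM_UM = {"ASCT2": 167.0, "SMCT1": 3390.0}

# Urinary D-serine range quoted in the text (µM)
for substrate in (100.0, 200.0):
    for name, km in KM_UM.items():
        saturation = fractional_saturation(substrate, km)
        print(f"{name} at {substrate:.0f} µM: v/Vmax = {saturation:.3f}")
```

      Under these assumptions SMCT1 would operate at roughly 3 to 6% of its Vmax and ASCT2 at roughly 37 to 54%, so the relative contribution of the two systems hinges on their Vmax values, that is, on expression level and turnover rate, as the revised Discussion notes.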

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete the D-serine reabsorption in the kidney, and have included this issue in the original manuscript. We stated that transport systems at the basolateral side are necessary to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we did not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which concentrates on the apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

      Thank you for your comment. The proximal tubule (PT) is the first and main region that maintains acid-base homeostasis in the kidney. In PT cells, NH3 secretes H+ to titrate luminal HCO3- and creates CO2, which is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in the PT serve to maintain pH homeostasis, pH regulation in both the lumen and the intracellular space of the PT is highly dynamic. We agree with the reviewer and have therefore revised the text to emphasize the pH around PT segments, rather than the final urine pH, leaving the discussion open to the possibility of PAT1 function in the PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (<200 µM Km) versus SMCT1 (3.4 mM Km). In the Discussion section (page 18, first complete paragraph), the authors indicate that the Mass Spec intensities of SMCT1 and 2 are two and one order of magnitude higher respectively than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increases ASCT2 protein expression in brush-border membrane vesicles from kidney 1.6-fold and decreases that of SMCT1 to 0.6-fold. How would these fold changes, even taking into account the lower Km of ASCT2 versus SMCT1, explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 µM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage. (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in increased renal reabsorption of D-serine relative to L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the degree of alteration is more pronounced in snRNA-seq. This may be due to the difference in quantification methods and may point to an underestimation of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often affected by the presence of post-translational modifications and multiple transmembrane domains, as in the case of membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs in normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10⁹ AU) and Smct2 (1.6 × 10⁸ AU) were significantly higher than that of Asct2 (1.5 × 10⁷ AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
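
      The reviewer's remark that the Smct1 and Smct2 intensities are about two and one orders of magnitude above Asct2 can be checked with a short Python sketch. It assumes the chromatogram intensities read as 2.9 × 10^9, 1.6 × 10^8, and 1.5 × 10^7 AU, and the helper name `orders_of_magnitude` is hypothetical. As the quoted passage cautions, intensity comparisons between different proteins are only a rough indication of relative abundance.

```python
import math


def orders_of_magnitude(intensity: float, reference: float) -> float:
    """Return log10 of the intensity ratio between two proteins."""
    return math.log10(intensity / reference)


# Chromatogram intensities (AU) in sham mice, as read from the quoted text
ASCT2_AU = 1.5e7
smct_au = {"Smct1": 2.9e9, "Smct2": 1.6e8}

for name, intensity in smct_au.items():
    ratio = intensity / ASCT2_AU
    print(f"{name}/Asct2: ratio {ratio:.0f}, "
          f"{orders_of_magnitude(intensity, ASCT2_AU):.1f} orders of magnitude")
```

      With these values the Smct1/Asct2 ratio is close to 200 and the Smct2/Asct2 ratio close to 10, in line with the two and one orders of magnitude mentioned by the reviewer.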

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCTs expression levels, we cautiously refrain from drawing a definitive conclusion regarding the entire mechanism, because that would require a comprehensive elucidation of both L-serine and D-serine transport systems at both apical and basolateral membranes. Nevertheless, we would like to suggest a mechanism at the apical membrane based on the current knowledge.

      For D-serine transport systems, we found ASCT2 and SMCTs contributions in this study. Meanwhile, L-serine reabsorption was previously reported to be mediated mainly by the neutral amino acid transporters B0ATs (in particular B0AT3; Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 functions as well. In IRI conditions, B0AT3 decreased 1.8 fold. We suggest that high ratios of D-/L-serine in IRI conditions are a combined outcome of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction, and 2) decrease of L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed the increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is dramatically low in the kidney (Human Protein Atlas: update on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We would like to express our gratitude to the reviewers for their comments, which helped us to improve the quality of our manuscript. Below are the responses to each comment. We hope that these responses will satisfy the reviewers.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: The nonsense-mediated mRNA decay (NMD) is an RNA quality-control pathway that eliminates mRNAs containing premature termination codons. Its mechanism has been studied for several decades but despite enormous progress we still don't have a satisfactory model that would explain most of the published observations. In particular, the mechanism has been proposed to differ substantially between yeast and metazoa. Yeast Nmd4 protein was previously shown to be involved in NMD, to interact with UPF1 and exhibit similarities with metazoan SMG6 and SMG5/7, that are normally believed to be specific for metazoan NMD (Dehecq et al., EMBO J, 2018). Barbarin-Bocahu et al now describe the crystal structure of the complex between the yeast UPF1 RNA helicase and Nmd4. Importantly, the authors show that interaction is required for NMD activity and increases the ATPase activity of UPF1. Barbarin-Bocahu et al equally show that this interaction and its role in NMD is conserved in the human UPF1-SMG6 complex, thus providing additional novel evidence for universal conservation of the NMD mechanism in eukaryotes. The manuscript carefully combines biochemistry, biophysics with functional in vivo studies. In my opinion, all the experiments are very well executed, generally convincing and interpretations appear correct, so the manuscript is certainly suitable for publication. I have included some suggestions below that I believe could strengthen the manuscript and enhance our confidence in the findings.

      We are grateful for the useful suggestions that have enabled us to improve our manuscript.

      Major comments:

      Page 7 - "Since the D1353A mutation completely abolishes the enzymatic activity of SMG6 (34), this strongly suggests that the PIN domain of Nmd4 is not endowed with endonucleolytic activity. " Could/was the endonucleolytic activity of NMD4 be tested?

      We agree with this important point. Our statement is based on previous site-directed mutagenesis experiments on the PIN domain of human SMG6 (Glavan et al; 2006; EMBO Journal; PMID: 17053788 / Eberle et al; 2009; Nat. Struct. Mol. Biol.; PMID: 19060897), which showed that D1353 is the critical residue of the SMG6 active site involved in the endonuclease enzymatic activity. Given that in yeast Nmd4 proteins, the corresponding residue is hydrophobic (Leu112 in S. cerevisiae Nmd4 and Phe114 in Kluyveromyces lactis Nmd4) and therefore cannot participate directly in catalysis, we assume that yeast Nmd4 proteins have no endonucleolytic activity.

      Furthermore, despite decades of research in this field, no endonucleolytic activity has been described as being involved in the NMD pathway of S. cerevisiae (the model system in which the NMD mechanism was discovered in the 1970s), whereas it has been well characterized in the NMD pathway of metazoans for more than twenty years (Gatfield and Izaurralde; Nature; 2004; PMID: 15175755 / Huntzinger et al; RNA; 2008; PMID: 18974281 / Eberle et al; Nat. Struct. Mol. Biol.; 2009; PMID: 19060897 / Lykke-Andersen et al; Genes Dev.; 2014; PMID: 25403180). Our attempts to demonstrate an endonucleolytic activity of purified Nmd4 in vitro were not successful. This negative result could be due to many reasons, including loss of enzymatic activity in the tested buffer, the absence of an important cofactor or the choice of the tested RNA. For these reasons, we prefer not to include this type of negative result in the current manuscript.

      We hope that, on the basis of the above information, the reviewer will agree that further substantial efforts to demonstrate a hypothetical endonucleolytic activity of Nmd4 are unlikely to be fruitful. Moreover, we believe that even if yeast Nmd4 turns out to behave as an endonuclease, this fact does not change the main message of the manuscript.

      Page 10 - The two proteins bind RNA with reasonable affinity. The complex binds polyU RNA with Kd of 0.44 μM. The authors suggest, based on structure superpositions, that RNA fragments bound to the PIN domain and Upf1-HD have opposite orientations. But since they have the complex ready to crystallize, did they attempt to determine the structure of the complex with RNA? The complex is quite small (~100 kDa with RNA) but it could even be visible by cryo-EM. I don't insist that such a structure needs to be included but it might perhaps be easy to do and would surely strengthen the story. If it is too difficult, it could at least be mentioned that it was tried?

      We agree that it would be interesting to determine the crystal structure of the complex with a short RNA fragment. Unfortunately, despite extensive efforts, we could not obtain crystals of the complex in the presence of RNA. This is probably due to the large movements of the RecA2 and 1B domains relative to the RecA1 domain observed in former studies upon RNA binding to Upf1. We have mentioned that we tried to crystallize this complex in the absence or the presence of a short oligonucleotide in our revised manuscript.

      As far as single-particle cryo-EM is concerned, we are aware that recent advances in this field should make it possible to determine the structure of the Nmd4-Upf1-RNA complex, but we do not yet have the necessary expertise in this technique. Despite the interesting information that such a structure could provide, we therefore consider that this would require a very significant investment and that it is beyond the scope of this manuscript.

      I think it is important to demonstrate that the structure-based mutants don't significantly impact the overall structure of the proteins (e.g. glycine residues are mutated within helices). At least gel filtration profiles with gels of the WT and mutated proteins should be shown in SI.

      Thank you very much for highlighting this point. We fully agree that it is important to demonstrate that the Upf1 and Nmd4 mutants used in the in vitro experiments (pull-down and ATPase assays) are not affected in their overall folding. As suggested by the reviewer, we have included gel filtration chromatograms for WT and mutant proteins (Figures S2A for Upf1-HD proteins and S2B for His6-ZZ-Nmd4 proteins). These chromatograms clearly show that the different mutants behave very similarly to the WT proteins during purification, demonstrating that the overall structures of the mutants are very similar to those of the wild-type proteins. We have also included the Coomassie blue stained SDS-PAGE analysis of the proteins present in the main peak to show the purity of the final proteins.

      Perhaps the main finding of this manuscript is the conservation of the UPF1-Nmd4 interaction in human UPF1-SMG6. But the interaction is only demonstrated by co-IP with ectopically expressed human proteins in human cells that contain all the other human proteins as well. It would probably be more convincing to demonstrate the interaction in pull-downs with purified proteins as done for the yeast complex.

      Thank you for highlighting what we consider to be one of the most interesting findings presented in our manuscript. We agree that pull-down experiments using pure protein fragments expressed in E. coli would have been ideal to further confirm our co-IP results and to validate that mutations do not affect the overall structure of SMG6. Unfortunately, despite considerable efforts, we were unable to express sufficient quantities of the SMG6-[207-580] fragment or shorter versions as soluble proteins in E. coli. Indeed, Elena Conti's laboratory had the same experience according to a statement in a paper on SMG6 (Chakrabarti et al; 2014; Nucleic Acids Research; PMID: 25013172), indicating that this protein region is very difficult to work with. As we have not yet set up protein over-expression techniques in human cells or baculovirus-infected insect cells in our laboratory, we have not been able to try these systems to express these SMG6 domains. These are the reasons why we decided to demonstrate this interaction by co-IP experiments using ectopically expressed tagged proteins in human cells and all appropriate controls.

      In addition, using purified proteins would enable testing whether the mutations in SMG6 don't affect the overall structure of the mutants compared to the WT.

      We agree that this is an important issue. Several bioinformatics tools, including AlphaFold2 (identifier: AF-Q86US8-F1), predict that the human SMG6-[207-580] fragment is largely unstructured (see panel A of figure below). Furthermore, the pLDDT values or confidence scores for this region in the AlphaFold2 model are very low (below 50), indicating that the structure of this region is poorly predicted (see panel B of figure below). Therefore, biophysical techniques to assess that the overall structure of this fragment is not affected by the introduced mutations are very limited. However, we did not observe reduced levels of SMG6 mutants compared with WT in human cells expressing these variants (Fig. 4B and S4), so we believe that these mutants behave similarly to the wild-type fragment, as is often postulated by scientists for in cellulo studies. Furthermore, if these mutants drastically affect the overall structure of SMG6, we would expect NMD to be strongly affected, resulting in a notable accumulation of NMD RNA substrates in our in cellulo experiments when the effect of the double mutant (M2) is compared to that of the SMG6 WT protein (Fig. 4C). This was not the case. On the basis of all these elements, we assume that the overall structure of the SMG6 protein is not affected by these mutations.
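      The pLDDT argument above can be checked directly from a model file, because AlphaFold writes the per-residue pLDDT score into the B-factor column of its PDB output. The following is a minimal sketch of that check, not the authors' procedure: the helper names `ca_line` and `mean_plddt` are hypothetical, and the four records and their scores are made-up illustrative values, not data from the AF-Q86US8-F1 model.

      ```python
      def ca_line(serial, resnum, plddt):
          """Hypothetical helper: format a minimal fixed-column PDB ATOM record
          for a C-alpha atom, with the pLDDT stored in the B-factor field."""
          return ("ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
                  % (serial, resnum, 0.0, 0.0, 0.0, 1.00, plddt))


      def mean_plddt(pdb_text, start, end, atom_name="CA"):
          """Mean pLDDT over residues start..end (inclusive).

          Reads the B-factor field (columns 61-66) of ATOM records, which is
          where AlphaFold models store the per-residue confidence score.
          """
          scores = []
          for line in pdb_text.splitlines():
              if not line.startswith("ATOM"):
                  continue
              if line[12:16].strip() != atom_name:   # atom name, columns 13-16
                  continue
              resnum = int(line[22:26])              # residue number, columns 23-26
              if start <= resnum <= end:
                  scores.append(float(line[60:66]))  # B-factor, columns 61-66
          if not scores:
              raise ValueError("no matching residues in the requested range")
          return sum(scores) / len(scores)


      # Illustrative records only; the scores are invented for the example.
      demo_pdb = "\n".join([
          ca_line(1, 206, 90.0),
          ca_line(2, 207, 30.0),
          ca_line(3, 208, 40.0),
          ca_line(4, 581, 95.0),
      ])
      fragment_mean = mean_plddt(demo_pdb, 207, 580)
      ```

      On a real model file, a mean below 50 over residues 207-580 would correspond to the "very low" confidence described above.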

      Figure for reviewing purpose : Model of the three-dimensional structure of human SMG6 protein generated by AlphaFold2.

      A. Model of human SMG6 protein (green) with the region 207-580 used in our study colored in red.

      B. Model of human SMG6 protein (green) colored according to the pLDDT values. Orange : pLDDT 90.

      Since the detected similarity to Nmd4 is only in a region covering residues 440-470, why is the tested construct much larger (207-580) including extra, large disordered regions.

      For in cellulo studies, it has previously been shown that the SMG6-[207-580] fragment is expressed as a stable protein in human cells and is responsible for the phospho-independent interaction between UPF1 and SMG6 (Chakrabarti et al; 2014; Nucleic Acids Research; PMID: 25013172). As our aim was not to reduce this SMG6 region to a shorter peptide but to conduct an amino acid-level analysis by site-directed mutagenesis, we decided to perform our experiments using the same SMG6 domain as Conti's laboratory and to mutate conserved residues on this fragment.

      Finally, the most convincing way to show and characterize the human UPF1-SMG6 interaction would be an X-ray structure. It might be feasible to crystallize human UPF1 HD domain with a SMG6 peptide. Or at least an Alphafold model could be included? I had a quick try just with the Colabfold and using the HD domain and the SMG6 peptide, Alphafold can predict convincingly the binding of the region around W456 and in some models even around R448. I think that this would strengthen the conclusions in this part of the manuscript.

      We agree that determination of the crystal structure of human UPF1 HD linked to this region of SMG6 protein interaction would have further supported our conclusions on the conservation of UPF1-Nmd4 interaction in human UPF1-SMG6. However, due to the SMG6 expression problems mentioned above, we were unable to reconstitute the human complex in vitro, which precluded crystallization assays.

      Based on this suggestion, we generated a model of human UPF1-HD bound to the 421-480 region of human SMG6 using AlphaFold2 Colabfold. Of the various models proposed (25 in total), most are very similar and show that the side chains of R448 and W456 of SMG6 bind to regions of human UPF1 corresponding to the region of the yeast protein that interacts with R210 and W216 of Nmd4. This model is consistent with our hypothesis and we have decided to include it in the revised manuscript as suggested (Fig. EV6). We thank the reviewer for this constructive comment.

      We have added the following text to mention this model : « Based on this observation, we generated a model of the complex between human UPF1-HD and the region 421-480 of SMG6 using AlphaFold2 software (1,2). In this model, the SMG6 fragment binds to the same region of UPF1-HD as the Nmd4 « arm » (Fig. EV6). In particular, the R448 and W456 side chains of SMG6 match almost perfectly with R210 and W216 side chains of S. cerevisiae Nmd4, suggesting that this conserved region from SMG6 is involved in the interaction between the SMG6 and UPF1-HD proteins. »

      Does the SMG6 addition also increase the ATPase activity of UPF1?

      This is a very good point and we agree that the results of such an experiment may have further supported our conclusions about the conservation of the Upf1-Nmd4 interaction in human UPF1-SMG6. Unfortunately, due to the SMG6 protein expression problems mentioned above, we could not perform these in vitro experiments.

      Minor comments: Examples of electron density omit maps of the key interaction interfaces should be shown in Supplementary Information for the reader to be able to judge the crystallography data quality.

      Following this suggestion, we have added two panels showing electron density omit maps of residues at the interface in Fig. S1. We hope that this will convince the reader of the quality of our crystallographic data. We have also added the following sentence to the main text : « The overall quality of the electron density map allowed us to unambiguously identify the residues of the two proteins involved in the formation of the complex (Fig. S1A-B). »

      I suggest to add the Kd values to ITC panels for clarity in main and EV figures.

      We have taken this suggestion into account for figures 2A and EV5.

      On page 10: What experiment is this referring to : "This is in agreement with our ITC experiments (carried out in the absence of a non-hydrolyzable ATP analog), which revealed no major synergistic effect between the two proteins for RNA binding." Results in EV4A? Or some other not shown data? The results in EV4A do show an increase in RNA binding when both proteins are in a complex.

      Thank you for your comment. We realize that this sentence was not clear. We refer to the ITC data for the interaction of Upf1-HD, Nmd4 or the complex with RNA (Fig. EV5A). These data show a 2.3-fold increase in the affinity of Upf1 for RNA in the presence of Nmd4, which we consider to be a notable effect but not a major one. Based on the second reviewer's comments that our comparison between Nob1 and the PIN domain of Nmd4 is not convincing, we have decided to delete this speculative section, which did not address an important point in our current study. We will address this point using more direct and sophisticated methods in future work.
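      To put the 2.3-fold change in perspective, a fold change in Kd can be converted into a binding free-energy difference with ΔΔG = -RT ln(fold). The sketch below assumes a temperature of 298.15 K, which is our assumption and not a value stated in the response; the function name `ddg_from_fold_change` is hypothetical.

      ```python
      import math

      R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


      def ddg_from_fold_change(fold, temperature_k=298.15):
          """Binding free-energy change (kcal/mol) for a `fold`-fold decrease in Kd.

          The default temperature is an assumption; use the actual ITC
          temperature if it is known.
          """
          return -R_KCAL * temperature_k * math.log(fold)


      ddg = ddg_from_fold_change(2.3)  # roughly -0.5 kcal/mol at 298.15 K
      ```

      Under this assumption the gain is about half a kcal/mol, which fits the description of the effect as notable but not major.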

      On page 16, "organsms" should be "organisms"

      Typo corrected.

      In certain figure legends the panel labels (A,B,C..) are missing (e.g. Fig 3, EV1, EV5).

      We apologize for this problem, which was due to a conversion error when preparing the PDF file of the submitted article. It has now been corrected.

      The PIN domain structure was solved only to determine the structure of the complex? I only found it mentioned in the methods and no other mention of this structure in the main text. Maybe one sentence could be added to the results to explain why this structure was solved and how it compares to the complex structure.

      We agree that we forgot to explain why we solved the structure of the PIN domain of Nmd4. The point was to help in the determination of the structure of the complex. We have added the following sentence to the main text to explain this point: « We also determined the 1.8 Å resolution crystal structure of the PIN domain of Nmd4 (residues 1 to 167) to help us determine the structure of the Nmd4/Upf1-HD complex. As this structure is virtually identical to the structure of the PIN domain of Nmd4 in the complex (rmsd of 0.5 Å over 163 C𝛼 atoms between the two structures), we will only describe the structure of this domain in the Upf1-Nmd4 complex. »

      Significance

      This is an important study, providing detailed insight into the function of Nmd4, SMG6 and UPF1 in NMD. The results also point towards a conserved mechanism of NMD between yeast and human. I would like to highlight the quality of the experiments. This study will be of great interest to people working on NMD but also more broadly to scientists working on RNA, helicases and structural biologists.

      We are very grateful for the reviewer's comments about the broad interest and overall quality of our work.

      Reviewer #2

      Evidence, reproducibility and clarity

      In this study, the authors solved the crystal structure of the UPF1 helicase domain in complex with Nmd4. Through the structure and biochemical studies, they uncovered a region responsible for Nmd4 binding to UPF1, also important for their function in NMD. In the end, the authors also extended their findings to the human SMG6, proposing a conserved mechanism for Nmd4 and SMG6.

      The mechanism of UPF1 functioning during NMD is a long-existing question. For decades, people have been trying to find out the roles of all the NMD factors during this process. This study visualized the first direct connection between UPF1 and the putative SMG6 homolog, Nmd4. Undoubtedly, it will aid our understanding of how the whole process works.

      One of the limitations of this study is the conservation between Nmd4 and SMG6. Although they both have a PIN domain, Nmd4 is inactive while SMG6 is active. During NMD, SMG6 is thought to work to cut the mRNA, thus promoting the degradation of the non-functional mRNA. Therefore, Nmd4 and SMG6 may only share a similar binding mode with UPF1, however, they do not share similar functions. This study might only apply to yeast study.

      We respectfully disagree with this comment. The role of SMG6 in NMD cannot be attributed solely to the endonuclease activity of its PIN domain. Indeed, recruitment of the SMG6 PIN domain alone to an mRNA is not sufficient to destabilize it (Nicholson et al; 2014; Nucleic Acids Research; PMID: 25053839). This clearly indicates that other regions of SMG6 are critical for NMD. In our manuscript, we unveil the conservation of the Upf1-Nmd4 interaction in human UPF1-SMG6 (and probably more generally in metazoans) and show that this interaction plays a role in the optimal removal of NMD substrates. We strongly believe that our results are not only applicable to the study of yeast, but will fuel future studies in human cells aimed at describing the mechanistic details of the human NMD pathway.

      Comments: the study is written in a very clear way, and most of the experiments are clear and sound. I do not have any major comments. I only have a few minor comments, listed below:

      We are very grateful for the reviewer's comments about the overall quality of our manuscript and of the experimental work.

      1:The authors also solved the PIN domain of the SMG6. This is a result worth showing in the main figure.

      In our study, we did not solve the structure of the human SMG6 PIN domain. This was done by Dr. Conti's group in 2006 (Glavan et al; 2006; EMBO Journal; PMID: 17053788), which is why we do not include it in the main figure. However, we have solved the crystal structure of the Nmd4 PIN domain alone to help us determine the structure of the complex. Since it is very similar to the structure of the Nmd4 PIN domain in the complex with Upf1, we do not describe this structure in detail. Following the suggestion from another reviewer, we have included the following sentence in the main text to mention that we have also determined the structure of the Nmd4 PIN domain: « We also determined the 1.8 Å resolution crystal structure of the PIN domain of Nmd4 (residues 1 to 167) to help us determine the structure of the Nmd4/Upf1-HD complex. As this structure is virtually identical to the structure of the PIN domain of Nmd4 in the complex (rmsd of 0.5 Å over 163 C𝛼 atoms between the two structures), we will only describe the structure of this domain in the Upf1-Nmd4 complex. »

      2:It would be easier to read if the authors could add all the binding constants directly into the ITC panels.

      We have taken this suggestion into account for figures 2A and EV5.

      3:I am confused with His6-ZZ. Is ZZ a protein tag?

      The ZZ protein is a tag consisting of a tandem of the Z-domain from Staphylococcus aureus protein A. This domain binds to the Fc region of IgG and has been shown to improve expression levels and stability of recombinant proteins. In our case, it proved crucial to obtain mg amounts of the yeast Nmd4 protein and to enhance considerably its stability. We have added the following sentence in the « Materials and methods » section of the manuscript : « The ZZ-tag consists in a tandem of the Z-domain from Staphylococcus aureus protein A and was used as an enhancer of protein expression and stability. »

      4:The comparison between Nob1 and the PIN domain of Nmd4 is not convincing for me. Since the PIN domain is not required for the binding between Nmd4 and UPF1, the conformation of the PIN domain could be a result of the crystal packing. Thus, it is still possible that Nmd4 and UPF1 bind to the same RNA. To this end, I challenge the conclusion the authors have made on the mRNA binding part.

      We agree with your comment. Since this comparison is purely speculative and is not a major focus of our study, we decided to remove this section. We will address this point using more direct and sophisticated methods in future work aimed at elucidating this aspect.

      5: "Showing that Nmd4 stabilizes Upf1-HD on RNA in the absence of ATP and that Upf1 is the main RNA binding factor in the Nmd4/Upf1-HD complex." As mentioned above, I don't think one can make the conclusion UPF1 is the main RNA binding factor; there shouldn't be a main and minor. Meanwhile, what will happen if you add ATP in? Or AMPPNP? Or ADP?

      We agree with your comment that our current data do not allow us to conclude precisely about the role of Upf1 as the major RNA binding factor. We have replaced this sentence by the following one: « Whether this increase in affinity is due to a synergistic effect between both proteins or to an allosteric stimulation of one partner on the RNA binding property of the second partner remains to be clarified. »

      Regarding the role of nucleotides in the RNA binding properties of the Upf1 helicase domain or the complex, we faced precipitation problems when mixing high concentrations of Upf1 and nucleotides for ITC experiments, making it difficult to determine Kd values for the interaction between Upf1 and RNA in the presence of nucleotides. However, in a previous study (Dehecq et al; 2018; EMBO J; PMID: 30275269), we observed that AMPPNP did not affect the amount of Nmd4 and Upf1-HD co-precipitated by an RNA oligonucleotide, indicating that the nucleotide does not significantly affect the interaction of the complex with RNA.

      6: "But also that a physical interaction between Upf1-HD and the PIN domain exists in vitro, although we were unable to detect it using our various interaction assays." This also confused me, since one cannot detect the interaction in any assay, how could you be so confident there is a physical interaction? Have you tested assays which are good for weak binding?

      We understand that this sentence may be confusing. The tests we have used to determine whether there is a physical interaction between the PIN domain of Nmd4 and Upf1-HD are ITC and pull-down. These are excellent methods for detecting stable interactions with dissociation constants (Kd) in the nanomolar to tens of micromolar range. These two methods did not indicate any direct interaction between the PIN domain of Nmd4 and Upf1-HD. However, we observed that the PIN domain of Nmd4 stimulates the ATPase activity of Upf1-HD to the same extent as the « arm » of Nmd4. This is an indirect indication that the Nmd4 PIN may interact with Upf1-HD, otherwise a stimulatory effect would not be expected. Our radioactivity-based ATPase assay is very sensitive, allowing the detection of a stimulatory effect due to a transient interaction between the PIN domain of Nmd4 and Upf1-HD, which, as indicated above, could not be detected with the interaction assays used. We would also like to point out that in our ATPase conditions, Upf1-HD (0.156 µM) is incubated with a 20-fold molar excess (3.12 µM) of its partners (Nmd4-FL, Nmd4 « arm » or Nmd4 PIN). Such an excess cannot be used in our interaction tests. This could explain the stimulatory effect detected for the PIN domain of Nmd4 in our ATPase assay.

      We have clarified this section by adding the following sentences: « We were unable to detect such an interaction using our different interaction assays (pull-down and ITC), which are optimal for studying interactions with dissociation constants (Kd) in the nanoM to tens of microM range. We therefore assume that a transient low-affinity interaction (high Kd value not detected by our binding assays) exists between Upf1-HD and PIN Nmd4 and can only be detected by highly sensitive assays such as our radioactivity-based ATPase assay, which was performed with a 20-fold molar excess of PIN Nmd4 domain over Upf1-HD. »
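      The reasoning about the 20-fold molar excess can be illustrated with the exact solution for simple 1:1 binding. This is only a sketch under stated assumptions: the Kd of 50 µM is a hypothetical value chosen to represent a weak interaction, not a measured constant, and ATPase stimulation need not scale linearly with occupancy. The function name `fraction_bound` is also hypothetical; only the two concentrations come from the response above.

      ```python
      import math


      def fraction_bound(protein_um, ligand_um, kd_um):
          """Fraction of `protein` in complex for 1:1 binding (all values in µM).

          Uses the exact quadratic solution, so it remains valid when the
          ligand is not in large excess.
          """
          total = protein_um + ligand_um + kd_um
          complex_um = (total - math.sqrt(total ** 2 - 4.0 * protein_um * ligand_um)) / 2.0
          return complex_um / protein_um


      UPF1_HD_UM = 0.156        # Upf1-HD concentration in the ATPase assay
      PARTNER_UM = 3.12         # Nmd4-FL, « arm » or PIN concentration in the ATPase assay
      HYPOTHETICAL_KD_UM = 50.0  # assumed weak affinity, for illustration only

      molar_excess = PARTNER_UM / UPF1_HD_UM
      bound_with_excess = fraction_bound(UPF1_HD_UM, PARTNER_UM, HYPOTHETICAL_KD_UM)
      bound_equimolar = fraction_bound(UPF1_HD_UM, UPF1_HD_UM, HYPOTHETICAL_KD_UM)
      ```

      With these assumed numbers, a few percent of Upf1-HD would be occupied at the ATPase-assay concentrations, against well under one percent at equimolar concentrations. This is consistent with a sensitive activity assay picking up an interaction that pull-down and ITC miss.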

      7: Figure 4B should be done in the context of the full length of SMG6 and UPF1.

      **Referees cross-commenting**

      *This session contains comments from both Rev1 and Rev2*

      Rev1:

      There seems to be a contradiction in comments on Figure 4B. I agree with Reviewer 2 that using FL proteins will be informative to see whether the FL proteins indeed interact (or not in the case of the mutants).

      If one wants to use this experiment to map the interacting regions, then I think that the UPF1 HD domain and the short conserved region of SMG6 should be used. The long fragment SMG6 207-580 is not ideal for either. The short constructs would be more suited for a pull-down experiments (like done for the yeast proteins).

      Rev2

      Response to reviewer #1, It is necessary to use the full-length protein (FL protein) to map the interface unless they have pre-existing information to support mapping down to short fragments.

      In addition, performing further structural work would be beyond the scope of this study. Given the additional time and effort required, I do not recommend doing so for this study.

      Rev1:

      As I said, I agree with using the FL proteins. The pre-existing information supporting the mapping comes from sequence alignments with the yeast structure and the mutagenesis. This is further confirmed by Alphafold modeling which in my opinion should be included. As I mentioned in my review, I don't insist on further structural work

      Thank you very much for this comment and the discussions between reviewers, which show that we didn't explain our experimental strategy clearly. Human UPF1 has been shown to interact with SMG6 in both phospho-dependent and phospho-independent modes. In our manuscript, we focus on characterizing the phospho-independent interaction. For this reason, we cannot perform this experiment using the full-length versions of SMG6 and UPF1, otherwise the effects of our point mutants on the UPF1-SMG6 interaction could be masked by the phospho-dependent interaction occurring between the 14-3-3 domain of SMG6 and the C-terminus of UPF1. To circumvent this problem, we were inspired by former in cellulo studies, which have shown that the SMG6-[207-580] fragment is expressed as a stable protein in human cells and is responsible for the phospho-independent interaction between UPF1 and SMG6 (Chakrabarti et al; 2014; Nucleic Acids Research; PMID: 25013172). Similarly, the helicase domain of UPF1 was found to be sufficient for this phospho-independent interaction with human SMG6 (Nicholson et al; 2014; Nucleic Acids Research; PMID: 25053839). These are the reasons why we decided to use these protein domains in our in cellulo studies to test the effect of our point mutants on the interaction. As indicated above in our answer to a comment from reviewer #1, our aim was not to reduce this SMG6 region to a shorter peptide but to conduct an amino acid-level analysis by site-directed mutagenesis; this is also why we decided to perform our experiments using the same SMG6 domain as Conti's laboratory and to mutate conserved residues on this fragment. We have also included the AlphaFold2 model of the complex between human UPF1 and SMG6 in our revised version.

      To clarify this point, we have amended the relevant section as follows: « To determine whether this motif might be involved in the interaction between SMG6 and UPF1-HD proteins, we ectopically expressed the region comprising residues 207-580 of human SMG6 fused to a C-terminal HA tag (SMG6-[207-580]-HA) and human UPF1-HD (residues 295-921 fused to a C-terminal Flag tag; UPF1-HD-Flag) in human HEK293T cells, as these regions have previously been shown to be responsible for the phosphorylation-independent interaction between these two proteins. Compared to the full-length UPF1 and SMG6 proteins, these constructs also preclude our findings of any interference from the phosphorylation-dependent interaction occurring between the C-terminus of UPF1 and the 14-3-3 domain of SMG6. »

      8: "The NMD mechanism not only targets mRNAs but also small nucleolar RNAs (snoRNAs) and long noncoding RNAs (lncRNAs) harboring bona fide stop codons but in a specific context such as short upstream open reading frame (uORF), long 3'-UTRs, low translational efficiency or exon-exon junction located downstream of a stop codon." "First, for mRNAs with long 3'-UTRs, the 3'-faux UTR model posits that a long 3 spatial distance between a stop codon and the mRNA poly(A) tail destabilizes NMD substrates by preventing the interaction between the eRF1-eRF3 translation termination complex bound to the A- site of a ribosome recognizing a stop codon and the poly(A)-binding protein (Pab1 or PABP in S. cerevisiae and human, respectively)." These are difficult to read.

      Thank you for this suggestion to improve the clarity of our manuscript. We have tried to make these sentences easier to read as follows:

      « The NMD mechanism also targets mRNAs, small nucleolar RNAs (snoRNAs) and long noncoding RNAs (lncRNAs) carrying normal stop codons located in a specific context (short upstream open reading frame or uORF, long 3'-UTRs, low translational efficiency or exon-exon junction located downstream of a stop codon (3-11)). »

      « The first model, the 3'-faux UTR model posits that for mRNAs with long 3'-UTRs, a long spatial distance between a stop codon and the mRNA poly(A) tail destabilizes NMD substrates. Indeed, it would prevent the physical interaction between the eRF1-eRF3 translation termination complex recognizing a stop codon in the A-site of the ribosome and the poly(A)-binding protein (Pab1 or PABP in S. cerevisiae and human, respectively) bound to the 3' poly(A) tail (12-14). »

      9: please add the Ramachandran plot values.

      Thank you for pointing out this omission. These values have been included in Table EV1.

      Significance

      NMD is one of the major topics in the field of gene translational regulation research. This study will be of interest to a broad audience. I am an expert in structural studies of translation. However, I have limited experience in the in vivo study of NMD substrates.

      We are very grateful for the reviewer's comments about the broad interest and the overall quality of our work.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this study, the authors solved the crystal structure of the UPF1 helicase domain in complex with Nmd4. Through the structure and biochemical studies, they uncovered a region responsible for Nmd4 binding to UPF1, also important for their function in NMD. In the end, the authors also extended their findings to the human SMG6, proposing a conserved mechanism for Nmd4 and SMG6.

      The mechanism of UPF1 functioning during NMD is a long-existing question. For decades, people have been trying to find out the roles of all the NMD factors during this process. This study visualized the first direct connection between UPF1 and the putative SMG6 homolog, Nmd4. Undoubtedly, it will aid our understanding of how the whole process works.

      One of the limitations of this study is the conservation between Nmd4 and SMG6. Although they both have a PIN domain, Nmd4 is inactive while SMG6 is active. During NMD, SMG6 is thought to work to cut the mRNA, thus promoting the degradation of the non-functional mRNA. Therefore, Nmd4 and SMG6 may only share a similar binding mode with UPF1, however, they do not share similar functions. This study might only apply to yeast study.

      Comments: the study is written in a very clear way, and most of the experiments are clear and sound. I do not have any major comments. I only have a few minor comments, listed below:

      1:The authors also solved the PIN domain of the SMG6. This is a result worth showing in the main figure.

      2:It would be easier to read if the authors could add all the binding constants directly into the ITC panels.

      3:I am confused with His6-ZZ. Is ZZ a protein tag?

      4:The comparison between Nob1 and the PIN domain of Nmd4 is not convincing for me. Since the PIN domain is not required for the binding between Nmd4 and UPF1, the conformation of the PIN domain could be a result of the crystal packing. Thus, it is still possible that Nmd4 and UPF1 bind to the same RNA. To this end, I challenge the conclusion the authors have made on the mRNA binding part.

      5: "Showing that Nmd4 stabilizes Upf1-HD on RNA in the absence of ATP and that Upf1 is the main RNA binding factor in the Nmd4/Upf1-HD complex." As mentioned above, I don't think one can make the conclusion UPF1 is the main RNA binding factor; there shouldn't be a main and minor. Meanwhile, what will happen if you add ATP in? Or AMPPNP? Or ADP?

      6: "But also that a physical interaction between Upf1-HD and the PIN domain exists in vitro, although we were unable to detect it using our various interaction assays." This also confused me, since one cannot detect the interaction in any assay, how could you be so confident there is a physical interaction? Have you tested assays which are good for weak binding?

      7: Figure 4B should be done in the context of the full length of SMG6 and UPF1.

      8: "The NMD mechanism not only targets mRNAs but also small nucleolar RNAs (snoRNAs) and long noncoding RNAs (lncRNAs) harboring bona fide stop codons but in a specific context such as short upstream open reading frame (uORF), long 3'-UTRs, low translational efficiency or exon-exon junction located downstream of a stop codon." "First, for mRNAs with long 3'-UTRs, the 3'-faux UTR model posits that a long 3 spatial distance between a stop codon and the mRNA poly(A) tail destabilizes NMD substrates by preventing the interaction between the eRF1-eRF3 translation termination complex bound to the A- site of a ribosome recognizing a stop codon and the poly(A)-binding protein (Pab1 or PABP in S. cerevisiae and human, respectively)." These are difficult to read.

      9: please add the Ramachandran plot values.

      Significance

      NMD is one of the major topics in the field of gene translational regulation research. This study will be of interest to a broad audience. I am an expert in structural studies of translation. However, I have limited experience in the in vivo study of NMD substrates.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. How to precisely control the location and conformation of proteins is critical for maintaining autophagic flux. In particular, how Syn17 is recruited to autophagosomes remains unclear. In this paper, the authors describe a simple lipid-protein interaction model that goes beyond previous studies focusing on protein-protein interactions. This represents a conceptual advance.

      We would like to thank Reviewer #1 for the positive evaluation of our study.

      Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an interesting unresolved question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate for driving STX17 recruitment to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for the recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes have negative charges and the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) By molecular dynamics simulation, STX17 TM is shown to be inserted into a membrane containing PI4P but not into a membrane without it. These results indicate that PI4P is a strong candidate that STX17 binds to in the autophagosome.

      We would like to thank Reviewer #2 for pointing out these strengths.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      As we explained in the initial submission, we tried to deplete PI4P in autophagosomes by multiple methods but did not succeed. In this revised manuscript, we added the result of an experiment using the PI 4-kinase inhibitor NC03 (Figure 4―figure supplement 1), which shows no significant effect on the autophagosomal PI4P level and STX17 recruitment.

      Author response image 1.

      The PI 4-kinase inhibitor NC03 failed to suppress autophagosomal PI4P accumulation and STX17 recruitment. HEK293T cells stably expressing mRuby3–STX17TM (A) or mRuby3–CERT(PHD) (B) and Halotag-LC3 were cultured in starvation medium for 1 h and then treated with and without 10 μM NC03 for 10 min. Representative confocal images are shown. STX17TM- or CERT(PHD)-positive rates of LC3 structures per cell (n > 30 cells) are shown in the graphs. Solid horizontal lines indicate medians, boxes indicate the interquartile ranges (25th to 75th percentiles), and whiskers indicate the 5th to 95th percentiles. Differences were statistically analyzed by Welch’s t-test. Scale bars, 10 μm (main), 1 μm (inset).
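      The summary statistics named in this legend (median, interquartile range, 5th to 95th percentiles, and Welch's t-test) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis code; the function names `box_summary` and `welch_t` are hypothetical, and the choice of the "inclusive" percentile method is an assumption, since the legend does not state how percentiles were interpolated.

```python
import math
import statistics


def box_summary(values):
    """Median, interquartile range, and 5th/95th percentiles of one group.

    Percentile interpolation ("inclusive") is an assumption; the legend
    does not specify which method was used.
    """
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    q = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "median": statistics.median(values),
        "q25": q[4],   # 25th percentile
        "q75": q[14],  # 75th percentile
        "p05": q[0],   # lower whisker
        "p95": q[18],  # upper whisker
    }


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share
    a common variance.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

      Here each input list would hold one value per cell (for example, the STX17TM-positive rate of LC3 structures), with one list for treated and one for untreated cells. Converting the t statistic to a p-value additionally requires the t-distribution, which is omitted here.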

      • The molecular simulation study did not show whether PI4P is necessary for the STX17 TM insertion or whether other negatively charged lipids can play a similar role.

      As the reviewer suggested, we performed the molecular dynamics simulation using membranes with phosphatidylinositol (PI), a negatively charged lipid. STX17 TM approached the PI-containing membrane but was not inserted into the membrane within a time scale of 100 ns in simulations of all five structures. These data suggest that PI4P, which is more negatively charged than PI, is required for STX17 insertion. Thus, we have included these data in Figure 5E and F and added the following text to Lines 242–244. “Moreover, if the membrane contained phosphatidylinositol (PI) instead of PI4P, STX17 approached the PI-containing membrane but was not inserted into the membrane (Figure 5E, F, Video 3).”

      Author response image 2.

      (E) An example of a time series of simulated results of STX17TM insertion into a membrane consisting of 70% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE), and 10% phosphatidylinositol (PI). STX17TM is shown in blue. Phosphorus in PC, PE and PI are indicated by yellow, cyan, and orange, respectively. Short-tailed lipids are represented as green sticks. The time evolution series are shown in Video 3. (F) Time evolution of the z-coordinate of the center of mass (z_cm) of the transmembrane helices of STX17TM in the case of membranes with PI. Five independent simulation results are represented by solid lines of different colors. The gray dashed lines indicate the locations of the lipid heads. A scale bar indicates 5 nm.
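      The quantity plotted in panel (F), the z-coordinate of the center of mass of the transmembrane helices, is a mass-weighted average of atomic z-coordinates. The sketch below shows that calculation in plain Python. It is illustrative only and not the authors' simulation code: the function names are hypothetical, and the rule that a helix counts as "inserted" when z_cm lies between the two planes of lipid heads is an assumption made for this example.

```python
def z_center_of_mass(z_coords, masses):
    """Mass-weighted mean of z-coordinates (e.g., in nm) for one frame."""
    total_mass = sum(masses)
    return sum(z * m for z, m in zip(z_coords, masses)) / total_mass


def is_inserted(z_cm, z_lower_heads, z_upper_heads):
    """Assumed criterion: z_cm lies between the two lipid-head planes."""
    return z_lower_heads < z_cm < z_upper_heads


def insertion_trace(frames, masses, z_lower_heads, z_upper_heads):
    """For each frame (a list of atomic z-coordinates), return
    (z_cm, inserted) so the time evolution can be plotted or inspected."""
    trace = []
    for z_coords in frames:
        z_cm = z_center_of_mass(z_coords, masses)
        trace.append((z_cm, is_inserted(z_cm, z_lower_heads, z_upper_heads)))
    return trace
```

      Applied to each of the five independent runs, a trace of this kind would stay outside the head-group planes for a PI-containing membrane if, as reported, the helices approach the membrane without inserting.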

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment that showed their distribution to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how the PI4P synthesis is regulated so that it reaches a sufficient level only after the membrane closure. In the second case, the mechanism that produces PI4P only after the autophagosome closure needs to be elucidated (so, in this case, the question of the temporal regulation issue remains the same).

      We thank the reviewers for pointing this out. While the probe for weakly negative charges (1K8Q) labeled both immature and mature autophagosomes, the probes for intermediate charges (5K4Q and 3K6Q) and PI4P labeled only mature autophagosomes (Figure 2F, Figure 2–figure supplement 1B). Thus, we think that the autophagosomal membrane rapidly and drastically becomes negatively charged, and at the same time, PI4P is enriched. Although immature membranes may have weak negative charges, we did not examine which lipids contribute to the negative charges. Thus, we have added the following sentences to the Discussion part.

      “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283) “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing is complete. The authors discover that the C-terminal region of Syntaxin 17 is essential for its sensing mechanism that involves two transmembrane domains and a positively charged region. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction of Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment to ensure that unsealed autophagosomes do not fuse with lysosomes is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The authors' discovery also provides an opportunity for future research in which Syntaxin 17's C-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

      We thank Reviewer #3 for pointing this out. Please see our above response to Reviewer #2 (Public Review).

      In addition, clear statements within the figure legends on the number of independent experimental repeats that were conducted for experiments that were quantitated are not currently present in the manuscript.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      Reviewer #1 (Recommendations For The Authors):

      This paper is well written and all experiments were conducted with a high standard. Several minor issues should be addressed before final publication.

      (1) To further confirm the charge interaction, a charge screening experiment should be performed for Fig. 2A.

      We asked Reviewer #1 through the editor what this experiment meant and understood that it was to see the effects of high salt concentrations. We monitored the association of GFP-STX17TM with liposomes in the presence or absence of 1 M NaCl and found that it was blocked in a high-ionic-strength buffer. These data support the electrostatic interaction of STX17 with membranes. We have included these data in Figure 2B and added the following sentences to Lines 124–126.

      “The association of STX17TM with PI4P-containing membranes was abolished in the presence of 1 M NaCl (Figure 2B). These data suggest that STX17 can be recruited to negatively charged membranes via electrostatic interaction independent of the specific lipid species.”

      Author response image 3.

      GFP–STX17TM translated in vitro was incubated with rhodamine-labeled liposomes containing 70% PC, 20% PE and 10% PI4P in the presence of 1 M NaCl or 1.2 M sucrose. GFP intensities of liposomes were quantified and shown as in Figure 1C (n > 30).

      (2) The authors claim that "Autophagosomes become negatively charged during maturation", based on experiments using membrane charge probes. Since it's mainly about the membrane, it's better to refine the claim to "The membrane of autophagosomes becomes...", which would be more precise and closer to the topic of this paper.

      We would like to thank the reviewer for pointing this out. This point is valid. As recommended, we have corrected the phrase “Autophagosomes become negatively charged during maturation” to “The membrane of autophagosomes becomes negatively charged during maturation” (Lines 72, 118, 262, 969 (title of Figure 2), 1068 (title of Figure 2–figure supplement 1)).

      (3) The authors should add more discussion regarding the "specificity" for recruiting Syn17 through the charge interaction. Particularly, how Syn17 could be maintained before the closure of autophagosomes? For the MD simulations in Fig. 5, the current results don't add much to the manuscript. The cell biology experiments have demonstrated the conclusion. The authors could try to find more details about the insertion by analyzing the simulation movies. Do membrane packing defects play a role during the insertion process? A similar analysis was conducted for alpha-synuclein (https://pubmed.ncbi.nlm.nih.gov/33437978/).

      Regarding the mechanism of STX17 maintenance in the cytosol, we do not think that other molecules, such as chaperones, are essential because the purified recombinant mGFP-STX17TM used in this study is soluble. However, this does not rule out such a mechanism, which would be a subject for future study.

      In the paper by Liu et al. (PMID: 33437978), small liposomes with diameters of 25–50 nm are used. Therefore, there are packing defects in the highly curved membranes, into which alpha-synuclein helices are inserted in a curvature-dependent manner. On the other hand, autophagosomes are much larger (~1 μm in diameter) and almost flat for STX17 molecules, so we think it is unlikely that STX17 recognizes packing defects.

      Reviewer #2 (Recommendations For The Authors):

      • The two (and other) possibilities with regards to the interpretation of the negative charge/PI4P result in autophagic membranes are hoped to be discussed.

      As mentioned above, we have added the following sentences to the Discussion section. “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283)

      “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      • Fluorescence biosensors are convenient to give an overview of the intracellular distribution of various lipids, but some of them show false-negative results. For example, evectin-2-PH for PS binds to endosomes but not to the plasma membrane, even though the latter contains abundant PS. With regards to PI4P, some biosensors illuminate both the Golgi and autophagosome, while others do not appear to bind the Golgi. Moreover, fluorescence biosensors for PI(3,5)P2 and PI(3,4)P2, which are also candidates for the STX17 insertion issue, are less reliable than others (e.g., those for PI3P and PI(4,5)P2). These problems need to be considered.

      We agree with Reviewer #2 that fluorescence biosensors are not perfect for detecting specific lipids. Based on the Reviewer’s suggestion, we have included a comment on this in the Discussion section as follows (Lines 265–268).

      “Given the possibility that fluorescence lipid probes may give false-negative results, a more comprehensive biochemical analysis, such as lipidomics analysis of mature autophagosomes, would be imperative to elucidate the potential involvement of other negatively charged lipids.”

      • A negative control for the PI4P biosensor, i.e., a mutant lacking the PI4P binding ability, is better to be tested to confirm the presence of PI4P in autophagosomes.

      We would like to thank the Reviewer for this comment. We conducted the suggested experiment and confirmed that the CERT(PHD)(W33A) mutant, which is deficient for PI4P binding (Sugiki et al., JBC. 2012), was diffusely present in the cytosol and did not localize to STX17-positive autophagosomes. This data supports our conclusion that PI4P is indeed present in autophagosomes. We have included this data in Figure 3–figure supplement 2A and explained it in the text (Lines 164–166).

      Author response image 4.

      Mouse embryonic fibroblasts (MEFs) stably expressing GFP–CERT(PHD)(W33A) and mRuby3–STX17TM were cultured in starvation medium for 1 h. Bars indicate 10 μm (main images) and 1 μm (insets).

      • As a control to the molecular dynamic simulation study, STX17 TM insertion into a membrane containing other negative charge lipids, especially PI, needs to be tested. PI is a negative charge lipid that is likely to exist in autophagic membranes (as suggested by the authors' past study).

      We thank the reviewers for this suggestion. As mentioned above (Reviewer #2, Public Review), we performed the molecular dynamics simulation using membranes containing PI and added the results in Figure 5E and F and Video 3.

      • If the putative role of PI4P could be shown in the cellular context, the authors' conclusion would be much strengthened. I wonder if overexpression of PI4P fluorescence biosensors, especially those that appear to bind to the autophagosome almost exclusively, may suppress the recruitment of STX17 there.

      We would like to thank the Reviewer for asking this question. In MEFs stably overexpressing PI4P probes driven by the CMV promoter, STX17 recruitment was not affected. Thus, simple overexpression of PI4P probes does not appear to be effective in masking PI4P in autophagosomes.

      Another idea is to use an appropriate molecule (e.g., WIPI2, ATG5) and to recruit Sac1 to autophagic membranes by using the FRB-FKBP system or the like. I hope these and other possibilities will be tested to confirm the importance of PI4P in the temporal regulation of STX17 recruitment.

      We tried the FRB-FKBP system using the phosphatase domain of yeast Sac1 fused to FKBP and LC3 fused to FRB, but unfortunately, this system failed to deplete PI4P from the autophagosomal membrane.

      Reviewer #3 (Recommendations For The Authors):

      A few areas for suggested improvement are:

      (1) It would be helpful if the authors could clarify for all figures how many independent experiments were conducted for all experiments, particularly those that have quantitation and statistical analyses.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      The authors made several attempts to modulate PI4P levels on autophagosomes although understandably this proved to be challenging. A couple of suggestions are provided to address this area:

      (2) Given the reported role of GABARAPs in PI4K2a recruitment and PI4P production on autophagosomes, as well as autophagosome-lysosome fusion (Nguyen et al (2016) J Cell Biol) it would be worthwhile to assess whether GABARAP TKO cells have reduced PI4P and reduced Stx17 recruitment

      According to the Reviewer’s suggestion, we examined the localization of STX17 TM and the PI4P probe CERT(PHD) in ATG8 family (LC3/GABARAP) hexa KO HeLa cells that were established by the Lazarou lab (Nguyen et al., JCB 2016). As in WT cells, STX17 TM and CERT(PHD) were still colocalized with each other in hexa KO cells, suggesting that neither STX17 recruitment nor PI4P enrichment depends on ATG8 family proteins (note: the size of autophagosomes in HeLa cells is smaller than in MEFs, making it difficult to observe autophagosomes as ring-shaped structures). We have included this result in Figure 3–figure supplement 2(F) and explained it in the text (Lines 194–196, 198).

      Author response image 5.

      (F) WT and ATG8 hexa KO HeLa cells stably expressing GFP–STX17TM and transiently expressing mRuby3–CERT(PHD) were cultured in starvation medium. Bars indicate 10 μm (main images) and 1 μm (insets).

      (3) Can the authors try fusing Sac1 to one of the PI4P probes (CERT(PHD)) that were used, or alternatively to the C-terminus of Syntaxin 17? This approach would help to recruit Sac1 only to mature autophagosomes and could therefore prevent the autophagosome formation defect observed when fused to LC3B that targeted Sac1 to autophagosomes as they were forming. Understandably, this approach might seem a bit counterintuitive since the phosphatase is removing PI4P which is what is recruiting it but it could be a viable approach to keep PI4P levels low enough on mature autophagosomes so that Syntaxin 17 is no longer recruited. A Sac1 phosphatase mutant might be needed as a control.

      We would like to thank the Reviewer for these suggestions. We tried the phosphatase domain of yeast Sac1 or human SAC1 fused with STX17TM, but unfortunately, these fusion proteins did not deplete PI4P from autophagosomes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the roles of the axon growth regulator Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. The evidence supporting the claims of the authors is solid, although further work directly testing the roles of different sema7a isoforms would strengthen the analysis. The work will be of interest to developmental neuroscientists studying circuit formation.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They attempt to also outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, an autocrine role for Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes.

      Finally, critical controls are absent from the overexpression paradigm.

      Comments: Thank you for your valuable comments. We have analyzed the hair cell scRNA transcriptome data of zebrafish neuromasts from published works and have not identified expression of the known receptors of the Sema7A protein, particularly PlexinC1 and Integrin β1 molecules (reference 4 and 15), in hair cells. This result suggests that hair cells lack the cognate receptors through which Sema7A, either secreted or membrane-bound, could elicit an autocrine function. Moreover, the GPI-anchored Sema7A lacks a cytosolic domain, so it is unlikely that Sema7A signaling directly induces the formation of presynaptic ribbons. We propose that the decrease in average number and area of synaptic aggregates likely reflects decreased stability of the synaptic structures owing to lack of contact between the sensory axons and the hair cells, which has been identified in zebrafish neuromasts (reference 38).

      Thank you for pointing out the missing critical control experiments. Additional control experiments (lines 333-346) with a new figure (Figure 5) have been added.

      These issues weaken the claims made by the authors including the statement that they have identified differential roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Comments: We have rephrased our statement and argue in lines 428-430 that our experiments “suggest a potential mechanism for hair cell innervation in which a local Sema7Asec diffusive cue likely consolidates the sensory arbors at the hair cell cluster and the membrane-anchored Sema7A-GPI molecule guides microcircuit topology and synapse assembly.”

      The manuscript itself would benefit from the inclusion of details in the text to help the reader interpret the figures, tools, data, and analysis.

      Comments: We have made significant revisions to the text and figures to improve clarity and consistency of the manuscript.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of the peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it makes it possible to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below).

      Comments: Thank you for making this point. To investigate the presence of both sema7a transcripts in the hair cells of the lateral-line neuromasts, we used the Tg(myo6b:actb1EGFP) transgenic fish to capture the labeled hair cells by fluorescence-activated cell sorting (FACS) and isolated total RNA. Using transcript-specific DNA oligonucleotide primers, we have identified the presence of both sema7a transcript variants in the hair cells of the neuromast. Even though we have not developed transcript-specific knockout animals, we speculate that the presence of both transcript variants in the hair cells implies that they function in distinct fashions. We have changed our interpretation in lines 32-34 to “Our findings propose that Sema7A likely functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development.”

      In future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us determine the role of the membrane-bound Sema7AGPI molecule as well as the Sema7Asec in sensory arborization and synaptic assembly.

      In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Comments: Thank you for this insightful comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory-axon collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axons associate with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capturing collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Reviewer #3 (Public Review):

      Summary:

      This study demonstrates that the axon guidance molecule Sema7a patterns the innervation of hair cells in the neuromasts of the zebrafish lateral line, as revealed by quantifying gain- and loss-of-function effects on the three-dimensional topology of sensory axon arbors over developmental time. Alternative splicing can produce either a diffusible or membrane-bound form of Sema7a, which is increasingly localized to the basolateral pole of hair cells as they develop (Figure 1). In sema7a mutant zebrafish, sensory axon arbors still grow to the neuromast, but they do not form the same arborization patterns as in controls, with many arbors overextending, curving less, and forming fewer loops even as they lengthen (Figure 2,3). These phenotypes only become significant later in development, indicating that Sema7a functions to pattern local microcircuitry, not the gross wiring pattern. Further, upon ectopic expression of the diffusible form of Sema7a, sensory axons grow towards the Sema7a source (Figure 4). The data also show changes in the synapses that form when mutant terminals contact hair cells, evidenced by significantly smaller pre- and post-synaptic puncta (Figure 5). Finally, by replotting single cell RNA-sequencing data (Figure 6), the authors show that several other potential cues are also produced by hair cells and might explain why the sema7a phenotype does not reflect a change in growth towards the neuromast. In summary, the data strongly indicate that Sema7a plays a role in shaping connectivity within the neuromast.

      Strengths:

      The main strength of this study is the sophisticated analysis that was used to demonstrate fine-level effects on connectivity. Rather than asking "did the axon reach its target?", the authors asked "how does the axon behave within the target?". This type of deep analysis is much more powerful than what is typical for the field and should be done more often. The breadth of analysis is also impressive, in that axon arborization patterns and synaptic connectivity were examined at 3 stages of development and in three-dimensions.

      Weaknesses:

      The main weakness is that the data do not cleanly distinguish between activities for the secreted and membrane-bound forms of Sema7a, which the authors speculate may influence axon growth and synapse formation respectively. The authors do not overstate the claims, but it would have been nice to see some additional experimentation along these lines, such as the effects of overexpressing the membrane-bound form,

      Comments: We have accepted this useful suggestion. In lines 333-346 and in Figure 5 we have demonstrated the impact of overexpressing the membrane-bound transcript variant on arborization pattern of the sensory axons.

      Some analysis of the distance over which the "diffusible" form of Sema7a might act (many secreted ligands are not in fact all that diffusible), or

      Comments: We have reported this in lines 311-317 and in Figure 4F,G.

      Some live-imaging of axons before they reach the target (predicted to be the same in control and mutants) and then within the target (predicted to be different).

      Comments: We have accepted this useful suggestion. We demonstrate the dynamics of the sensory arbors that are attracted to an ectopic Sema7Asec source in lines 325-332, Figure 4I,J; Figure 4—figure supplement 2A, and Videos 13-16.

      Clearly, although the gain-of-function studies show that Sema7a can act at a distance, other cues are sufficient. Although the lack of a phenotype could be due to compensation, it is also possible that Sema7a does not actually act in a diffusible manner within its natural context. Overall, the data support the authors' carefully worded conclusions. While certain ideas are put forward as possibilities, the authors recognize that more work is needed. The main shortcoming is that the study does not actually distinguish between the effects of the two forms of Sema7a, which are predicted but not actually shown to be either diffusible or membrane linked (the membrane linkage can be cleaved). Although the study starts by presenting the splice forms, there is no description of when and where each splice form is transcribed.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript-specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the absence of a signal.

      We therefore utilized fluorescence-activated cell sorting (FACS) to capture the labeled hair cells and isolated total RNA to perform RT-PCR using transcript-specific DNA oligonucleotide primers. We identified the presence of both the secreted and the membrane-bound transcripts in four-day-old neuromasts (lines 80-84, Figure 1B-D).

      Additionally, since the mutants are predicted to disrupt both forms, it is a bit difficult to disentangle the synaptic phenotype from the earlier changes in circuit topology - perhaps the change at the level of the synapse is secondary to the change in topology.

      Comments: Thank you for the insightful suggestion. We have analyzed the relationship between the sensory arbor network topology and the distribution of postsynaptic structures (lines 384-403, Figure 6G-I). We identified that the distribution of the postsynaptic aggregates is closely associated with the topological attributes of the sensory circuit. We further clarify the potential origin of disrupted synaptic assemblies in sema7a-/- mutants in lines 380-382 and lines 417-420.

      Further, the authors do not provide any data supporting the idea that the membrane bound form of Sema7a acts only locally. Without these kinds of data, the authors are unable to attribute activities to either form.

      Comments: We have accepted this useful suggestion and have prepared Figure 5 with the necessary details.

      The main impact on the field will be the nature of the analysis. The field of axon guidance benefits from this kind of robust quantification of growing axon trajectories, versus their ability to actually reach a target. This study highlights the value of more careful analysis and as a result, makes the point that circuit assembly is not just a matter of painting out paths using chemoattractants and repellants, but is also about how axons respond to local cues. The study also points to the likely importance of alternative splice forms and to the complex functions that can be achieved using different forms of the same ligand.

      Reviewer #4 (Public Review):

      Summary:

      The work by Dasgupta et al. identifies Sema7a as a novel guidance molecule in hair cell sensory systems. The authors use both the genetic and imaging power of the zebrafish lateral-line system for their research. Based on expression data and immunohistochemistry experiments, the authors demonstrate that Sema7a is present in lateral line hair cells. The authors then examine a sema7a mutant. In this mutant, Sema7a protein levels are nearly eliminated. Importantly, the authors show that when Sema7a is absent, afferent terminals show aberrant projections and fewer contacts with hair cells. Lastly the authors show that ectopic expression of the secreted form of Sema7a is sufficient to recruit aberrant terminals to non-hair cell targets. The sema7a innervation defects are well quantified. Overall, the paper is extremely well written and easy to follow.

      Strengths:

      (1) The axon guidance phenotypes in sema7a mutants are novel, striking and thoroughly quantified.

      (2) By combining both loss of function sema7a mutants and ectopic expression of the secreted form of Sema7a the authors demonstrate that Sema7a is both necessary and sufficient to guide sensory axons.

      Weaknesses:

      (1) Control. There should be an uninjected heatshock control to ensure that heatshock itself does not cause sensory afferents to form aberrant arbors. This control would help support the hypothesis that exogenously expressed Sema7a (via a heatshock driven promoter) is sufficient to attract afferent arbors.

      Comments: Thank you for the suggestion. We have added the uninjected heatshock control experiment in Figure 5 and described experimental details in the text, lines 343-345.

      (2) Synapse labeling. The numbers obtained for postsynaptic labeling in controls do not match up with the published literature - they are quite low. Although there are clear differences in postsynaptic counts between sema7a mutants and controls, it is worrying that the numbers are so low in controls. In addition, the authors do not stain for complete synapses (pre- and post-synapses together). This staining is critical to understand how Sema7a impacts synapse formation.

      Comments: Thank you for raising this issue. We believe the low average numbers of the postsynaptic punctae in control neuromasts arise from the lack of formation of postsynaptic aggregates beneath the immature hair cells, which are abundant in early stages of neuromast maturation. We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We do not think analyzing the complete synapse structure will add much to our understanding of how Sema7A influences synapse formation and maintenance.

      (3) Hair cell counts. The authors need to provide quantification of hair cell counts per neuromast in mutant and control animals. If the counts are different, certain quantification may need to be normalized.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).
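
      The per-neuromast normalization described above can be illustrated with a minimal sketch. It assumes the normalization is a simple division of each topological count by that neuromast's hair cell number; the field names and the example values are hypothetical and are not taken from the manuscript or its raw data.

```python
from dataclasses import dataclass


@dataclass
class Neuromast:
    """One imaged neuromast. All field names are illustrative assumptions."""
    genotype: str
    hair_cells: int
    loops: int
    bare_terminals: int


def per_hair_cell(value: float, hair_cells: int) -> float:
    """Divide a topological count by the hair cell number of the same neuromast."""
    if hair_cells <= 0:
        raise ValueError("hair cell count must be positive to normalize")
    return value / hair_cells


def normalize(nm: Neuromast) -> dict:
    """Return the hair-cell-normalized topological attributes of one neuromast."""
    return {
        "genotype": nm.genotype,
        "loops_per_hair_cell": per_hair_cell(nm.loops, nm.hair_cells),
        "bare_terminals_per_hair_cell": per_hair_cell(nm.bare_terminals, nm.hair_cells),
    }


# Hypothetical values chosen only to show the arithmetic, not measured data.
example = [
    Neuromast("control", hair_cells=10, loops=5, bare_terminals=2),
    Neuromast("sema7a-/-", hair_cells=8, loops=2, bare_terminals=6),
]
normalized = [normalize(nm) for nm in example]
```

      Normalizing each neuromast by its own hair cell count, rather than by a genotype average, keeps a small difference in hair cell number from being read as a difference in arbor topology.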

      (4) Developmental delay. It is possible that loss of Sema7a simply delays development. The latest stage examined was 4 dpf, an age that is not quite mature in control animals. The authors could look at a later age, such as 6 dpf to see if the phenotypes persist or recover.

      Comments: The homozygous sema7a-/- mutants are unviable and die at 6 dpf. We therefore restricted our analysis to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons extend long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). These observations suggest that if the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of disrupted arborization patterns over time. But instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but a specific outcome of the loss of Sema7A protein.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      Issue 1: One of the most interesting conclusions in this manuscript is the function of the GPI-anchored vs. secreted form of Sema7a in axon structure and synapse formation. In lines 357-360 of the discussion (for example) the authors state that they have shown that the GPI-anchored form of Sema7a is responsible for contact-mediated synapse formation while the secreted form functions as a chemoattractant for axon arbor structure. "We have discovered dual modes of Sema7A function in vivo: the chemoattractive diffusible form is sufficient to guide the sensory arbors toward their target, whereas the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses." However, the data do not support this conclusion. Specifically, no analysis is done showing unique expression of either isoform in hair cells and no functional analysis is done to conclusively determine which isoform is important for either phenotype.

      Comments: We have shown that both sema7a transcripts are expressed in the hair cells of four-day-old neuromasts (lines 78-84, Figure 1C,D). Ectopic expression of the sema7asec transcript variant robustly attracts the lateral-line sensory arbors toward itself, whereas ectopic expression of the sema7aGPI variant fails to impart sensory guidance from a distance, suggesting that the membrane-bound form likely participates in contact-mediated neural guidance. These experiments decisively show, for the first time in zebrafish, the dual modes of Sema7A function in vivo. However, we agree that the sema7aGPI transcript-specific knockout animal would be essential to conclusively prove that the membrane-attached form is primarily involved in forming accurate neural circuitry and contact-mediated formation and maintenance of synapses. Hence, we have very carefully stated in lines 427-428 that “the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses”. We will follow up on this suggestion in our upcoming manuscript that will incorporate transcript-specific genetic ablations.

      Though the authors present RT-PCR analysis of sema7a isoforms, it is not interpretable. The second reverse primer will also recognize the full-length transcript (from what I can gather) so it does not simply show the presence of the secreted form. Is there a unique 3'UTR for the short transcript that can be used? Additionally, for the GPI-anchored version can you use a forward primer that is not present in the short isoform? This would shed some light on the respective levels of both transcripts.

      Comments: The C-termini of the two transcript variants are distinct and we have designed distinct primers that will selectively bind to each transcript (lines 503-511). Since we have not performed quantitative polymerase chain reaction (qPCR), relative levels of each transcript are hard to determine.

      Alternatively, and perhaps of more use, in situ hybridization using unique probes for each isoform would allow you to determine which are actually present in hair cells.

      Comments: We have tried this approach and explained the point earlier (refer to lines 203-212 of this response letter).

      To decisively state that these isoforms have unique functions in axon terminal structure and synapse formation, other experiments are also essential. For example, RNA-mediated rescue analyses using both isoforms would tell you which can rescue the axonal structure and synapse size/number phenotypes. Overexpression of the GPI-anchored form, like the secreted form in Figure 4, would allow you to determine if only the secreted form can cause abnormal axon extension phenotypes. Expression of both forms in hair cells (using a myo6b promoter for example) would allow assessment of their role in presynapse formation.

      Comments: We have ectopically expressed the sema7aGPI transcript variant near the sensory arbor network and observed that Sema7A-GPI fails to impart sensory axon guidance from a distance.

      Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      For the overexpression experiments, expression of mKate alone (with and without heat shock) is also a critical control to include.

      Comments: We have incorporated two control experiments: (1) larvae injected with hsp70:sema7asec-mKate2 plasmid that were not heat-shocked and (2) uninjected larvae that were heat-shocked. We think these two controls are sufficient to demonstrate that the abnormal arborization patterns are not artifacts generated by plasmid injection or heat shock.

      Issue 2: A second concern is the lack of data showing support cell and hair cell formation and function is unaffected. Analysis of support and hair cell number with loss of Sema7a as well as simple analyses of mechanotransduction (FM4-64) would help alleviate concerns that phenotypes are due to disrupted neuromast formation and basic hair cell function rather than a specific role for Sema7a in this process.

      Comments: We have measured the hair cell numbers in both control and sema7a-/- mutants across developmental stages. We have added this to our submitted raw data.

      We have utilized the styryl fluorophore FM4-64 to test the mechanotransduction function of the hair cells in sema7a-/- mutants. We have detailed our finding in lines 137-141 and in Figure 2—figure supplement 1C,D.

      Expression analysis of Sema7a receptors would also help strengthen the argument for a specific effect on lateral line afferent axons.

      Comments: Thank you for this suggestion. Currently, we do not possess an RNA transcriptome dataset for the lateral line ganglion. This deficit limits a systematic screen for lateral-line sensory neuronal gene expression either through antibody stains or via HCR-mediated in situ techniques. In future we plan to develop an RNA transcriptome for the lateral-line ganglion and identify potential binding partners for Sema7A.

      Issue 3: The manuscript could also be improved to include more detail in some areas and less in others. In general, each section has a fairly long lead up but lacks important experimental details that would help the reader interpret the data. For example:

      Figure 1: What is the label for the lateral line axons? Is it a specific transgenic? The legend states that 3 asterisks indicate p<0.0001. What about the other asterisk combinations?

      Comments: We have clarified these issues in lines 118-121 and in lines 906-907.

      Figure 2: For the network analysis, are the traces for all axons that branch to innervate the neuromast?

      Comments: Yes, we have traced the entire arbor containing all the axons that branched from the lateral line nerve and extended toward the clustered hair cells. The three-dimensional traces depict a skeletonized representation of the arbor network.

      Can the tracing method distinguish individual axons?

      Comments: No, our goal is to understand how the axon-collective behaves around the clustered hair cells during development.

      How do you know where an end is versus continued looping?

      Comments: We have categorically defined the topological attributes in lines 187-191 and in Figure 3A.
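
      As a generic illustration of how ends and loops can be told apart in a skeletonized network, the sketch below treats the trace as an undirected graph, counts free ends as nodes with exactly one neighbor, and counts independent loops as the cycle rank E − V + C. This is a standard graph-theoretic convention offered as an assumption; it is not the definition given in the manuscript (lines 187-191, Figure 3A), and the function name and example graph are hypothetical.

```python
from collections import defaultdict


def arbor_topology(edges):
    """Count independent loops and free ends in an undirected skeleton graph.

    `edges` is an iterable of (node_a, node_b) pairs. The graph is treated as
    simple: duplicate edges are merged and self-loops are ignored, so two
    distinct branches joining the same pair of nodes would need an
    intermediate node to be counted as a loop.
    """
    adjacency = defaultdict(set)
    edge_set = set()
    for a, b in edges:
        if a == b:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
        edge_set.add(frozenset((a, b)))

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)

    # Cycle rank: number of independent closed loops in the graph.
    loops = len(edge_set) - len(adjacency) + components
    # Free ends: nodes attached to the network by a single edge.
    bare_terminals = sum(1 for n in adjacency if len(adjacency[n]) == 1)
    return {"loops": loops, "bare_terminals": bare_terminals}


# Hypothetical skeleton: one closed triangle (a-b-c) with a free branch (c-d).
result = arbor_topology([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
```

      Under this convention a branch that curls back and rejoins the network raises the loop count, whereas a branch that stops short remains a free end regardless of how far it extends.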

      Also, are all neuromasts similarly affected or is there a divergence based on which organ you are imaging? What neuromast was imaged in this and other figures?

      Comments: Yes, all the neuromasts in the trunk and tail regions were affected similarly by the sema7a mutation. We did not observe any region-specific phenotypic outcome. We consistently imaged the trunk neuromasts, particularly the second, third, and fourth neuromasts.

      Discussion: The short discussion failed to put these findings into context or to discuss how this unique topological arrangement of axon terminals impacts function.

      Comments: We have added a new segment, lines 432-448, in the discussion section which mentions the potential role of the topological features in arranging the distribution pattern of the postsynaptic densities and thereby potentially influencing the network’s ability to gather sensory inputs through properly placed postsynaptic aggregates.

      Can you speculate on how the looping structure may alter number of synaptic contacts per axon for instance? For this, it would be useful to know if normally the synapses form on loops versus bare terminals.

      Comments: Thank you for this insightful suggestion. We have performed detailed analysis, as mentioned in lines 384-397, to characterize the distribution of the postsynaptic densities between the two topological attributes.

      Does this looping facilitate single axons contacting more hair cells of the same polarity? Would that be beneficial?

      Comments: Looping behaviors indeed facilitate the contact between the axons and the hair cells. As we have observed, the primary topological attribute that the sensory arbor network underneath the clustered hair cells adopts is a loop. The bare terminals are predominantly projected transverse to the clustered hair cells and lack contact with them. Whether a single axon, being part of a loop, preferentially contacts hair cells of the same polarity is yet to be determined. We can address this question by mosaic labeling a single axon in the arbor network and determining its association with the hair cells. We intend to do these experiments in our upcoming manuscript.

      Minor concerns:

      (1) For the stacked charts quantifying topological features, I found interpreting them challenging. Is it possible to put these into overlapping histograms or line graphs to better compare wild type to mutant directly?

      Comments: Thank you for your suggestion. We tried several ways to represent our data and found that the stacked charts optimally signify our analysis and depict the characteristic phenological differences between the control and the sema7a-/- mutants.

      (2) There are numerous strong statements throughout not directly supported by the data, e.g. lines 110-113; 206-208; 357-360 and others. These should be tempered.

      Comments: For lines 110-113, we have updated this section with new experiments and the new segment is represented in lines 115-126.

      For lines 206-208, we have updated the statement to “This result suggests that the stereotypical circuit topology observed in the mature organ may emerge through transition of individual arbors from forming bare terminals to forming closed loops encircling topological holes” in lines 225-227.

      Reviewer #2 (Recommendations For The Authors):

      The authors should be careful about making any assumptions about which form of sema7a is active in NMs. Their RT-PCR demonstrates presence of both isoforms in a whole animal; however, whether they are similarly present in HCs is not investigated here.

      Comments: We have addressed this concern and have updated the manuscript with new experiments, detailed in lines 78-84.

      Also, there is an issue of translation and trafficking to the membrane with subsequent secretion. An important experiment that would address this question is expressing two sema7a isoforms in mutant HCs and asking whether this can suppress the mutant phenotype.

      Comments: Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      Presumably, sema7a is trafficked to the membrane during HC maturation. This is consistent with the authors' observation that sema7a localization is changing as NM mature. However, actin-sema7a co-labeling does not actually show whether sema7a is on the membrane. Labeling HCs with a membrane marker (transgene) would be much more convincing. Alternatively, can the authors show sema7a localization actually correlates with the presence of sensory axon terminals? They already have immunos that label both. Thus, this should be pretty straightforward.

      Comments: Thank you for these suggestions. We have addressed these issues in lines 112-114, and in lines 119-126.

      Figure 2 should have a control panel, so the reduced sema7a staining can be compared to the control side-by-side.

      Comments: We have depicted Sema7A staining in control neuromasts in multiple images, including Figure 1E, Figure 1H, and in Figure 2—figure supplement 1B. We have kept the control panel in the supplementary figure due to space restrictions in Figure 2.

      Arborization topology: While I appreciate the very careful characterization of the topology for wild-type and mutant NMs, I think it would be much more informative to mark individual axons and then analyze their topology. The main reason is that the authors cannot really distinguish whether some aspects of topology they describe are really due to the densely packed overlapping terminals of multiple axons or these are really characteristic, higher order organization of individual axons. Because of this, they cannot be certain what is really happening with sema7a mutant terminals. Related to the point above: while it is clear that the overall topology is abnormal in the mutant, the authors should be careful in concluding that sema7a regulates specific aspects of it. The overall structure is probably highly interconnected; perturbing one parameter would likely affect all the others.

      Comments: Thank you for this comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference number 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory axon-collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axon-collective associates itself with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Experiments with the secreted sema7a isoform would be much more informative if they were compared/contrasted to the GPI anchored isoform.

      Comments: We added a new section, lines 338-351, and a new Figure 5 to address this issue.

      The phenotype of ectopic projections in sema7a overexpression experiments is pretty dramatic, especially given the fact that these were performed in wild-type animals. Does this mean that the phenotype would be even more dramatic in sema7a mutants, as they have more bare axon terminals according to the authors' analysis? Have the authors attempted this type of experiment?

      Comments: That is an interesting suggestion. We have not tested that yet. Our guess is that in the sema7a-/- mutants, the abundant bare terminals will be far more sensitive to an ectopic source of Sema7A. But even in the sema7a-/- mutants, other chemotropic cues are still functional, which may impart certain restrictions on how many bare terminals are allowed to leave the neuromast region.

      Reviewer #3 (Recommendations For The Authors):

      (1) No raw data are shown, such that it is difficult to assess variability across animals or within animals, just the overall trends within the whole dataset. Raw data need to be shown for every measurement, at least in supplemental figures. It would also be useful to reliably show control next to mutant in the same plot, as it is a bit hard to compare across panels, which occurs in several figures.

      Comments: We have uploaded all the raw data related to each experiment.

      (2) Given the focus on the two possible forms of Sema7a, the authors should use HCR or another form of reliable in situ hybridization to show the spatiotemporal pattern of expression of each isoform.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript-specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the lack of a signal.

      (3) The authors should explain the criteria used to select the 22 embryos used to analyze the effects of expressing diffusible Sema7a.

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      (4) Although arbors were imaged in live embryos, time is never presented as a variable, so I cannot tell whether axon topology was changing as the images were collected. This needs to be clarified.

      Comments: We imaged the trunk neuromasts of both control and sema7a-/- mutant live zebrafish larvae at 2, 3, and 4 dpf. We imaged the control and the sema7a-/- mutants of each developmental stage in parallel, within a span of two hours, and repeated these experiments multiple times to gather almost a hundred larvae from each genotype. Even though the sensory arbor network is dynamic, we believe that imaging both genotypes in parallel within a span of two hours, and averaging almost a hundred larvae from each genotype, minimizes the temporal variability observed in the arbor architecture.

      (5) Ideally, the authors should use CRISPR/cas-9 to create a mutation in the C-terminus that would prevent production of the GPI-anchored form and not of the diffusible form. I understand if this is too much work to do in a short time, and would be satisfied with another experiment that could distinguish roles for at least one isoform more clearly. For instance, it would be interesting to see an analysis of how far an axon can be from a source to detect diffusible Sema7a (live imaging would be ideal for this) and then to show that the effect is different when the membrane bound form is expressed.

      Comments: Thank you for this comment. We are currently working on generating transcript-specific knockout animals.

      We have added live timelapse video microscopy data in lines 330-337, Figure 4H-J, Figure 4—figure supplement 2, Videos 15,16.

      We have added a new segment analyzing the membrane-bound transcript variant in lines 338-351.

      Reviewer #4 (Recommendations For The Authors):

      Feedback to authors

      Overall, this is a very important and novel study. Currently the manuscript does need revision.

      Major concerns:

      (1) Controls. For the ectopic expression of Sema7a, injection of a construct expressing Sema7a under a heatshock promoter is used to drive ectopic expression. No heatshock (injected) animals are used as a control. In many systems heatshock can impact neuron morphology. And heatshock proteins are required for normal neurite and synapse formation. Please examine sensory axons in uninjected wildtype animals with heatshock.

      Comments: We have added this control experiment in a new segment, explained in detail in lines 348-350 and Figure 5.

      (2) Synapse staining - regarding Figure 5 and related supplement

      Understanding whether guidance defects ultimately impact synapse formation is an important aspect of this paper. Therefore, it is necessary to have accurate measurements of the number of complete synapses, and the overall numbers of pre- and postsynaptic components. Currently the data plotted in Figure 5 is extensive, but the way the data is laid out, the relevant comparisons are challenging to make. Perhaps include this quantification in the supplement, and move the data from the supplement to the main figure? The quantifications in the supplement are easier to follow and easier to compare between genotypes.

      Comments: We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We believe that showing only the average numbers will not reveal the changes in the distribution of the synaptic structures during development and across genotypes.

      Looking at the data itself, there seem to be some discrepancies with the synaptic counts compared to published work. While the CTBP numbers seem in order, the Maguk numbers do not. In both mutant and control there are many hair cells without any Maguk puncta/aggregates, leading to 0.75-1 postsynapses per hair cell (Figure 5 supplement H-I). Typically, the numbers should be more comparable to what was obtained for CTBP, 3-4 puncta per cell (Figure 5 supplement B-C), especially by 3-4 dpf. 3-4 CTBP or Maguk puncta per cell is based on previously published immunostaining and EM work.

      The Maguk immunostaining, especially at early stages (2-3 dpf) is challenging. To compound a challenging immunostain, around 2019 Neuromab began to outsource the purification of their Maguk antibody. After this outsourcing our lab was no longer able to get reliable label with the Maguk antibody from Neuromab.

      Millipore sells the same monoclonal antibody and it works well: https://www.emdmillipore.com/US/en/product/Anti-pan-MAGUK-Antibody-clone-K2886,MM_NF-MABN72

      I would recommend this source.

      Comments: Thank you for suggesting the new MAGUK antibody. We have utilized this new MAGUK antibody from Millipore and added a new segment in lines 389-408. In future publications we will utilize this antibody to capture the postsynaptic densities in the sensory arbors.

      The discrepancies in the postsynaptic punctae number in our control larvae may arise due to the reliability of the Neuromab MAGUK antibody. We have utilized this same antibody to stain the sema7a-/- mutants and have observed a significant decrease in MAGUK punctae number and area. On grounds of keeping parity between the control and the sema7a-/- mutants, we have decided to keep our experimental results in the manuscript.

      In addition to a more accurate Maguk label, a combined pre- and post-synaptic label is essential to understand whether synapses pair properly in the sema7a mutants. This can be accomplished using subtype specific antibodies using goat anti-mouse IgG1/Maguk and goat anti-mouse IgG2a/CTBP secondaries.

      Comments: Thank you for suggesting this. We are preparing another manuscript in which we will utilize this technique along with other suggestions to tease apart the role of distinct transcript variants in regulating neural guidance and synapse formation.

      (3) Does sema7a lesion impact the number of hair cells per neuromast? If hair cell numbers are reduced several of the quantifications could be impacted.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells, and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Could innervation just be developmentally delayed in sema7a mutants? At 4 dpf the sensory system is just starting to come online and could still be in the process of refinement. Did you look at slightly older ages, after the sensory system is functional behaviorally, for example, 6 dpf? Do the core phenotypes (synapse defects and excess arbors) persist at 6 dpf in the sema7a mutants?

      Comments: The homozygous sema7a-/- mutants are unviable and start to die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons throw long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of the disrupted arborization patterns over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings indicate that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Minor comments to address:

      Results

      Page 4 lines 89-91. For the readers, explain why you examined levels in Sema7a in rostral and caudal hair cells. Also, this sentence is, in general, a little bit misleading-initially reading that there is no difference in Sema7a at 1.5-4 dpf.

      Comments: In lines 44-48, we explain that the hair cells in the neuromast contain mechanoreceptive hair cells of opposing polarities that help them detect water currents from opposing directions. In lines 93-106, we tested whether the Sema7A level varies between the two polarities. We observed that the Sema7A level is similar between the two polarities of hair cells, but the average Sema7A intensity increases significantly over the developmental period of 2 dpf to 4 dpf in both rostrally and caudally polarized hair cells.

      Page 10-11 Lines 263-270. What was the frequency of these 2 outcomes- out of the 22 cases with ectopic expression?

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      Discussion

      Page 14 Lines 359-360. There is not enough evidence provided in this work to suggest that the membrane attached form of Sema7a is playing a role. Both the secreted and membrane form are gone in the sema7a mutants. If the membrane attached form was specifically lesioned, and resulted in a phenotype, then there would be sufficient evidence. Currently there is strong evidence for a distinct role for the secreted form. Although the authors qualify the outlined statement with the word 'likely', stating this possibility in the discussion take-home is misleading.

      Comments: In future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us differentiate between the roles of the membrane-bound Sema7AGPI molecule and the secreted Sema7Asec in sensory arborization and synaptic assembly.

      It might be interesting in either the intro or discussion to reference the role Sema3F in axon guidance in the mouse auditory epithelium. https://elifesciences.org/articles/07830

      Comments: We have added this reference in lines 61-64.

      Figures

      Please indicate on one of your Figures where the mutation is (roughly) in the sema7a mutant (in addition to stating it in the results).

      Comments: We have added this information in Figure 2—figure supplement 1A.

      Either state or indicate in a Figure where the epitope used to make the Sema7a antibody is located, to show that the antibody is predicted to recognize both isoforms.

      Comments: We have stated the details of the epitope in lines 528-529.

      Figure 2-S1 what is the scale in panel A, is it different between mutant and wildtype?

      Comments: We have updated the images. New images are depicted in Figure 2—figure supplement 1A.

      Methods

      What were the methods used to quantify synapse number and area?

      Comments: We have added a new section in lines 702-708 to explain the measurement techniques.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in fundamental and clinical aspects of consciousness.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Luppi et al. apply the recently developed integrated information decomposition to the question of how the architecture of information processing changes when consciousness is lost. They explore fMRI data from two different populations: healthy volunteers undergoing reversible anesthesia, as well as patients who have long-term disorders of consciousness. They show that, in both populations, synergistic integration of information is disrupted in common ways. These results are interpreted in the context of the SAPHIRE model (recently proposed by this same group), which describes information processing in the brain as being composed of several distinct steps: 1) gatekeeping (where gateway regions introduce sensory information to the global synergistic workspace), where 2) it is integrated or "processed" before 3) being broadcast back to the brain.

      I think that this paper is an excellent addition to the literature on information theory in neuroscience, and consciousness science specifically. The writing is clear, the figures are informative, and the authors do a good job of engaging with existing literature. While I do have some questions about the interpretations of the various information-theoretic measures, all in all, I think this is a significant piece of science that I am glad to see added to the literature.

      One specific question I have is that I am still a little unsure about what "synergy" really is in this context. From the methods, it is defined as that part of the joint mutual information that is greater than the maximum marginal mutual information. While this is a perfectly fine mathematical measure, it is not clear to me what that means for a squishy organ like the brain. What should these results mean to a neuro-biologist or clinician?

      Right now the discussion is very high level, equating synergy to "information processing" or "integrated information", but it might be helpful for readers not steeped in multivariate information theory to have some kind of toy model that gets worked out in detail. On page 15, the logical XOR is presented in the context of the single-target PID, but 1) the XOR is discrete, while the data analyzed here are continuous BOLD signals w/ Gaussian assumptions and 2) the XOR gate is a single-target system, while the power of the Phi-ID approach is the multi-target generality. Is there a Gaussian analog of the single-target XOR gate that could be presented? Or some multi-target, Gaussian toy model with enough synergy to be interesting? I think this would go a long way to making this work more accessible to the kind of interdisciplinary readership that this kind of article will inevitably attract.

      We appreciate this observation. We now clarify that:

      “redundancy between two units occurs when their future spontaneous evolution is predicted equally well by the past of either unit. Synergy instead occurs when considering the two units together increases the mutual information between the units’ past and their future – suggesting that the future of each is shaped by its interactions with the other. At the microscale (e.g., for spiking neurons) this phenomenon has been suggested as reflecting “information modification” 36,40,47. Synergy can also be viewed as reflecting the joint contribution of parts of the system to the whole, that is not driven by common input48.”
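      For readers who want the discrete XOR case mentioned by the reviewer worked out, the following is a minimal sketch. It assumes the definition the reviewer paraphrases from the Methods (synergy as the part of the joint mutual information that exceeds the largest marginal mutual information); the function and variable names are illustrative only and do not come from the manuscript.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(pairs):
    """Mutual information (bits) between the two coordinates of
    equiprobable (source, target) samples."""
    n = len(pairs)
    p_joint = Counter(pairs)
    p_src = Counter(s for s, _ in pairs)
    p_tgt = Counter(t for _, t in pairs)
    return sum(
        (k / n) * math.log2((k / n) / ((p_src[s] / n) * (p_tgt[t] / n)))
        for (s, t), k in p_joint.items()
    )


# All four equiprobable input combinations of an XOR gate: z = x XOR y.
samples = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

i_x = mutual_information([(x, z) for x, y, z in samples])           # X alone
i_y = mutual_information([(y, z) for x, y, z in samples])           # Y alone
i_joint = mutual_information([((x, y), z) for x, y, z in samples])  # X and Y together

# Assumed definition: joint information beyond the best single source.
synergy = i_joint - max(i_x, i_y)
print(f"I(X;Z)={i_x:.2f}  I(Y;Z)={i_y:.2f}  I(X,Y;Z)={i_joint:.2f}  synergy={synergy:.2f}")
```

      Neither input on its own carries any information about the output, whereas the two inputs together determine it completely, so under this definition all of the information in the XOR gate is synergistic.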

      In the Methods, we have also added the following example to provide additional intuition about synergy in the case of continuous rather than discrete variables:

      “As another example for the case of Gaussian variables (as employed here), consider a 2-node coupled autoregressive process with two parameters: a noise correlation c and a coupling parameter a. As c increases, the system is flooded by “common noise”, making the system increasingly redundant because the common noise “swamps” the signal of each node. As a increases, each node has a stronger influence both on the other and on the system as a whole, and we expect synergy to increase. Therefore, synergy reflects the joint contribution of parts of the system to the whole that is not driven by common noise. This has been demonstrated through computational modelling (Mediano et al 2019 Entropy).”
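      The qualitative behaviour described in this example can be sketched numerically. The code below is an illustration under stated assumptions, not a reproduction of the analysis in Mediano et al. (2019): it assumes a symmetric coupling matrix in which each node is driven by the sum of both nodes' past values with weight a (which requires |a| < 0.5 for stationarity), unit-variance noise with correlation c, and the minimum-mutual-information convention for redundancy and synergy with the joint future as the target. All function names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if abs(M[p][i]) < 1e-15:
            return 0.0
        if p != i:
            M[i], M[p] = M[p], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for col in range(i, n):
                M[r][col] -= f * M[i][col]
    return d


def sub(M, idx):
    return [[M[i][j] for j in idx] for i in idx]


def gaussian_mi(cov, a_idx, b_idx):
    """Mutual information (nats) between two blocks of a joint Gaussian."""
    return 0.5 * math.log(
        det(sub(cov, a_idx)) * det(sub(cov, b_idx)) / det(sub(cov, a_idx + b_idx))
    )


def past_future_cov(a, c, n_iter=500):
    """Covariance of [x_past, y_past, x_future, y_future] at stationarity."""
    A = [[a, a], [a, a]]        # assumed coupling structure
    Q = [[1.0, c], [c, 1.0]]    # noise covariance with correlation c
    S = [row[:] for row in Q]
    for _ in range(n_iter):     # fixed-point iteration of S = A S A^T + Q
        ASA = matmul(matmul(A, S), transpose(A))
        S = [[ASA[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    AS = matmul(A, S)           # Cov(future, past)
    SAt = transpose(AS)         # Cov(past, future)
    return [S[0] + SAt[0], S[1] + SAt[1], AS[0] + S[0], AS[1] + S[1]]


def mmi_decomposition(a, c):
    """Return (redundancy, synergy) in nats under the MMI convention."""
    cov = past_future_cov(a, c)
    past_x, past_y, future = [0], [1], [2, 3]
    i_joint = gaussian_mi(cov, past_x + past_y, future)
    i_x = gaussian_mi(cov, past_x, future)
    i_y = gaussian_mi(cov, past_y, future)
    return min(i_x, i_y), i_joint - max(i_x, i_y)


for a, c in [(0.1, 0.0), (0.4, 0.0), (0.3, 0.0), (0.3, 0.8)]:
    red, syn = mmi_decomposition(a, c)
    print(f"a={a:.1f} c={c:.1f}  redundancy={red:.4f}  synergy={syn:.4f}")
```

      In this toy setting, raising the coupling a increases the synergy term, while raising the noise correlation c increases the redundancy term, which is the trend the quoted passage describes.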

      See below for the relevant parts of Figures 1 and 2 from Mediano et al (2019 Entropy), where Psi refers to the total synergy in the system.

      Author response image 1.

      Strengths

      The authors have a very strong collection of datasets with which to explore their topic of interest. By comparing fMRI scans from patients with disorders of consciousness, healthy resting state, and various stages of propofol anesthesia, the authors have a very robust sample of the various ways consciousness can be perturbed, or lost. Consequently, it is difficult to imagine that the observed effects are merely a quirk of some biophysical effect of propofol specifically, or a particular consequence of long-term brain injury, but do in fact reflect some global property related to consciousness. The data and analyses themselves are well-described, have been previously validated, and are generally strong. I have no reason to doubt the technical validity of the presented results.

      The discussion and interpretation of these results is also very nice, bringing together ideas from the two leading neurocognitive theories of consciousness (Global Workspace and Integrated Information Theory) in a way that feels natural. The SAPHIRE model seems plausible and amenable to future research. The authors discuss this in the paper, but I think that future work on less radical interventions (e.g. movie watching, cognitive tasks, etc) could be very helpful in refining the SAPHIRE approach.

      Finally, the analogy between the PID terms and the information provided by each eye redundantly, uniquely, and synergistically is superb. I will definitely be referencing this intuition pump in future discussions of multivariate information sharing.

      We are very grateful for these positive comments, and for the feedback on our eye metaphor.

      Weaknesses

      I have some concerns about the way "information processing" is used in this study. The data analyzed, fMRI BOLD data, is extremely coarse, both in spatial and temporal terms. I am not sure I am convinced that this is the natural scale at which to talk about information "processing" or "integration" in the brain. In contrast to measures like sample entropy or Lempel-Ziv complexity (which just describe the statistics of BOLD activity), synergy and Phi are presented here as quasi-causal measures: as if they "cause" or "represent" phenomenological consciousness. While the theoretical arguments linking integration to consciousness are compelling, is this the right data set in which to explore them? For example, in the work by Newman, Beggs, and Sherrill (née Faber), synergy is associated with "computation" performed in individual neurons: the information about the future state of a target neuron that is only accessible when knowing both inputs (analogous to the synergy in computing the sum of two dice). Whether one thinks that this is a good approach to neural computation or not, it fits within the commonly accepted causal model of neural spiking activity: neurons receive inputs from multiple upstream neurons, integrate those inputs and change their firing behavior accordingly.

      In contrast, here, we are looking at BOLD data, which is a proxy measure for gross-scale regional neural activity, which itself is a coarse-graining of millions of individual neurons to a uni-dimensional spectrum that runs from "inactive to active." It feels as though a lot of inferences are being made from very coarse data.

      We appreciate the opportunity to clarify this point. It is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. In other words, while our results do show that BOLD-derived Phi-R tracks the loss and recovery of consciousness, we do not claim that they are the cause of it: only that an empirical relationship exists, which is in line with what we might expect on theoretical grounds. We have now clarified this in the Limitations section of our revised manuscript, as well as revising our language accordingly in the rest of the manuscript.

      We also clarify that the meaning of “information processing” that we adopt pertains to “intrinsic” information that is present in the system’s spontaneous dynamics, rather than extrinsic information about a task:

      “Information decomposition can be applied to neural data from different scales, from electrophysiology to functional MRI, with or without reference to behaviour 34. When behavioural data are taken into account, information decomposition can shed light on the processing of “extrinsic” information, understood as the translation of sensory signals into behavioural choices across neurons or regions 41,43,45,47. However, information decomposition can also be applied to investigate the “intrinsic” information that is present in the brain’s spontaneous dynamics in the absence of any tasks, in the same vein as resting-state “functional connectivity” and methods from statistical causal inference such as Granger causality 49. In this context, information processing should be understood in terms of the dynamics of information: where and how information is stored, transferred, and modified 34.”

      References:

      (1) Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 24, 930 (2022).

      Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is something confusing in the redundancy and synergy matrices shown in Figure 2. These are pair-wise matrices, where the PID was applied to identify high-order interactions between pairs of brain regions. I understand that synergy and redundancy are assessed in the way the brain areas integrate information in time, but it is still a little contradictory to speak about high-order in pairs of areas. When talking about a "synergistic core", one expects that all or most of the areas belonging to that core are simultaneously involved in some (synergistic) information processing, and I do not see this being assessed with the currently presented methodology. Similarly, if redundancy is assessed only in pairs of areas, it may be due to simple correlations between them, so it is not a high-order interaction. Perhaps it is a matter of language, or about the expectations that the word 'synergy' evokes, so a clarification about this issue is needed. Moreover, as the rest of the work is based on these 'pair-wise' redundancy and synergy matrices, it becomes a significant issue.

      We are grateful for the opportunity to clarify this point. We should highlight that PhiID is in fact assessing four variables: the past of region X, the past of region Y, the future of region X, and the future of region Y. Since X and Y each feature both in the past and in the future, we can re-conceptualise the PhiID outputs as reflecting the temporal evolution of how X and Y jointly convey information: the persistent redundancy that we consider corresponds to information that is always present in both X and Y, whereas the persistent synergy is information that X and Y always convey synergistically. In contrast, information transfer would correspond to the phenomenon whereby information was conveyed by one variable in the past, and by the other in the future (see Luppi et al., 2024 TICS; and Mediano et al., 2021 arXiv for more thorough discussions on this point). We have now added this clarification in our Introduction and Results, as well as adding the new Figure 2 to clarify the meaning of PhiID terms.

      We would also like to clarify that all the edges that we identify as significantly changing are indeed simultaneously involved in the difference between consciousness and unconsciousness. This is because the Network-Based Statistic differs from other ways of identifying edges that are significantly different between two groups or conditions, because it does not consider edges in isolation, but only as part of a single connected component.
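      The component-level logic of the Network-Based Statistic described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it shows only the thresholding and connected-component steps, and it omits the permutation testing that is used to assign a significance level to the component size. The function name, the edge statistics, and the threshold value are hypothetical.

```python
def largest_suprathreshold_component(edge_stats, threshold):
    """Return the edges of the largest connected component formed by
    edges whose test statistic exceeds the threshold.

    edge_stats maps an edge (i, j) to its test statistic.
    """
    parent = {}

    def find(u):
        # Union-find lookup with path halving.
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    kept = [edge for edge, stat in edge_stats.items() if stat > threshold]
    for i, j in kept:
        parent[find(i)] = find(j)  # merge the two endpoints' components

    components = {}
    for i, j in kept:
        components.setdefault(find(i), []).append((i, j))

    # Component "size" is counted in edges here.
    return max(components.values(), key=len, default=[])


# Hypothetical edge-wise statistics for a five-node network.
example_stats = {(0, 1): 3.5, (1, 2): 4.0, (3, 4): 3.2, (0, 4): 1.0}
component = largest_suprathreshold_component(example_stats, threshold=3.0)
print(component)
```

      Because inference is made on the component as a whole, an edge contributes to the result only through its membership in a connected set of suprathreshold edges, never in isolation.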

      Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. The authors determined the synergistic global workspace based on previous work and extended the role of its elements using 100 fMRI recordings of the resting state of healthy participants of the HCP. The authors then applied network analysis and two different measures of information integration to examine changes in reduced states of consciousness (such as anesthesia and after-coma disorders of consciousness). They provided an interpretation of the results in terms of the proposed model of brain information processing, which could be helpful to be implemented in other states of consciousness and related to perturbative approaches. Overall, I found the manuscript to be well-organized, and the results are interesting and could be informative for a broad range of literature, suggesting interesting new ideas for the field to explore. However, there are some points that the authors could clarify to strengthen the paper. Key points include:

      (1) The work strongly relies on the identification of the regions belonging to the synergistic global workspace, which was primarily proposed and computed in a previous paper by the authors. It would be great if this computation could be included in a more explicit way in this manuscript to make it self-contained. Maybe include some table or figure being explicit in the Gradient of redundancy-to-synergy relative importance results and procedure.

      We have now added the new Supplementary Figure 1 to clarify how the synergistic workspace is identified, as per Luppi et al (2022 Nature Neuroscience).

      (2) It would be beneficial if the authors could provide further explanation regarding the differences in the procedure for selecting the workspace and its role within the proposed architecture. For instance, why does one case use the strength of the nodes while the other case uses the participation coefficient? It would be interesting to explore what would happen if the workspace was defined directly using the participation coefficient instead of the strength. Additionally, what impact would it have on the procedure if a different selection of modules was used? For example, instead of using the RSN, other criteria, such as modularity algorithms, PCA, Hidden Markov Models, Variational Autoencoders, etc., could be considered. The main point of my question is that the RSNs are probably quite redundant networks, whereas other methods, such as PCA, generate independent networks. It would be helpful if the authors could offer some comments on their intuition regarding these points without necessarily requiring additional computations.

      We appreciate the opportunity to clarify this point. Our rationale for the procedure used to identify the workspace is to find regions where synergy is especially prominent. This is due to the close mathematical relationship between synergistic information and integration of information (see also Luppi et al., 2024 TICS), which we view as the core function of the global workspace. This identification is based on the strength ranking, as per Luppi et al (2022 Nature Neuroscience), which demonstrated that regions where synergy predominates (i.e., our proposed workspace) are also involved with high-level cognitive functions and anatomically coincide with transmodal association cortices at the confluence of multiple information streams. This is what we should expect of a global workspace, which is why we use the strength of synergistic interactions to identify it, rather than the participation coefficient. Subsequently, to discern broadcasters from gateways within the synergistic workspace, we seek to encapsulate the meaning of a “broadcaster” in information terms. We argue that this corresponds with making the same information available to multiple modules. Sameness of information corresponds to redundancy, and multiplicity of modules can be reflected in the network-theoretic notion of participation coefficient. Thus, a broadcaster is a region in the synergistic workspace (i.e., a region with strong synergistic interactions) that in addition has a high participation coefficient for its redundant interactions.
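      The two network measures contrasted in this response can be made concrete with a short sketch. It assumes the standard network-theoretic definitions (node strength as the sum of a node's edge weights, and the participation coefficient as one minus the sum of squared fractions of that strength falling in each module); the toy weights and module labels are invented for illustration and are not taken from the manuscript.

```python
def node_strength(weights, i):
    """Sum of the weights of all edges attached to node i."""
    return sum(w for j, w in enumerate(weights[i]) if j != i)


def participation_coefficient(weights, modules):
    """Participation coefficient of each node, given a symmetric weight
    matrix and one module label per node.

    P_i = 1 - sum_m (k_im / k_i)^2, where k_im is the strength of node i
    towards module m and k_i is its total strength.
    """
    n = len(weights)
    result = []
    for i in range(n):
        k_i = node_strength(weights, i)
        if k_i == 0:
            result.append(0.0)  # isolated node: no participation
            continue
        squared_fractions = 0.0
        for m in set(modules):
            k_im = sum(weights[i][j] for j in range(n)
                       if j != i and modules[j] == m)
            squared_fractions += (k_im / k_i) ** 2
        result.append(1.0 - squared_fractions)
    return result


# Toy network: four regions in two modules (labels are hypothetical).
toy_weights = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
toy_modules = [0, 0, 1, 1]

print([node_strength(toy_weights, i) for i in range(4)])
print(participation_coefficient(toy_weights, toy_modules))
```

      In the scheme described above, strength would be computed on the matrix of synergistic interactions to identify workspace regions, and the participation coefficient on the matrix of redundant interactions to separate broadcasters from gateways; a node whose interactions are spread evenly across modules scores high, and one whose interactions stay within a single module scores zero.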

      Pertaining specifically to the use of resting-state networks as modules, indeed our own (Luppi et al., 2022 Nature Neuroscience) and others’ research has shown that each RSN entertains primarily redundant interactions among its constituent regions. This is not surprising, since RSNs are functionally defined: their constituent elements need to process the same information (e.g., pertaining to a visual task in case of the visual network). We used the RSNs as our definition of modules, because they are widely understood to reflect the intrinsic organisation of brain activity into functional units; for example, Smith et al., (2009 PNAS) and Cole et al (2014 Neuron) both showed that RSNs reflect task-related co-activation of regions, whether directly quantified from fMRI in individuals performing multiple tasks, or inferred from meta-analysis of the neuroimaging literature. This is the aspect of a “module” that matters from the global workspace perspective: modules are units with distinct function, and RSNs capture this well. This is therefore why we use the RSNs as modules when defining the participation coefficient: they provide an a-priori division into units with functionally distinct roles.

      Nonetheless, we also note that RSN organisation is robustly recovered using many different methods, including seed-based correlation from specific regions-of-interest, or Independent Components Analysis, or community detection on the network of inter-regional correlations - demonstrating that they are not merely a function of the specific method used to identify them. In fact, we show significant correlation between participation coefficient defined in terms of RSNs, and in terms of modules identified in a purely data-driven manner from Louvain consensus clustering (Figure S4).

      (3) The authors acknowledged the potential relevance of perturbative approaches in terms of PCI and quantification of consciousness. It would be valuable if the authors could also discuss perturbative approaches in relation to inducing transitions between brain states. In other words, since the authors investigate disorders of consciousness where interventions could provide insights into treatment, as suggested by computational and experimental works, it would be interesting to explore the relationship between the synergistic workspace and its modifications from this perspective as well.

      We thank the Reviewer for bringing this up: we now cite several studies that in recent years have applied perturbative approaches to induce transitions between states of consciousness.

      “The PCI is used as a means of assessing the brain’s current state, but stimulation protocols can also be adopted to directly induce transitions between states of consciousness. In rodents, carbachol administration to frontal cortex awakens rats from sevoflurane anaesthesia120, and optogenetic stimulation was used to identify a role of central thalamus neurons in controlling transitions between states of responsiveness121,122. Additionally, several studies in non-human primates have now shown that electrical stimulation of the central thalamus can reliably induce awakening from anaesthesia, accompanied by the reversal of electrophysiological and fMRI markers of anaesthesia 123–128. Finally, in human patients suffering from disorders of consciousness, stimulation of intra-laminar central thalamic nuclei was reported to induce behavioural improvement 129, and ultrasonic stimulation 130,131 and deep-brain stimulation are among potential therapies being considered for DOC patients 132,133. It will be of considerable interest to determine whether our corrected measure of integrated information and the topography of the synergistic workspace are also restored by these causal interventions.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would appreciate it if the authors could revisit the figures and make sure that:

      (1) All fonts are large enough to be readable for people with visual impairments (for ex. the ranges on the colorbars in Fig. 2 are unreadably small).

      Thank you: we have increased font sizes.

      (2) The colormaps are scaled to show meaningful differences (Fig. 2A)

      We have changed the color scale in Figure 2A and 2B.

      Also, the authors may want to revisit the references section: some of the papers that were pre-prints at one point have now been published and should be updated.

      Thank you: we have updated our references.

      Minor comments:

      • In Eqs. 2 and 3, the unique information term uses the bar notation ( | ) that is typically indicative of "conditioned on." Perhaps the authors could use a slash notation (e.g. Unq(X ; Z / Y)) to avoid this ambiguity? My understanding of the Unique information is that it is not necessarily "conditioned on", so much as it is "in the context of".

      Indeed, the “|” sign of “conditioning” could be misleading; however, the “/” sign could also be misleading, if interpreted as division. Therefore, we have opted for the “\” sign of “set difference”, in Eq 2 and 3, which is conceptually more appropriate in this context.

      • The font on the figures is a little bit small - for readers with poor eyes, it might be helpful to increase the wording size.

      We have increased font sizes in the figures where relevant.

      • I don't quite understand what is happening in Fig. 2A - perhaps it is a colormap issue, but it seems as though it's just a big white square? It looks like redundancy is broadly correlated with FC (just based on the look of the adjacency matrices), but I have no real sense of what the synergistic matrix looks like, other than "flat."

      We have now changed the color scale in Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Besides the issues mentioned in the Public review, I have the following suggestions to improve the manuscript:

      • At the end of the introduction, a few lines could be added explaining why the study of DOC patients and subjects under anesthesia will be informative in the context of this work.

      By comparing functional brain scans from transient anaesthetic-induced unconsciousness and from the persistent unconsciousness of DOC patients, which arises from brain injury, we can search for common brain changes associated with loss of consciousness – thereby disambiguating what is specific to loss of consciousness.

      • On page and in general the first part of Results, it is not evident that you are working with functional connectivity. Many times the word 'connection' is used and sometimes I was wondering whether they were structural or functional. Please clarify. Also, the meaning of 'synergistic connection' or 'redundant connection' could be explained in lay terms.

      Thank you for bringing this up. We have now replaced the word “connection” with “interaction” to disambiguate this issue, further adding “functional” where appropriate. We have also provided, in the Introduction, an intuitive explanation of what synergy and redundancy mean in the context of spontaneous fMRI signals.

      • Figure 2 needs a lot of improvement. The matrix of synergistic interactions looks completely yellow-ish with some vague areas of white. So everything is above 2. What does it mean? Pretty uninformative. The matrix of redundant connections looks mostly black, with some red here and there. So everything is below 0.6. Also, what are the meaning and units of the colorbars?

      We agree: we have increased font sizes, added labels, and changed the color scale in Figure 2. We hope that the new version of Figure 2 will be clearer.

      • Caption of Figure 2 mentions "... brain regions identified as belonging to the synergistic global workspace". I didn't get it clear how do you define these areas. Are they just the sum of gateways and broadcasters, or is there another criterion?

      Regions belonging to the synergistic workspace are indeed the set comprising gateways and broadcasters; they are the regions that are synergy-dominated, as defined in Luppi et al., 2022 Nature Neuroscience. We have now clarified this in the figure caption.

      • In the first lines of page 7, it is said that data from DOC and anesthesia was parcellated in 400 + 54 regions. However, it was said in a manner that made me think it was a different parcellation than the other data. Please make it clear that the parcellation is the same (if it is).

      We have now clarified that the 400 cortical regions are from the Schaefer atlas, and 54 subcortical regions from the Tian atlas, as for the other analysis. The only other parcellation that we use is the Schaefer-232, for the robustness analysis. This is also reported in the Methods.

      • Figure 3: the labels in the colorbars cannot be read, please make them bigger. Also, the colorbars and colorscales should be centered in white, to make it clear that red is positive and blue is negative. Or at least maintain consistency across the panels (I can't tell because of the small numbers).

      Thank you: we have increased font sizes, added labels, indicated that white refers to zero (so that red is always an increase, and blue is always a decrease), and changed the color scale in Figure 2.
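
      The zero-centered color scale described above can be sketched as follows. This is a minimal illustration only, not the authors' plotting code: the function names `symmetric_limits` and `diverging_rgb` are hypothetical, and a simple linear blue-white-red ramp is assumed.

```python
def symmetric_limits(values):
    """Return (-vmax, vmax) so that zero sits at the midpoint of the color scale."""
    vmax = max(abs(v) for v in values)
    return -vmax, vmax


def diverging_rgb(value, vmax):
    """Map a value in [-vmax, vmax] to an (r, g, b) triple with white at zero.

    Positive values fade from white to red, negative values from white to blue.
    Values outside the range are clipped. (Hypothetical helper for illustration.)
    """
    if vmax <= 0:
        raise ValueError("vmax must be positive")
    t = max(-1.0, min(1.0, value / vmax))
    if t >= 0:
        return (1.0, 1.0 - t, 1.0 - t)  # white -> red for increases
    return (1.0 + t, 1.0 + t, 1.0)      # white -> blue for decreases


# Example: limits chosen from the data keep zero mapped to white.
changes = [-0.3, 1.2, 0.5]
vmin, vmax = symmetric_limits(changes)
colors = [diverging_rgb(v, vmax) for v in changes]
```

      Using the same symmetric limits for every panel would also keep the panels directly comparable, which is the consistency the reviewer asks for.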

      • The legend of Figure 4 is written in a different style, interpreting the figure rather than describing it. Please describe the figure in the caption, in order to let the reader know what they are looking at.

      We have endeavoured to rewrite the legend of Figure 4 in a style that is more consistent with the other figures.

      • In several parts the 'whole-minus-sum' phi measure is mentioned and it is said that it did not decrease during loss of consciousness. However, I did not see any figure about that nor any conspicuous reference to that in Results text. Where is it?

      We apologise for the confusion: this is Figure S3A, in the Supplementary. We have now clarified this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In the same direction, regarding Fig. 2, in my opinion, it does not effectively aid in understanding the selection of regions as more synergistic or redundant. In panels A) and B), the color scales could be improved to better distinguish regions in the matrices (panel A) is saturated at the upper limit, while panel B) is saturated at the lower limit). Additionally, I suggest indicating in the panels what is being measured with the color scales.

      Thank you: we have increased font sizes, added labels, and changed the color scale in Figure 2.

      (2) When investigating the synergistic core of human consciousness and interpreting the results of changes in information integration measures in terms of the proposed framework, did the authors consider the synergistic workspace computed in HCP data? If the answer is positive, it would be helpful for the authors to be more explicit about it and elaborate on any differences that may be found, as well as the potential impact on interpretation.

      This is correct: the synergistic workspace, including gateways and broadcasters, is identified from the Human Connectome Project dataset. We now clarify this in the manuscript.

      Minors:

      (1) I would suggest improving the readability of figures 2 and 3, considering font size (letters and numbers) and color bars (numbers and indicate what is measured with this scale). In Figure 1, the caption defines steps instead of the stages that are indicated in the figure.

      Thank you: we have increased font sizes, added labels, and replaced steps with “stages” in Figure 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discover that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons warp the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing, and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      • The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying

      • The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively

      • They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Thank you for your insightful and detailed assessment of our work. Your constructive feedback will help to improve our manuscript.

      Weaknesses:

      • A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated. Severing the labellum will not prevent optogenetic activation of md-L from triggering neural responses downstream of md-L. Optogenetic activation is strong enough to trigger action potentials in the remaining axons. Therefore, Qin et al. do not present convincing evidence that the defects they see in pumping can be specifically attributed to md-C.

      Thank you for your comments. This is an important point that we did not adequately address in the original preprint. We have obtained imaging and behavioral results that strongly suggest that md-C, rather than md-L, are essential for swallowing behavior.

      36 hours after the ablation of the labellum, the signals of md-L were hardly observable when GFP expression was driven by the intersection between Tmc-GAL4 & nompC-QF (see Figure 3—figure supplement 1A). This observation indicates that the axons of md-L had likely degenerated after 36 hours and were unlikely to influence swallowing. Moreover, the projection pattern of Tmc-GAL4 & nompC-QF>>GFP in the brain exhibited no significant changes after labellum ablation.

      Furthermore, even 36 hours after labellum ablation, flies still responded to light stimulation (see Figure 3—figure supplement 1B-C, Video 5) when ReaChR was expressed in md-C. We thus reasoned that md-C, but not md-L, plays a crucial role in the swallowing process.

      • GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.

      In this study, we employed the nSyb-GRASP, wherein the GRASP is expressed at the presynaptic terminals by fusion with the synaptic marker nSyb. This method demonstrates an enhanced specificity compared to the original GRASP approach.

      Additionally, we utilized +/ UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; + / MN-LexA fruit flies as a negative control to mitigate potential false signals originating from the tool itself (Author response image 1, scale bar = 50μm). Besides the genotype Tmc-Gal4, Tub(FRT. Gal80) / UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies discussed in this manuscript, we also included genotype Tmc-Gal4, Tub(FRT. Gal80) / lexAop-nSyb-spGFP1-10, UAS-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies as a reverse control (Author response image 2). Unexpectedly, similar positive signals were observed, indicating that positive signals may emerge due to close proximity between neurons even with nSyb-GRASP.

      Author response image 1

      It should be noted that the existence of synaptic projections from motor neurons (MN) to md-C cannot be definitively confirmed at this juncture. At present, we can only posit the potential for synaptic connections between md-C and motor neurons. A more definitive conclusion may be attainable with comprehensive whole-brain connectome data in future studies.

      Author response image 2

      • As seen in Figure 2—figure supplement 1, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone

      It is true that the expression pattern of Tmc-GAL4 is broader than that of md-C alone. Our experiments, including those with flies expressing TNT in Tmc+ neurons, demonstrated difficulties in emptying (Figure 2A, 2D). Notably, we were unable to find fly stocks bearing UAS>FRT-STOP-P2X2. Consequently, we opted to use Tmc-GAL4 to drive UAS-P2X2 instead. We believe that the results further support our hypothesis on the role of md-C in the observed behavioral change in emptying.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the role of cibarial mechanosensory neurons in fly ingestion. They demonstrate that pumping of the cibarium is subtly disrupted in mutants for piezo, TMC, and nomp-C. Evidence is presented that these three genes are co-expressed in a set of cibarial mechanosensory neurons named md-C. Silencing of md-C neurons results in disrupted cibarial emptying, while activation promotes faster pumping and/or difficulty filling. GRASP and chemogenetic activation of the md-C neurons is used to argue that they may be directly connected to motor neurons that control cibarial emptying.

      The manuscript makes several convincing and useful contributions. First, identifying the md-C neurons and demonstrating their essential role for cibarium emptying provides reagents for further studying this circuit and also demonstrates the importance of mechanosensation in driving pumping rhythms in the pharynx. Second, the suggestion that these mechanosensory neurons are directly connected to motor neurons controlling pumping stands in contrast to other sensory circuits identified in fly feeding and is an interesting idea that can be more rigorously tested in the future.

      At the same time, there are several shortcomings that limit the scope of the paper and the confidence in some claims. These include:

      a) the MN-LexA lines used for GRASP experiments are not characterized in any other way to demonstrate specificity. These were generated for this study using Phack methods, and their expression should be shown to be specific for MN11 and MN12 in order to interpret the GRASP experiments.

      Thanks for the suggestion. We have checked the expression pattern of MN-LexA, which is similar to that of the MN-GAL4 used in previous work (Manzo et al., PNAS., 2012, PMID:22474379). Here is the expression pattern:

      Author response image 3

      b) There is also insufficient detail for the P2X2 experiment to evaluate its results. Is this an in vivo or ex vivo prep? Is ATP added to the brain, or ingested? If it is ingested, how is ATP coming into contact with md-C neuron if it is not a chemosensory neuron and therefore not exposed to the contents of the cibarium?

      The P2X2 experimental preparation was done ex vivo. We immersed the fly in the imaging buffer, as described in the Methods section under Functional Imaging. Following dissection and identification of the subesophageal zone (SEZ) area under fluorescent microscopy, we introduced ATP slowly into the buffer, positioned at a distance from the brain.

      c) In Figure 3C, the authors claim that ablating the labellum will remove the optogenetic stimulation of the md-L neuron (mechanosensory neuron of the labellum), but this manipulation would presumably leave an intact md-L axon that would still be capable of being optogenetically activated by Chrimson.

      Please refer to the corresponding answers for reviewer 1 and Figure 3—figure supplement 1.

      d) Average GCaMP traces are not shown for md-C during ingestion, and therefore it is impossible to gauge the dynamics of md-C neuron activation during swallowing. Seeing activation with a similar frequency to pumping would support the suggested role for these neurons, although GCaMP6s may be too slow for these purposes.

      Profiling the dynamics of md-C neuron activation during swallowing is crucial for unraveling the operational model of md-C and validating our proposed hypothesis. Unfortunately, our assay faces challenges in detecting probable 6Hz fluorescent changes with GCaMP6s.

      In general, we observed an increase in fluorescent signals during swallowing, but movement of the live flies during swallowing interfered with the imaging, so we could not obtain a reliable calcium trace for md-C neurons. Patch-clamp recording from the md-C neurons would be a more convincing way to strengthen our findings. However, as illustrated in Figure 2, the somata of md-C neurons are situated in the cibarium rather than the brain, and patching the md-C neuron somata in flies during ingestion is difficult.

      e) The negative result in Figure 4K that is meant to rule out taste stimulation of md-C is not useful without a positive control for pharyngeal taste neuron activation in this same preparation.

      We followed methods used in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which we believe could confirm that md-C do not respond to sugars.

      In addition to the experimental limitations described above, the manuscript could be organized in a way that is easier to read (for example, not jumping back and forth in figure order).

      Thank you for your suggestion; the manuscript has been reorganized.

      Reviewer #3 (Public Review):

      Swallowing is an essential daily activity for survival, and pharyngo-laryngeal sensory function is critical for safe swallowing. In Drosophila, it has been reported that the mechanical property of food (e.g. Viscosity) can modulate swallowing. However, how mechanical expansion of the pharynx or fluid content sense and control swallowing was elusive. Qin et al. showed that a group of pharyngeal mechanosensory neurons, as well as mechanosensory channels (nompC, Tmc, and Piezo), respond to these mechanical forces for regulation of swallowing in Drosophila melanogaster.

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only limited studies reported how physical properties of food affect feeding especially pharyngeal mechanosensory neurons. First, they found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly the emptying process. Next, they identified cibarium multidendritic mechanosensory neurons (md-C) are responsible for controlling swallowing by regulating motor neuron (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

      We have confirmed that swallowing sucrose water solution activated md-C neurons, while sucrose water solution alone could not (Figure 4J-K). We hypothesized that the viscosity of the food might influence this expansion process.

      While we were unable to delineate the activation dynamics of md-C neurons, our proposal posits that these neurons could be activated in a single pump cycle, sequentially stimulating MN12 and MN11. Another possibility is that the activation of md-C neurons acts as a switch, altering the oscillation pattern of the swallowing central pattern generator (CPG) from a resting state to a working state.

      In the experiments with w1118 flies fed with MC (methylcellulose) water, we observed that viscosity predominantly affects the filling process rather than the emptying process, consistent with previous findings. This raises an intriguing question. Our investigation into the mutation of mechanosensitive ion channels revealed a significant impact on the emptying process. We believe this is due to the loss of mechanosensation affecting the vibration of swallowing circuits, thereby influencing both the emptying and filling processes. In contrast, viscosity appears to make it more challenging for the fly to fill the cibarium with food, primarily attributable to the inherent properties of the food itself.

      Reviewer #4 (Public Review):

      A combination of optogenetic behavioral experiments and functional imaging are employed to identify the role of mechanosensory neurons in food swallowing in adult Drosophila. While some of the findings are intriguing and the overall goal of mapping a sensory to motor circuit for this rhythmic movement are admirable, the data presented could be improved.

      The circuit proposed (and supported by GRASP contact data) shows these multi-dendritic neurons connecting to pharyngeal motor neurons. This is pretty direct - there is no evidence that they affect the hypothetical central pattern generator - just the execution of its rhythm. The optogenetic activation and inhibition experiments are constitutive, not patterned light, and they seem to disrupt the timing of pumping, not impose a new one. A slight slowing of the rhythm is not consistent with the proposed function.

      Motor neurons implicated in patterned motions can be considered effectors of Central Pattern Generators (CPGs)(Marder et al., Curr Biol., 2001, PMID: 11728329; Hurkey et al., Nature., 2023, PMID:37225999). Given our observation of the connection between md-C neurons and motor neurons, it is reasonable to speculate that md-C neurons influence CPGs. Compared to the patterned light (0.1s light on and 0.1s light off) used in our optogenetic experiments, we noted no significant changes in their responses to continuous light stimulation. We think that optogenetic methods may lead to overstimulation of md-C neurons, failing to accurately mimic the expansion of the cibarium during feeding.

      Dysfunction in mechanosensitive ion channels or mechanosensory neurons not only disrupts the timing of pumping but also results in decreased intake efficiency (Figure 1E). The water-swallowing rhythm is generally stable in flies, and swallowing is a vital process that may involve redundant ion channels to ensure its stability.

      The mechanosensory channel mutants nompC, piezo, and TMC have a range of defects. The role of these channels in swallowing may not be sufficiently specific to support the interpretation presented. Their other defects are not described here and their overall locomotor function is not measured. If the flies have trouble consuming sufficient food throughout their development, how healthy are they at the time of assay? The level of starvation or water deprivation can affect different properties of feeding - meal size and frequency. There is no description of how starvation state was standardized or measured in these experiments.

      Defects in mechanosensory channel mutants nompC, piezo, and TMC, have been extensively investigated (Hehlert et al., Trends Neurosci., 2021, PMID:332570000). Mutations in these channels exhibit multifaceted effects, as illustrated in our RNAi experiments (see Figure 2E). Deprivation of water and food was performed in empty fly vials. It's important to note that the duration of starvation determines the fly's willingness to feed but not the pump frequency (Manzo et al., PNAS., 2012, PMID:22474379).

      In most cases, female flies were deprived of water and food in empty vials for 24 hours, because after that time most flies were willing to drink water. The deprivation time was 12 hours for flies with nompC and Tmc mutated or flies with Kir2.1 expressed in md-C neurons, as some of these flies cannot survive 24 hours of deprivation.

      The brain is likely to move considerably during swallow, so the GCaMP signal change may be a motion artifact. Sometimes this can be calculated by comparing GCaMP signal to that of a co-expressed fluorescent protein, but there is no mention that this is done here. Therefore, the GCaMP data cannot be interpreted.

      We did not co-express a fluorescent protein with GCaMP for md-C. The head of the fly was mounted onto a glass slide, and we did not observe significant signal changes before feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract: I disagree that swallow is the first step of ingestion. The first paragraph also mentions the final checkpoint before food ingestion. Perhaps sufficient to say that swallow is a critical step of ingestion.

      Indeed, it is not rigorous enough to say “first step”. This has been replaced by “early step”.

      Introduction:

      Line 59: "Silence" should be "Silencing"

      This has been replaced.

      Results:

      Lines 91-92: I am not clear about what this means. 20% of nompC and 20% of wild-type flies exhibit incomplete filling? So nompC is not different from wild-type?

      Sorry for the mistake. Viscous foods led to incomplete emptying (not incomplete filling), as displayed in Video 4. The swallowing behavior differs between nompC mutants and wild-type flies, as illustrated in Figure 1C, Figure 1—figure supplement 1A-C and video 1&5.

      We found that, when fed with 1% MC water solution (Figure 1—figure supplement 1E-H), Tmc or piezo mutants displayed incomplete emptying, which could account for a large proportion of the swallowing time, while only 20% of nompC flies and 20% of wild-type flies sporadically exhibited incomplete emptying, which is significantly different. Although the percentage of flies displaying incomplete pumps is similar between nompC mutants and wild-type flies, their swallowing behavior is clearly different, as can be seen in Videos 1 and 5.

      Line 94: Should read: “while for foods with certain viscosity, the pump of Tmc or piezo mutants might"

      What evidence is there for weakened muscle motion? The phenotypes of all three mutants is quite similar, so concluding that they have roles in initiation versus swallowing strength is not well supported -this would be better moved to the discussion since it is speculative.

      Muscles are responsible for pumping the bolus from the mouth to the crop. In the case of Tmc or piezo mutants, as evidenced by incomplete filling for viscous foods (see Video 4), we speculate that the loss of sensory stimuli leads to inadequate muscle contraction. The phenotypes observed in Tmc and piezo mutants are similar yet distinct from those of the wild-type or nompC mutant, as shown in Video 1 and 4. The phrase "due to weakened muscle motion" has been removed for clarity.

      Line 146: If md-L neurons are also labeled by this intersection, then you are not able to know whether the axons seen in the brain are from md-L or md-C neurons. Line 148: cutting the labellum is not sufficient to ablate md-L neurons. The projections will still enter the brain and can be activated with optogenetics, even after severing the processes that reside in the labellum.

      Please refer to the responses for reviewer #1 (Public Review): “A major weakness of the paper…” and Figure 4.

      Line 162: If the fly head alone is in saline, do you know that the sucrose enters the esophagus? The more relevant question here is whether the md-C neurons respond to mechanical force. If you could artificially inflate the cibarium with air and see the md-C neurons respond that would be a more convincing result. So far you only know that these are activated during ingestion, but have not shown that they are activated specifically by filling or emptying. In addition, you are not only imaging md-C (md-L is also labeled). This caveat should be mentioned.

      We followed the methods outlined in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which suggested that md-C neurons do not respond to sugars. While we aimed to mechanically stimulate md-C neurons, detecting signal changes during different steps of swallowing is challenging. This aspect could be further investigated in subsequent research with the application of adequate patch recording or two-photon microscopy (TPM).

      Figure 3: It is not clear what the pie charts in Figure 3 A refer to. What are the three different rows, and what does blue versus red indicate?

      Figure 3A illustrates three distinct states driven by CsChrimson light stimulation of md-C neurons, with the proportions of flies exhibiting each state. During light activation, flies may display difficulty in filling, incomplete filling, or a normal range of pumping. The blue and red bars represent the proportions of flies showing the corresponding state, as indicated by the black line.

      Figure 4: Where are the example traces for J? The comparison in K should be average dF/F before ingestion compared with average dF/F during ingestion. Comparing the in vitro response to sucrose to the in vivo response during ingestion is not a useful comparison.

      Please refer to the answers for reviewer #2 question d).
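
      The comparison the reviewer proposes (average dF/F before ingestion versus during ingestion) could be computed along the following lines. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript: the function names, the choice of the first `baseline_frames` frames as the baseline F0, and the frame indices marking ingestion are all hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def delta_f_over_f(trace, baseline_frames):
    """Convert raw fluorescence to dF/F = (F - F0) / F0.

    F0 is assumed to be the mean of the first `baseline_frames` frames.
    """
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero; dF/F is undefined")
    return [(f - f0) / f0 for f in trace]


def mean_dff_before_and_during(trace, ingestion_start, ingestion_end, baseline_frames):
    """Return (mean dF/F before ingestion, mean dF/F during ingestion).

    `ingestion_start` and `ingestion_end` are frame indices (end exclusive).
    """
    dff = delta_f_over_f(trace, baseline_frames)
    before = mean(dff[:ingestion_start])
    during = mean(dff[ingestion_start:ingestion_end])
    return before, during


# Example with a synthetic trace: flat baseline, then a 50% rise during ingestion.
trace = [100, 100, 100, 100, 150, 150, 150, 150]
before, during = mean_dff_before_and_during(trace, 4, 8, baseline_frames=4)
```

      A within-preparation comparison of this kind does not by itself remove motion artifacts; it only places both conditions on the same baseline.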

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments that would address some of my concerns listed in the public review include:

      a) high resolution SEZ images of MN-LexA lines crossed to LexAop-GFP to demonstrate their specificity

      b) more detail on the P2X2 experiment. It is hard to make suggestions beyond that without first seeing the details.

      c) presenting average GCaMP traces for all calcium imaging results

      d) to rule out taste stimulation of md-C (Figure 4K) I would suggest performing more extensive calcium imaging experiments with different stimuli. For example, sugar, water, and increasing concentrations of a neutral osmolyte (e.g. PEG) to suppress the water response. I think that this is more feasible than trying to get an in vitro taste prep to be convincing.

      Please refer to the responses for public review of reviewer #2.

      Reviewer #3 (Recommendations For The Authors):

      Below I list my suggestions as well as criticisms.

      (1) It would be excellent if the authors could demonstrate whether varying levels of food viscosity affect md-C activation.

      That is a good point, and could be studied in future work.

      (2) It is not clear whether an intersectional approach using TMC-GAL4 and nompC-QF abolishes labelling of the labellar multidendritic neurons. If this is the case, please show labellar multidendritic neurons in TMC-GAL4 only flies and flies using the intersectional approach. Along with this question, I am concerned that labellum-removed flies could be used for feeding assay.

      Intersectional labelling using TMC-GAL4 and nompC-QF could not abolish labelling of the labellar multidendritic neurons (Author response image 4). Labellum-removed flies could be used for the feeding assay (Figure 3—figure supplement 1B-C, video 5), but swallowing behavior would be affected if the LSO or cibarium of the fly was damaged. The labellum must therefore be removed very carefully.

      Author response image 4

      (3) Please provide the detailed methods for GRASP and include proper control.

      Please refer to the responses for public review of reviewer #1.

      (4) The authors hypothesized that md-C sequentially activates MN11 and 12. Is the time gap between applying ATP on md-C and activation of MN11 or MN12 different?

      Please refer to the responses for public review of reviewer #3. The time gap between applying ATP on md-C and activation of MN11 or MN12 didn’t show significant differences, and we think the reason is that the ex vivo conditions could not completely mimic the in vivo process.

      I found the manuscript includes many errors, which need to be corrected.

      (1) The reference formatting needs to be rechecked, for example, lines 37, 42, and 43.

      (2) Line 44-46: There is some misunderstanding. The role of pharyngeal mechanosensory neurons is not known compared with chemosensory neurons.

      (3) Line 49: Please specify which type of quality of food. Chemical or physical?

      (4) Line 80 and Figure 1B-D Authors need to put filling and emptying time data in the main figure rather than in the supplementary figure. Otherwise, please cite the relevant figures in the text(S1A-C).

      (5) Line 84-85; Is "the mutant animals" indicating only nompC? Please specify it.

      (6) Figure 1a: It is hard to determine the difference between the series of images. And also label filling and emptying under the time.

      (7) S1E-H: It is unclear what "Time proportion of incomplete pump" means. Please define it.

      (8) Please reorganize the figures to follow the order of the text, for example, figures 2 and 4

      (9) Figure 4A. There is mislabelling in Figure 4A. It is supposed to be phalloidin not nc82.

      (10) Figure 4K: It does not match the figure legend and main text.

      (11) Figure 4D and G: Please indicate ATP application time point.

      Thank you for your corrections; all the points mentioned have been revised.

      Reviewer #4 (Recommendations For The Authors):

      The figures need improvement. 1A has tiny circles showing pharynx and any differences are unclear.

      The expression pattern of some of these drivers (Supplement) seems quite broad. The tmc nompC intersection image in Figure 1F is nice but the cibarium images are hard to interpret: does this one show muscle expression? What are "brain" motor neurons? Where are the labellar multi-dendritic neurons?

      The Tmc nompC intersection image shows no expression in muscles. The somata of motor neurons 12 and 11 are situated in the SEZ area of the brain, while the somata of md-C neurons are in the cibarium. An image of md-L neurons was posted in the response to reviewer #3 (Recommendations For The Authors).

      Why do the assays alternate between swallowing food and swallowing water?

      Thank you for your suggestion; Figure 1A has been zoomed in. The Tmc nompC intersection image in Figure 2F displays the position of md-C neurons from a ventral perspective, and muscles were not labelled. We stained the muscles in the cibarium with phalloidin, and the image is shown in Figure 4A; we did not find any overlap between md-C neurons and muscles. An image of md-L neurons was posted as Author response image 4.

      In the majority of our experiments, we employed water to test swallowing behavior, while we used methylcellulose water solution to test swallowing behavior of mechanoreceptor mutants, and sucrose solution for flies with md-C neurons expressing GCaMP since they hardly drank water when their head capsules were open.

      How starved or water-deprived were the flies?

      One day prior to the behavioral assays, flies were transferred to empty vials (without water or food) for 24 hours of water deprivation. Flies of genotypes that could not survive 24 h of deprivation were instead deprived for 12 h.

      How exactly was the pumping frequency (shown in Fig 1B) measured? There is no description in the methods at all. If the pump frequency is scored by changes in blue food intensity (arbitrary units?), this seems very subjective and maybe image angle dependent. What was camera frame rate? Can it capture this pumping speed adequately? Given the wealth of more quantitative methods for measuring food intake (eg. CAFE, flyPAD), it seems that better data could be obtained.

      How was the total volume of the cibarium measured? What do the pie charts in Figure 3A represent?

      The pump frequency was computed as the number of pumps divided by the time scale, following the methodology outlined in Manzo et al., 2012. Swallowing curves were plotted using the inverse of the blue food intensity in the cibarium. In this representation, ascending lines signify filling, while descending lines indicate emptying (see Figure 2D, 3B). We maintain objectivity in our approach since, during the recording of swallowing behavior, the fly was fixed, and we exclusively used data for analysis when the Region of Interest (ROI) was in the cibarium. This ensures that the intensity values accurately reflect the filling and emptying processes. Furthermore, we conducted manual frame-by-frame checks of pump frequency, and the results align with those generated by the time series analyzer V3 of ImageJ.

      To assess the total volume of ingestion, we followed the CAFE method, using a graduated glass capillary. We then calculated the ingestion rate (nL/s) by dividing the total volume of ingestion by the feeding time.
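      The two quantities described above (pump frequency as pumps per unit time, and ingestion rate as ingested volume per feeding time) can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline, which relied on ImageJ and manual frame-by-frame checks. The function names, the threshold-crossing rule for counting pumps, and the reading of "inverse intensity" as "peak value minus intensity" are all assumptions made for illustration only.

```python
def swallowing_curve(blue_intensity):
    """Invert a raw ROI intensity trace so that filling appears as a rise.

    Assumption: "inverse" is taken here as (peak value - intensity);
    the source does not specify the exact transformation.
    """
    peak = max(blue_intensity)
    return [peak - value for value in blue_intensity]


def count_pumps(curve, threshold):
    """Count fill/empty cycles as upward crossings of a threshold.

    Hypothetical rule: one pump is counted each time the curve rises
    from below `threshold` to at or above it.
    """
    pumps = 0
    above = False
    for value in curve:
        if value >= threshold and not above:
            pumps += 1
            above = True
        elif value < threshold:
            above = False
    return pumps


def pump_frequency(n_pumps, duration_s):
    """Pump frequency in Hz: number of pumps divided by recording time."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return n_pumps / duration_s


def ingestion_rate_nl_per_s(total_volume_nl, feeding_time_s):
    """Ingestion rate in nL/s: total ingested volume divided by feeding time."""
    if feeding_time_s <= 0:
        raise ValueError("feeding_time_s must be positive")
    return total_volume_nl / feeding_time_s


if __name__ == "__main__":
    # Made-up intensity values, purely to exercise the functions.
    raw = [200, 120, 60, 130, 200, 110, 50, 140, 200]
    curve = swallowing_curve(raw)
    n = count_pumps(curve, threshold=100)
    print("pumps:", n)
    print("frequency (Hz):", pump_frequency(n, duration_s=0.5))
    print("ingestion rate (nL/s):", ingestion_rate_nl_per_s(50.0, 10.0))
```

      A threshold-crossing count is only one possible way to score pumps; any real analysis would need the threshold and frame rate to be validated against manual counts, as the authors describe.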

      The changes seem small, in spite of the claim of statistical significance.

      Pump frequency is highly stable within a given genotype, so even seemingly small changes are meaningful, and the changes we report are statistically significant. We speculate that this stability in swallowing frequency reflects a redundant mechanism that ensures the robustness of the process. Disruption of one channel might be partially compensated for by others, highlighting the vital nature of the swallowing mechanism.

      How is this change in pump frequency consistent with defects in one aspect of the cycle - either ingestion (activation) or expulsion (inhibition)?

      Please refer to Figures 2 and 3. Both the filling and emptying processes were affected, while inhibition mainly influences emptying time (Figure 1—figure supplement 1).

      for the authors:

      Line 48: extensively

      Line 62 - undiscovered.

      Line 107, 463: multi

      Line 124: What is "dysphagia?" This is an unusual word and should be defined.

      Line 446: severe

      Line 466: in the cibarium or not?

      Thank you for your corrections; all the places mentioned have been revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Chen et al. entitled, "The retina uncouples glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle", the authors look to provide insight on retinal metabolism and substrate utilization by using a murine explant model with various pharmacological treatments in conjunction with metabolomics. The authors conclude that photoreceptors, a specific cell within the explant, which also includes retinal pigment epithelium (RPE) and many other types of cells, are able to uncouple glycolytic and Krebs-cycle metabolism via three different pathways: 1) the mini-Krebs-cycle, fueled by glutamine and branched-chain amino acids; 2) the alanine-generating Cahill-cycle; and 3) the lactate-releasing Cori-cycle. While intriguing if determined to be true, these cell-specific conclusions are called into question due to the ex vivo experimental setup with the inclusion of RPE, the fact that the treatments were not cell-specific nor targeted at an enzyme specific to a certain cell within the retina, and no stable isotope tracing nor mitochondrial function assays were performed. Hence, without significant cell-specific methods and future experimentation, the primary claims are not supported.

      Strengths:

      This study attempts to improve on the issues that have limited the results obtained from previous ex vivo retinal explant studies by culturing in the presence of the RPE, which is a major player in the outer retinal metabolic microenvironment. Additionally, the study utilizes multiple pharmacologic methods to define retinal metabolism and substrate utilization.

      Weaknesses:

      A major weakness of this study is the lack of in vivo supporting data. Explant cultures remove the retina from its dual blood supply. Typically, retinal explant cultures are done without RPE. However, the authors included RPE in the majority of experimental conditions herein. However, it is unclear if the metabolomics samples included the RPE or not. The inclusion of the RPE, which is metabolically active and can be altered by the treatments investigated herein, further confounds the claims made regarding the neuroretina. Considering the pharmacologic treatments utilized with the explant cultures are not cell-specific and/or have significant off-target effects, it is difficult to ascertain that the metabolic changes are secondary to the effects on photoreceptors alone, which the authors claim. Additionally, the explants are taken at a very early age when photoreceptors are known to still be maturing. No mention or data is presented on how these metabolic changes are altered in retinal explants after photoreceptors have fully matured. Likewise, significant assumptions are made based on a single metabolomics experiment with no stable isotope tracing to support the pathways suggested. While the authors use immunofluorescence to support their claims at multiple points, demonstrating the presence of certain enzymes in the photoreceptors, many of these enzymes are present throughout the retina and likely the RPE. Finally, the claims presented here are in direct contradiction to recent in vivo studies that used cell-specific methods when examining retinal metabolism. No discussion of this difference in results is attempted.

      Response: We agree with the reviewer that in vivo studies could be very interesting indeed. However, technologically it will be extremely difficult to (repeatedly/continuously) sample the retina of an experimental animal and to combine this with an interventional study, with a subsequent metabolomic analysis. We do not currently have access to such technology, nor are we aware of any other lab in the world capable of doing such studies. Moreover, virtually all prior studies on retinal metabolism have been done on explanted retina without RPE. This includes the seminal studies by Otto Warburg in the 1920s. In contrast, our retinal samples for all the metabolomic analyses included the RPE, except for the no RPE condition that was used as a comparator for the earlier investigations.

      We note that our metabolomic analysis was done for all five experimental conditions where each condition included at least five independent samples (each derived from different animals).

      The reviewer is correct to say that our organotypic explant cultures are early post-natal, with explantation performed at post-natal day 9 and culturing until day 15. Since our retinal explant system has been validated extremely well over more than three decades of pertinent research (see for instance: Caffe et al., Curr Eye Res. 8:1083-92, 1989), we are confident that photoreceptors mature in vitro in ways that are very similar to the in vivo situation. As far as studies in adult retina (i.e. three months or older) are concerned, this is indeed an important question that will be addressed in future studies. Studies employing stable isotope labelling may also be very informative and are planned for the future, also in order to properly determine fluxes. This will likely require an extension to our NMR hardware with a 15N channel probe, something that we plan on implementing in the future.

      We are aware that a number of questions relating to retinal metabolism are controversial and that the use of other methodology or experimental systems may lead to alternative interpretations. We have now included citations of other studies that use, for example, conditional and/or inducible knock-outs or in vivo blood sampling (e.g. Wang et al., IOVS 38:48-55, 1997; Yu et al., Invest Ophthalmol Vis Sci. 46:4728-33, 2005; Swarup et al., Am J Physiol Cell Physiol. 316:C121-C133, 2019; Daniele et al., FASEB Journal 36:e22428, 2022) and discuss the pros and cons of such approaches (e.g. in Lines 376-384; 454-472).

      Reviewer #2 (Public Review):

      Summary:

      The authors aim to learn about retinal cell-specific metabolic pathways, which could substantially improve the way retinal diseases are understood and treated. They culture ex vivo mouse retinas for 6 days with 2 - 4 days of various drug treatments targeting different metabolic pathways or by removing the RPE/choroid tissue from the neural retina. They then look at photoreceptor survival, stain for various metabolic enzymes, and quantify a broad panel of metabolites. While this is an important question to address, the results are not sufficient to support the conclusions.

      Strengths:

      The questions the authors are exploring are extremely valuable, and I commend the authors for working to learn more about retina metabolism. The different sensitivity of the cones to various drugs is interesting and may suggest key differences between rods and cones. The authors also provide a thoughtful discussion of various metabolic pathways in the context of previous publications.

      Weaknesses:

      As the authors point out, ex vivo culture models allow for control over multiple aspects of the environment (such as drug delivery) not available in vivo. Ex vivo cultures can provide good hints as to what pathways are available between interacting tissues. However, there are many limitations to ex vivo cultures, including shifting to a very artificial culture media condition that is extremely different than the native environment of the retina. It is well appreciated that cells have flexible metabolism and will adapt to the conditions provided. Therefore, observations of metabolic responses obtained under culture conditions need to be interpreted with caution, they indicate what the tissue is doing under those specific conditions (which include cells adapting and dying).

      Chen et al. use pharmacological interventions to test the impact of various metabolic pathways on photoreceptor survival and "long term" metabolic changes. The dose and timing of these drug treatments are not examined though. It is also hard to know how these drugs penetrate the tissue and it needs to be validated that the intended targets are being accurately hit. These relatively long-term treatments should be causing numerous downstream changes to metabolism, cell function, and survival, which makes looking at a snapshot of metabolite levels hard to interpret. It would be more valuable to look at multiple time points after drug treatment, especially early time points (closer to 1 hr). The authors use metabolite ratios to make conclusions about pathway activity. It would be more valuable to directly measure pathway activity by looking at metabolite production rates in the media and/or with metabolic tracers, again on time scales closer to minutes and hours instead of days.

      It is not clear from the text if the ex vivo samples with RPE/choroid intact are analyzed for metabolomics with the RPE/choroid still intact or if this is removed. If it is not removed, the comparison to the retina without RPE/choroid needs to be re-interpreted for the contribution of metabolites from the added tissue. The composition of the tissue is different and cannot be disentangled from the changes to the neural retina specifically.

      While the data is interesting and may give insights into some rod and cone-specific metabolic susceptibility, more work is needed to validate these conclusions. Given the limitations of the model the authors have over-interpreted their findings and the conclusions are not supported by the results. They need to either dramatically limit the scope of their conclusions or validate these hypotheses with additional models and tools.

      Response: We thank the reviewer for the insightful comments and agree that some of our interpretations may have been phrased too determinedly. We have therefore rephrased and toned down our conclusions in many instances in the text, and changed the manuscript title to now read “Retinal metabolism: Evidence for uncoupling of glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle”.

      Nevertheless, when considering the major known metabolic pathways and their possible impact on metabolite patterns after the experimental manipulations used here, we believe our interpretations to be consistent with the data obtained. Conversely, the previously suggested retinal aerobic glycolysis cannot explain most of the data we have obtained. Even further, also a predominant use of the classical “full” Krebs-cycle/OXPHOS would not explain the metabolite patterns found (e.g. alanine, N-acetylaspartate (NAA)). While this does not in itself mean that our interpretations are all correct, they seem plausible in view of the data at hand and will hopefully stimulate further research on retinal energy metabolism using complementary technologies that were not available to us for the purpose of this study.

      We comment that our organotypic retinal explant cultures, while they do contain their very own, native RPE, do not comprise the choroidal vasculature (in our explantation procedure the RPE readily detaches from the choroid).

      As far as the drugs used on retinal explants are concerned, we note that:

      (1) all three compounds used are extremely well validated, with literally thousands of studies and decades of research to their credit (i.e., 1,9-dideoxyforskolin: >270 publications since 1984; Shikonin: >1000 publications since 1977; FCCP: >2800 publications since 1967),

      (2) all experimental conditions show clear and differential drug effects, as shown, for instance, by the principal component analysis in Figure 1I and the cluster analysis in Figure 2A,

      (3) the response patterns observed for key metabolites match the anticipated drug effects (e.g. decreased glucose consumption with 1,9-dideoxyforskolin; decreased lactate levels with Shikonin; lactate accumulation with FCCP).

      One can therefore be reasonably certain that these drugs did penetrate the explanted retina and that their respective drug targets were hit. Assessing dose-responses would certainly be interesting, however, the aim of this initial study was not pharmacodynamics but a general manipulation of energy metabolism. Moreover, given the extensive validation of these drugs, off-target effects seem not very likely at the concentrations used.

      We agree with the reviewer that using a longitudinal, time-series type of analysis could give additional insights. We note that each additional time-point will require retinae from 25 animals and a very resource-intensive and time-consuming metabolomic analysis, together with a significantly more complex multivariate analysis (metabolite, experimental condition, time). This is a completely new undertaking that is simply not feasible as an extension of the present study.

      Looking at pathway activity in more direct ways is a very good idea. To this end, we aim to implement in the future an idea put forward by the reviewers, namely 13C-labeling and additionally 15N-labeling and tracing for specific metabolic fuels (e.g. glucose, lactate and anaplerotic amino acids such as glutamate and branched chain amino acids).

      The reviewer is of course correct to say that the culture condition is somewhat artificial and that this may have introduced changes in the metabolism. However, as noted above in the first response to reviewer #1, the organotypic retinal culture system, using a defined medium, free of serum and antibiotics, has been extremely well studied and validated for decades (cf. Caffé et al., Curr Eye Res. 8:1083-92, 1989). Importantly, this system makes it possible to maintain retinal viability, histotypic organization, and function over many weeks in culture. Moreover, most previous studies on retinal metabolism have also used explanted retina – acute or cultured – i.e. experimental approaches that are similar to what we have used and that may be liable to their own artefactual changes in metabolism. This includes the seminal, 1920s studies by Otto Warburg, or the 1980s studies by Barry Winkler, the results of which the reviewers do not seem to doubt.

      We further agree that studying retinal metabolism in a situation closer to in vivo conditions would be thrilling, however to our knowledge to date there is no retina model that fully mimics the complex interplay of the blood metabolome with metabolic tissue activity. This likely means that for each metabolic condition to study (e.g. hyperglycemia, cachexia, etc.), a fairly large number of animals will need to be sacrificed for the molecular investigation of ex vivo retinal biopsies, which would mean a tremendous animal burden.

      We hope the reviewer will appreciate that the revised manuscript now includes numerous improvements, along with new, additional datasets and figures, references to further relevant literature, and – as mentioned above – a more cautious phrasing of our interpretations and conclusions, including a more careful wording for the manuscript title.

      Reviewer #3 (Public Review):

      Summary:

      The neural retina is one of the most energetically active tissues in the body and research into retinal metabolism has a rich history. Prevailing dogma in the field is that the photoreceptors of the neural retina (rods and cones) are heavily reliant on glycolysis, and as oxygen tension at the level of photoreceptors is very low, these specialized sensory neurons carry out aerobic glycolysis, akin to the Warburg effect in cancer cells. It has been found that this unique metabolism changes in many retinal diseases, and targeting retinal metabolism may be a viable treatment strategy. The neural retina is composed of 11 different cell types, and many research groups over the past century have contributed to our current understanding of cell-specific metabolism of retinal cells. More recently, it has been shown in mouse models and co-culture of the mouse neural retina with human RPE cultures that photoreceptors are reliant on the underlying retinal pigment epithelium for supplying nutrients. Chen and colleagues add to this body of work by studying an ex vivo culture of the developing mouse retina that maintained contact with the retinal pigment epithelium. They exposed such ex vivo cultures to small molecule inhibitors of specific metabolic pathways, performing targeted metabolomics on the tissue and staining the tissue with key metabolic enzymes to lay the groundwork for what metabolic pathways may be active in particular cell types of the retina. The authors conclude that rod and cone photoreceptors are reliant on different metabolic pathways to maintain their cell viability - in particular, that rods rely on oxidative phosphorylation and cones rely on glycolysis. Further, their data support multiple mechanisms whereby glycolysis may occur simultaneously with anaplerosis to provide abundant energy to photoreceptors. The data from metabolomics revealed several novel findings in retinal metabolism, including the use of glutamine to fuel the mini-Krebs cycle, the utilization of the Cahill cycle in photoreceptors, and a taurine/hypotaurine shuttle between the underlying retinal pigment epithelium and photoreceptors to transfer reducing equivalents from the RPE to photoreceptors. In addition, this study provides robust quantitative metabolomics datasets that can be compared across experiments and groups. The use of this platform will allow for rapid testing of novel hypotheses regarding the metabolic ecosystem in the neural retina.

      Strengths:

      The data on differences in the susceptibility of rods and cones to mitochondrial dysfunction versus glycolysis provide novel hypothesis-generating conjectures that can be tested in animal models. The multiple mechanisms that allow anaplerosis and glycolysis to run side-by-side add significant novelty to the field of retinal metabolism, setting the stage for further testing of these hypotheses as well.

      Weaknesses:

      Almost all of the conclusions from the paper are preliminary, based on data showing enzymes necessary for a metabolic process are present and the metabolites for that process are also present. However, to truly prove whether these processes are happening, C13 labeling or knock-out or over-expression experiments are necessary. Further, while there is good data that RPE cultures in vitro strongly recapitulate RPE phenotypes in vivo, ex vivo neural retina cultures undergo rapid death. Thus, conclusions about metabolism from explants should either be well correlated with existing literature or lead to targeted in vivo studies. This paper currently lacks both.

      Response: As mentioned above in the first answers to reviewers #1 and #2, we think of our study as a starting point that may provide novel directions for a whole series of investigations into retinal energy metabolism. In particular, novel technologies may in the future make it possible to decipher the different metabolic phenotypes of the 100+ distinct retinal cell types by in situ spatial metabolomics and lipidomics. Currently, we still have to limit the scope of our studies to only certain aspects of this topic. We thus agree that some of our interpretations need to be formulated more carefully and we have done so in the revised version of our manuscript. We also agree with the reviewer that carbon (13C) labelling and tracing studies will be very informative, and we will engage in such studies in the future. Besides 13C, we aim to further employ 15N labelled substrates, which are especially suitable for studying the fate of amino acids.

      As far as our organotypic retinal explant system is concerned, it is arguably one of the best validated such systems available (see responses to reviewers #1 and #2). While the reviewer is correct to say that the neuroretina without RPE degenerates relatively quickly in vitro, in our system, with the neuroretina and its native RPE cultured together, we can routinely culture the retina for four weeks or more, without major cell loss (Söderpalm et al., IOVS 35:3910-21, 1994; Belhadj et al., JoVE 165, 2020). Thus, our retinal cultures with RPE do not undergo rapid death. Within the time-frame of the present study (6 days in vitro) culturing-induced cell death is minimal and unlikely to influence our analyses. For further, more detailed answers to the reviewers’ questions please see our detailed point-to-point response below.

      We agree with the reviewer that eventually in vivo studies will be important to confirm our interpretations. As mentioned in our initial response to reviewer #1, such studies will be very challenging and new technologies may need to be developed before in vivo investigations can deliver the answers to the questions at hand (see answer to question Rev#3.17 below), especially if the interplay between substrate availability from the blood metabolome and the retinal metabolic pathway activity is to be studied.

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      Rev#1.1. The animals should be screened for and lack rd8.

      Response: This is a pertinent question from the reviewer. Ever since we first became aware, around 2010, of the presence of rd8 mutations in certain mouse lines from major vendors (e.g. Charles River, Jackson Labs), we have set up regular screening of all our mouse lines for this Crb1 mutation. Accordingly, the mouse lines used in this study were confirmed to be free of the rd8 / Crb1 mutation. A corresponding remark has now been inserted into the SI materials and methods section (Lines 37-38).

      Rev#1.2. GLUT1 looks significantly different from in vivo to in vitro. Recommend co-staining with RHO and cone markers (PNA or CAR) to further delineate where it is being expressed. The in vitro cultures appear to have much shorter outer segments (OS). Considering OS biosynthesis is thought to drive a good deal of metabolic adaptations, how relevant is the in vitro model system to what is truly occurring in vivo?

      Response: The GLUT1 staining shown in Figure 1 displays the in vivo situation. Since this may not have been entirely clear from the previous figure legend, we have now labelled this as “in vivo retina” and distinguish it from “in vitro” samples in the legend to Figure 1 (Lines 774-778). As far as the comparison of GLUT1 staining in vivo (Figure 1A3) vs. in vitro (Figure S1C3) is concerned, in both situations a strong RPE labelling is clearly visible, with essentially no GLUT1 label within the neuroretina.

      Nevertheless, to better delineate the expression of GLUT1 in the outer retina, we have now performed an additional co-staining with rhodopsin (RHO) as rod marker and peanut agglutinin (PNA) as cone marker, as suggested by the reviewer (new supplemental Figure S1). In brief, this co-staining confirms the strong expression of GLUT1 in the RPE, while there is essentially no GLUT1 detectable in rod or cone photoreceptors.

      Retinal explants in long-term cultures do indeed have somewhat shorter outer segments compared to same age in vivo counterparts (Caffe et al., Curr Eye Res. 8:1083-1092, 1989). However, in the short-term cultures (6 DIV) and at the age studied here (P15) outer segments have only just started to grow out and are around 10 - 12 µm long, both in vitro and in vivo (cf. LaVail, JCB 58:650-661, 1978). Thus, the metabolism required for outer segment synthesis should be equivalent when in vitro and in vivo situations are compared. For considerations on outer segments in retinal explant cultures see also Rev#3.2 and Rev#3.29.

      Rev#1.3. Also, recent publications have shown that GLUT1 is expressed in the neuroretina including rods, cones, and Müller glia. Was GLUT1 not appreciated in these cells in your ex vivo samples and if so, why? Likewise, these same studies previously demonstrated GLUT1 resulted in rod degeneration but not cone. The results presented here differ significantly. Why the difference in results and is it secondary to the in vitro vs. in vivo setting? Furthermore, the authors state that they thought the no RPE situation would be similar to the GLUT1 inhibitor experimental condition but instead, they were vastly different. Is this secondary to the fact that GLUT1 is expressed outside the RPE?

      Response: We are aware that there is a controversy regarding GLUT1 expression in the neuroretina; please see also our response to question Rev#3.1 below. As far as our immunostaining for GLUT1 on in vivo retina is concerned, we find an unambiguous and very marked expression of GLUT1 in RPE cells, at both basal and apical sides. Compared to the RPE, the neuroretina appears devoid of GLUT1 staining. However, at very high gamma values a faint staining in the neuroretina becomes visible, a staining which from its appearance – processes spanning the entire width of the retina – is most compatible with Müller glia cells. Under normal circumstances we would have dismissed such a faint staining as background and false positive. Given the sometimes very contradictory reports in the literature, we cannot fully exclude a weak expression of GLUT1 also in cells other than the RPE, with Müller glial cells perhaps being the most likely candidate. At any rate, GLUT1 expression in the neuroretina can only be much weaker than in the RPE, making its relevance for overall retinal metabolism unclear.

      As far as recent publications studying GLUT1 in the retina are concerned, we know of the study by Daniele et al. (FASEB Journal 36:e22428, 2022), which used a rod-specific, conditional knock-out of GLUT1 and found a relatively slow rod degeneration. We are not aware of a selective GLUT1 knock-out in cones, nor are we aware of conditional GLUT3 knock-outs in the retina. For further discussion of the Daniele et al. study please see Rev#3.13.

      The reviewer is right, initially we were thinking that, since GLUT1 was expressed only (predominantly) in RPE, the metabolic response to GLUT1 inhibition should look similar to the no RPE situation. However, this initial hypothesis did not consider a key fact: The RPE builds the blood retinal barrier and the tight-junction coupled RPE cells are a barrier to any larger molecule, including glucose. Removing the barrier by removing the RPE dramatically increases the availability of glucose to the retina, a phenomenon that is likely exacerbated by the expression of the high affinity/high capacity GLUT3 on photoreceptors (cf. Figure S1A). In other words, when the RPE is removed the outer retina is “flooded” with glucose and we believe that this is probably the main factor that explains why the metabolic response to GLUT1 inhibition (1,9-DDF group) is so different from the no RPE condition.

      We have now included an additional corresponding explanation in the discussion (Lines 422-429). Furthermore, we have added an entire new subchapter to the discussion to debate the expression of glucose transporters in the outer retina (Lines 454-472).

      Rev#1.4. Shikonin's mechanism of action via protein aggregation and lack of specificity for PKM2 vs PKM1 at 4 µM is an experimental limitation that needs to be taken into account. All treatments utilized are not cell-specific.

      Response: While the reviewer is correct to say that Shikonin may have multiple cellular targets and a diverse range of possible applications as an anti-inflammatory, antimicrobial, or anticancer agent (cf. Guo et al., Pharmacol. Res. 149:104463, 2019), numerous studies support its specificity for PKM2 over PKM1, at concentrations ranging from 1 – 10 µM (Chen et al., Oncogene 30:4297-306, 2011; Zhao et al., Sci. Rep. 8:14517, 2018; Traxler et al., Cell Metab. 34:1248-1263, 2022). We settled for 4 μM as an intermediate concentration, considering its effectiveness and specificity in previous studies. We have now inserted references detailing the specificity and concentration range of Shikonin into the SI Materials and Methods section (Line 62).

      The concern that “all treatments” are not cell-specific is debatable. Certainly, any given compound may have off-target effects, yet, since the compounds we used in our study have all been studied for decades (see above, initial response to Reviewer #2), their off-target profile is well established and unlikely to play an important role here. Moreover, in our study the cell specificity does not come from the compounds used but from where their targets are expressed. As shown in Figure 1A and in Figure S1C, Shikonin's target PKM2 is almost exclusively expressed in photoreceptor inner segments. Hence, it seems very reasonable to expect that the vast majority of the metabolomic changes observed after Shikonin treatment are related to photoreceptors. We note that this assertion would still be true even if there were a low-level expression of PKM2 in other retinal cell types and/or if Shikonin had moderate off-target effects on other enzymes, since the bulk of the effect on the quantitative metabolomic dataset would still originate from PKM2 inhibition in photoreceptors.

      Rev#1.5. What was the method of cone counting in Figure 1?

      Response: Cones were counted per 100 µm of retinal circumference based on an arrestin-3 staining (cone arrestin, CAR).

      This information is now included in the SI Materials and Methods section under “Microscopy, cell counting, and statistical analysis” (Lines 99-100).

      Rev#1.6. How do you know that FCCP is not altering RPE ox phos, disrupting the outer retinal microenvironment and leading to cell death, and therefore, the effects seen are not photoreceptor-specific but rather downstream from the initial insult in RPE?

      Response: We propose that FCCP will be acting on both photoreceptors and RPE cells (and all other retinal cell types) at essentially the same time, over the experimental time-frame. Thus, OXPHOS should be inhibited in all cells simultaneously. However, FCCP will primarily affect cells that actually use OXPHOS to a large extent, while cells relying on other metabolic pathways (e.g. glycolysis) will hardly be affected.

      We believe the very strong effect of FCCP, seen exclusively in rod photoreceptors, to be a direct drug effect. While we cannot fully exclude an indirect effect via the RPE – as proposed by the reviewer – we think this to be unlikely because:

      (1) RPE viability was not compromised by FCCP treatment.

      (2) If the reviewer's hypothesis was correct, then cone photoreceptors should also have been affected (e.g. because now the RPE consumes all glucose, leaving nothing for cones). However, cones were essentially unaffected by the FCCP treatment, making a dependence on RPE OXPHOS unlikely. Especially so, because blocking GLUT1 and glucose import on the RPE with 1,9-DDF had only relatively minor effects on rod photoreceptor viability but strongly affected cones. This indicates that the RPE is mainly shuttling glucose through to photoreceptors, especially to cones, and this function does not seem to be impaired by FCCP treatment.

      (3) We found that enzymes required for Krebs-cycle and OXPHOS activity (i.e. citrate synthase, fumarase, ATP synthase γ) are predominantly expressed in photoreceptors but virtually absent from RPE (Figure 3D, see also answer to following question).

      (4) The density of mitochondria (i.e. the target for FCCP) is far lower in RPE than in photoreceptors, as evidenced also by the COX staining shown in Figure 1A. Hence, photoreceptors are far more likely to be hit by FCCP treatment than RPE cells.

      To accommodate the reviewer's concern, we have now added a further comment into the discussion (Lines 440-442).

      Rev#1.7. While Figure 3D is interesting, it offers no significant insight into mechanisms as the enzyme levels are not being compared to control nor is mitochondrial fitness in these conditions being assessed, which would provide greater insight than just showing that these enzymes are present in the inner segments, which are known to be rich in mitochondria. Additionally, stating that the low ATP is secondary to decreased Krebs cycle activity and ox phos based on merely ATP levels is not supported by metabolite levels minus citrate nor ox phos enzyme levels or oxygen consumption. Also, citrate is purported to be decreased in the table in Figure 2 in the no RPE condition; however, Supplemental Figure 2 demonstrates this change is not significant then the same data is presented in Supplemental Figure 3 and it is statistically significant again. Why the difference in data and why is the same data being shown multiple times?

      Response: The immunostaining shown in Figure 3D shows the in vivo retina, or in other words the localization of enzymes in the native situation. Since this may not have been obvious in the previous manuscript version, we have added a corresponding comment to the legend of Figure 3 (Line 806). The localization of the Krebs-cycle/OXPHOS enzymes citrate synthase, fumarase, and ATP synthase mainly to photoreceptors, but not (or much less) to RPE, is another piece of evidence supporting the idea that OXPHOS is predominantly performed by photoreceptors (see also answer to previous question Rev#1.6).

      The decreased ATP levels (together with citrate, aspartate, NAA) shown in Figure 3 in the no RPE group, are an indication that photoreceptor Krebs-cycle activity may be decreased but not abolished in the absence of RPE. Importantly, GTP levels are not reduced in the no RPE group (Figure 2). Since large amounts of GTP can only be synthesized by either SUCLG-1 in the Krebs-cycle or by NDK-mediated exchange with ATP, the most plausible interpretation is that Krebs-cycle dependent ATP-synthesis was decreased in the no RPE situation, but that the (mini) Krebs-cycle or Cahill-cycle, notably the step from succinyl-CoA to succinate, was running. Since there is no RPE in this group, this strongly suggests important Krebs-cycle/OXPHOS activity in photoreceptors where the majority of the corresponding enzymes are located (see above).

      We thank the reviewer for pointing out that the information on group comparisons may not have been presented with sufficient clarity. In the figures mentioned by the reviewer the data is shown and compared in different contexts: the table in Figure 2B and the data in Figure S3 (now renumbered to Figure S5) refer to two-way comparisons of treatment condition to control, to elucidate individual treatment effects. Meanwhile Figure S2 (now supplementary Figure S3) refers to a 5-way comparison for a general overview that puts all five groups in context with each other. These differences in comparisons and normalization to the respective common standards entail the use of different statistical tools, resulting in different p-values. The statistical testing approaches and thresholds are now disclosed in the figure legends, and additionally in the SI Materials and Methods section (Lines 145-155).

      Rev#1.8. When were the ex vivo samples taken for metabolomics, and if taken when significant TUNEL staining and cell death have occurred, are the changes in metabolism due to cell death or a true indication of differential metabolism? Furthermore, it is unclear if the metabolomics samples included the RPE or not. Considering these treatments will affect most cells in the retina and the RPE, which is included in the ex vivo samples, it is difficult to ascertain that these changes are secondary to the effects on photoreceptors alone.

      Response: The samples for metabolomics included the RPE (except for the no RPE condition) and were taken at the same time as the tissues for histological preparations and TUNEL assays, i.e. they were all taken at post-natal day 15. This has now been clarified in the SI Materials and Methods section (Lines 108-110).

      We cannot entirely exclude an effect of ongoing cell death caused by the different drug treatments on the retinal metabolome. However, since in the experimental treatments cell death was still comparatively low (even in the FCCP condition, overall cell death was only around 10% of the total retina), and the metabolomic analysis considered the entire tissue, the impact of cell death per se on the total metabolome will be comparatively minor (≤ 10%, i.e. within the typical error margin of the metabolomic analysis).

      As mentioned above, the drug treatments should in principle affect all retinal cells at the same time. However, only cells that express the drug targets (i.e. 1,9-DDF targets GLUT1 in RPE cells, Shikonin targets PKM2 in photoreceptors; cf. Figure 1A) should react to the treatment. Even FCCP, in the paradigm employed, will only affect those cells that rely heavily on OXPHOS. Our data indicates that this is almost certainly the case for rods, whereas cones, RPE cells, and essentially all of the inner retina are not affected by FCCP treatment, strongly suggesting that OXPHOS is of minor importance for these cell populations.

      Rev#1.9. Why were the FCCP and no RPE groups compared? If they have similar metabolite patterns as noted in Figure 2, would that suggest that FCCP's greatest effect is on the ox phos of RPE and the metabolite patterns are secondary to alterations in RPE metabolism? Also, the increase in citrate and decrease in NAD may be related to effects on RPE mitochondrial metabolism when comparing these groups, and the disruption of RPE metabolism may then result in PARP staining of photoreceptors.

      Response: The reason for the pair-wise comparison of the no RPE and FCCP groups initially was indeed the similarity in metabolite patterns. This has now been rephrased accordingly in the results section “Photoreceptors use the Krebs-cycle to produce GTP” (Lines 218-219). The interpretation that the reviewer proposed here is interesting, but does not conform with the data analysis of this and other group comparisons.

      Instead, the similarity between the metabolic patterns found in the no RPE and FCCP groups further supports the idea that a lack of RPE decreases retinal OXPHOS and increases glycolysis. This interpretation is based on the following observations:

      (1) Mitochondrial density in the RPE is far lower than in photoreceptors (see COX staining in Figure 1A), thus quantitatively the metabolite pattern caused by a disruption of OXPHOS (via FCCP treatment) will be dominated by metabolites generated by photoreceptors. For the same reason the depletion of retinal NAD+, and the concomitant increase in photoreceptor PAR accumulation after FCCP treatment, is unlikely to be due to changes in RPE.

      (2) Similarly, citrate synthase (CS) was found to be almost exclusively expressed in photoreceptor inner segments, with little expression in RPE (Figure 3D). Hence, the quantitative increase of citrate levels after FCCP treatment can only originate in photoreceptors.

      (3) The comparison of the control (with RPE) against the no RPE group suggested an increase in (aerobic) glycolysis in the absence of RPE, evidenced notably by a retinal accumulation of lactate, BCAAs, and glutamate (Figure 3A). The very same metabolite pattern is seen for the FCCP treatment (Figure 1B) indicating a marked upregulation of glycolysis (Figure 6C). The latter observation suggests that photoreceptors, after disruption of OXPHOS, switch to an exclusively glycolytic metabolism, which, however, rods cannot sustain (Figure 1C, D).

      (4) Glucose consumption and lactate release are increased in the no RPE group vs. control (new Supplementary Figure 4). A similar increase in glucose consumption and lactate production is seen in the FCCP group, suggesting that the no RPE situation also disrupts OXPHOS in photoreceptors.

      Rev#1.10. The conclusions being reached are difficult to interpret secondary to the experimental procedures and the fact that the treatments are not cell-specific and RPE is included with the neuroretina as well. Likewise, stating FCCP is altering the Krebs cycle in the neuroretina is difficult to believe as there are no changes in the Krebs cycle when compared to the control, which also has RPE.

      Response: We agree with the reviewer, that some of the conclusions may have been somewhat speculative. Accordingly, we have toned down our conclusions in several instances in the text, notably in abstract, introduction, and discussion.

      When it comes to Krebs cycle intermediates a key limitation of our study is indeed the lack of carbon-tracing and metabolic flux analysis as noted by the reviewers, a limitation that we now highlight more strongly in the discussion of the revised manuscript (Lines 545-549). While it is highly probable that the flux of Krebs cycle intermediates is altered by FCCP, our steady-state data does not show significant changes in the metabolites citrate, fumarate, and succinate. However, our study does show a highly significant decrease in GTP levels, which as explained above, is a key indicator of Krebs cycle activity/inactivity. Moreover, while GTP levels were reduced also in the no RPE group, GTP was still significantly higher in the no RPE group compared to the FCCP treatment. Our interpretation of this finding is that there is Krebs-cycle/OXPHOS activity in the neuroretina, which is abolished by FCCP.

      Rev#1.11. Supplemental Figure 4C and D states that GAC inhibition affected only photoreceptors, but GAC is expressed throughout the retina and so the inhibition is altering glutamine-glutamate homeostasis throughout the retina. Clearly, based on histology, one can see that the architecture of the retina, especially at the highest dose, is lost likely because all cells are being affected. So it is not photoreceptor-specific and even at low doses one can see that the inner retina is edematous. Moreover, with such a high amount of TUNEL staining in the ONL, are rods more affected than cones?

      Response: In our hands the immunostaining for Glutaminase C (GAC) labelled predominantly cone inner segments, the OPL, and perhaps bipolar cells (Figure S1A). The deleterious effects mentioned by the reviewer are only seen at the highest concentration of the GAC inhibitor compound 968. This concentration (10 µM) is 100-fold higher than the dose that produces a significant loss of cones in the outer retina (0.1 µM). We therefore think that this data points to the extraordinary reliance of cones on glutamine and glutamate. As can be seen from the images (Figure S4C) illustrating the effects of 0.1 and 1 µM Compound 968 treatment, the ONL thickness is not significantly reduced by the GAC inhibitor. This strongly indicates that at these doses the rods are not affected by GAC inhibition.

      Rev#1.12. The no RPE vs 1,9 DDF data may be interpreted as preventing glucose transport in the RPE increases BCAA catabolism by the RPE, which has been shown to utilize BCAA in culture systems. To this end, when the RPE is not present, the BCAA is increased as compared to the control with RPE.

      Response: Our original interpretation of this data was that after GLUT1 inhibition and a correspondingly reduced retinal glucose uptake, the retina switched to an increasing use of anaplerotic substrates, including BCAAs. This is supported by the concomitant upregulation of the Cahill-cycle product alanine and the mini-Krebs-cycle product N-acetylaspartate (NAA). Yet, we agree with the reviewer that BCAAs could also be consumed by the RPE. We have now changed our conclusion at the end of the results chapter “Reduced retinal glucose uptake promotes anaplerotic metabolism” to also highlight this possibility (Lines 261-262).

      Rev#1.13. It is unclear why so much effort is comparing the no RPE group to the treatment groups and not comparing the control group to the different treatment groups.

      Response: Previous studies – including the seminal studies of Otto Warburg from the early 1920s – had always used retina without RPE. This “no RPE” situation is therefore something of a reference for our entire study, which is why we dedicated more effort to its analysis. We have now inserted a corresponding remark into the manuscript (Lines 182-184).

      Rev#1.14. The conclusions are significantly overstated especially with regards to rods versus cones as these are not cell-specific treatments. For example, the control vs 1,9 DDF vs FCCP clearly shows that there is mitochondrial dysfunction due to decreased NAD, increased AMP/ATP ratio, decreased Asp but increased Gln, and a compensatory increase in lactate production.

      Response: We agree with the reviewer and have tried to phrase our statements in a more measured fashion. Notably, we have toned down our statements in the title, abstract, results, discussion, and several of the subchapter headings.

      Rev#1.15. While metabolic conclusions are drawn on serine/lactate ratio, this ratio is driven by the drastic changes in lactate and not so much serine in the treatment conditions as it was rather stable. Likewise, substrates beyond glucose have the potential to fuel the TCA cycle and make GTP via SUCLG1, such as fatty acids, other AAs, etc. Therefore, this ratio may not tell the entire story about anaplerotic metabolism. Furthermore, knowing that RPE utilize BCAAs to fuel their TCA cycle, the no RPE condition may simply have increased BCAAs due to lack of metabolism by the RPE, which drives the GTP/BCAA ratio. To state that the neuroretina was utilizing BCAAs for anaplerosis is not well supported based on the current data. Similarly, what is to say that the GTP/lactate ratio in the no RPE situation is not driven by the fact that the RPE is no longer present to act as acceptor of retinal lactate production or that more glucose is reaching the retina since the RPE is not present to accept and utilize that produced. Glucose uptake was not assessed to further address these issues.

      Response: We agree with the reviewer that metabolite ratios may not tell the full story underlying retinal metabolism. However, given the robustness of quantitative and highly reproducible NMR data, they are an important part of the metabolomics toolbox. The reviewer correctly observed that the changes in lactate levels are more dramatic than in serine. Still, serine was also significantly increased in the no RPE, 1,9-DDF, and Shikonin groups. Together with the lactate changes (same or opposite direction) the resulting serine/lactate ratios display marked alterations.

      When it comes to the supply of other potential energy substrates mentioned by the reviewer, i.e. fatty acids or amino acids other than BCAAs, these are only supplied in minimal amounts in the defined, serum free R16 medium (Romijn, Biology of the Cell, 63, 263-268, 1988) and – if used to any important extent – would be rapidly depleted by the retina. Thus, for a culture period of 2 days in vitro between medium changes these energy sources are not available and thus cannot be used by the retina.

      Our conclusion that the retina is using anaplerosis is based not only on the observations made in the no RPE group but also on, for instance, the metabolite ratios seen in the 1,9-DDF treatment group. In this group decreased glycolytic activity may correspond to increased serine synthesis and anaplerosis.

      As far as glucose uptake is concerned, we have analysed the medium samples at P15 (equivalent to the retina tissue collection time point) and now present data that addresses this question more directly via the consumption of glucose from and release of lactate to the culture medium (New Supplementary Figure 4C, D). This new dataset provides another independent observation showing that:

      (1) Glucose consumption/lactate release (i.e. aerobic glycolysis) is high in the no RPE situation but low in the control situation. In other words, retinal aerobic glycolysis is most likely stimulated by the absence of RPE.

      (2) 1,9-DDF treatment decreases glucose consumption/lactate release as would be expected from a GLUT1 blocker. Since ATP and GTP production are high nonetheless, this indicates that other substrates (i.e. anaplerosis) were used for retinal energy production, in agreement with the analysis shown in Figure 6C.

      (3) The FCCP treatment, which disrupts oxidative ATP-production, increases glucose consumption/lactate release in a way similar to the no RPE situation. Yet, the no RPE retina can still generate sizeable amounts of GTP but not ATP. Together, this provides further evidence that neuroretinal OXPHOS is decreased in the absence of RPE.

      Rev#1.16. The evidence for the mini-Krebs cycle is intriguing but weak considering it is based on certain enzymes being expressed in the photoreceptors, which had already been shown to be present in other publications, and a single ratio of metabolites that is increased in FCCP. One would expect this ratio to be increased under FCCP regardless. There is no stable isotope tracing with certain fuels to confirm the existence of the mini-Krebs cycle.

      Response: We thank the reviewer for this suggestion. We agree that our evidence for the mini-Krebs-cycle (and the Cahill-cycle) may be to some extent circumstantial and additional technologies would help to obtain further supportive data. Still, here we would like to invite the reviewer to a thought experiment where he/she could try and interpret our data without considering the Cahill- or the mini-Krebs-cycle. At least we ourselves, when we engaged in such thought experiments, were unable to explain the data observed without these alternative energy-producing cycles. Most notably, we were unable to explain the strong accumulation of either alanine or N-acetyl-aspartate (NAA) when only considering glycolysis and (full) Krebs-cycle metabolism. Of course, this may still be considered “weak” evidence, and we expect that future studies including complementary technologies will either confirm or expand our interpretation of the existing data set.

      The suggestion to perform stable isotope-labelled tracing with potential alternative fuels (e.g. glutamate, glutamine, pyruvate, etc.) is very attractive indeed. While such studies are likely to shed further light on the metabolic pathways proposed, this will entail very extensive experimental work, with multiple different conditions and concentrations and a variety of analysis methods that is currently not feasible (e.g. a 1.7 mm NMR probe equipped with a 15N channel) as an extension of the present manuscript. Nevertheless, we will certainly consider this approach for future follow-up studies once such techniques are available and will screen for suitable collaboration partners. A corresponding comment on such future possibilities has now been inserted into the discussion (Lines 545-549).

      Rev#1.17. The discussion does not mention how this data contradicts a recent in vivo study looking at Glut1 knockout in the retina (Daniele et al. FASEB. 2022) or previous in vivo studies that suggest cones may be less sensitive to changes in glucose levels (Swarup et al. 2019). This is a key oversight.

      Response: We thank the reviewer for pointing this out. We have now included these studies in the revised discussion in a new subchapter on the expression of glucose transporters in the outer retina (Lines 454-472). For a critical review of the Daniele et al., 2022 study please also see our more detailed response to question Rev#3.13 below.

      Rev#1.18. GAC is expressed in more than just cones so making cell-specific statements regarding fuel utilization is not well supported.

      Response: Our immunostaining for GAC revealed a strong expression in cone inner segments (Figure S1A3). While this does not exclude (relatively minor) expression in other retinal cell types, cones are likely to be more reliant on GAC activity than other cell types. See also answer above.

      Rev#1.19. Suggesting that rods utilize the mini-Krebs cycle based on AAT2 being seen in the inner segments without at least co-staining for RHO or PNA is weak evidence for such a cycle. AAT looks to be expressed in the inner segments of all photoreceptors.

      Response: We have taken up this suggestion from the reviewer and now provide an additional co-staining for AAT1 and AAT2 with rhodopsin. Note that in response to a pertinent comment from Reviewer #3 we have changed the abbreviation for aspartate aminotransferase from “AAT” to the more commonly used “AST” throughout the manuscript.

      New images showing a co-staining for AST1 and AST2 with rhodopsin now replace the former image set in Figure 7D. In brief, the new images show the expression of both AST1 and AST2 across the retina, with, notably an expression in the inner segments of photoreceptors but not in the outer segments, where rhodopsin is expressed.

      Reviewer #3 (Recommendations For The Authors):

      Rev#3.1. The staining for the glucose transporters GLUT1 and GLUT3 does not reflect what has previously been published by two different groups that were validated by cell-specific knockout mice. As mentioned by the author GLUT1 and GLUT3 have differences in transport kinetics, which would affect their metabolism. Therefore, the lack of GLUT1 in photoreceptors would suggest that photoreceptor metabolism is not faithfully replicated in this system. This difference from the previous literature should be discussed in the discussion.

      Response: As the reviewer pointed out, the expression of GLUT1 in the retina is somewhat controversial, with much older literature showing expression on the RPE, while some more recent studies claim GLUT1 expression in photoreceptors. For a brief discussion of our GLUT1 immunostaining please see also our answer to question Rev#1.3 above.

      Although the retinal expression of GLUT1 was beside the focus of our study, we feel we must address this point in more detail: In the brain the generally accepted setup for GLUT1 and GLUT3 expression is that low-affinity GLUT1 (Km = 6.9 mM) is expressed on glial cells, which contact blood vessels, while high-affinity GLUT3 (Km = 1.8 mM) is expressed on neurons (Burant & Bell, Biochemistry 31:10414-20, 1992; Koepsell, Pflügers Archiv 472, 1299–1343, 2020). This setup matches decreasing glucose concentration with increasing transporter affinity, for an efficient transport of glucose from blood vessels, to glial cells, to neurons. In the retina, the cells that contact the choroidal blood vessels are the tight-junction-coupled RPE cells. As shown by us and many others, RPE cells strongly express GLUT1 (cf. Figure 1A-3.). To warrant an efficient glucose transport from the RPE to photoreceptors, photoreceptors must express a glucose transporter with higher glucose affinity than GLUT1. We show that this is indeed the case with photoreceptors expressing GLUT3 (cf. Supplemental Figure 1-5.). While a partly teleological explanation does not per se prove that our data is correct, at least our data is plausible. In contrast, the glucose transporter setup sometimes claimed in the literature is biochemically implausible, i.e. for the flow of metabolites (glucose) to go against a gradient of transporter affinities, and we are not aware of an example of such a setup occurring anywhere in nature.

      However, at this point we cannot exclude low levels of GLUT1 expression on Müller glia cells or even photoreceptors. This expression could, for instance, be relevant in cases where cells were shuttling excess glucose – perhaps produced through gluconeogenesis – onwards to other retinal cells. Still, GLUT1 expression can only be minor when compared to RPE since a major expression would destroy the glucose affinity gradient (see above) required for efficient glucose shuttling into the energy-hungry photoreceptors.

      To address this request by the reviewer (and also reviewer #1) we now discuss the question of glucose transporter expression in the outer retina in a new subchapter of the discussion (Lines 454-472).
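      The affinity-gradient argument above can be illustrated with a minimal sketch. It assumes simple Michaelis-Menten kinetics (v/Vmax = S / (Km + S)), which is a simplification for facilitative glucose transporters, and uses only the two Km values quoted in this response. The function name and the glucose concentrations (5 mM and 1 mM) are hypothetical and chosen purely for illustration; they are not measurements from the study.

```python
def fractional_saturation(glucose_mm: float, km_mm: float) -> float:
    """Fraction of maximal transport rate, v/Vmax = S / (Km + S).

    Assumes simple Michaelis-Menten kinetics (an illustrative simplification).
    """
    if glucose_mm < 0 or km_mm <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mm / (km_mm + glucose_mm)


# Km values as quoted in the response above
KM_GLUT1_MM = 6.9  # lower-affinity transporter
KM_GLUT3_MM = 1.8  # higher-affinity transporter

# Hypothetical glucose concentrations (mM), for illustration only
for glucose in (5.0, 1.0):
    glut1 = fractional_saturation(glucose, KM_GLUT1_MM)
    glut3 = fractional_saturation(glucose, KM_GLUT3_MM)
    print(f"{glucose:.1f} mM glucose: GLUT1 {glut1:.2f}, GLUT3 {glut3:.2f}")
```

      Under these assumptions, at 1 mM glucose the lower-Km transporter would operate at roughly 36% of its maximal rate versus roughly 13% for the higher-Km transporter, which is the sense in which a falling glucose concentration is matched by rising transporter affinity.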

      Rev#3.2. Photoreceptor metabolism and aerobic glycolysis are tied to photoreceptor function, as demonstrated by Dr. Barry Winkler. The authors should provide data or mention (if previously published) about photoreceptor OS growth and function in this system.

      Response: The studies of Barry Winkler (e.g. Winkler, J Gen Physiol. 77, 667-692, 1981) confirmed the original work of Otto Warburg and expanded on the idea that the neuroretina was using aerobic glycolysis. Importantly, Winkler used an experimental setup very similar to the one Warburg had used, namely explanted rat retina without RPE. In light of our data where we compare metabolism of mouse retina with and without RPE – where retina cultured without RPE confirms the data of Warburg and Winkler – it appears most likely that the purported aerobic glycolysis occurs mostly in the absence of RPE but only to a lower extent in the native retina.

      Photoreceptor outer segment outgrowth is somewhat slower in the organotypic retinal explant cultures compared to the in vivo situation (cf. Caffe et al., Curr Eye Res. 8:1083-1092, 1989 with LaVail, JCB 58:650-661, 1978; see also answer to reviewer #1). Importantly, organotypic retinal explant cultures and their photoreceptors are fully functional and remain so for extended periods in culture (Haq et al., Bioengineering 10:725, 2023; Tolone et al., IJMS 24:15277, 2023). This information has now been added to the manuscript discussion section, into the new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395).

      Rev#3.3. It is unclear from the description of the experiment in both the results and methods if 1,9DDF, Shikonin, and FCCP were added to both apical and basal media compartments or one or the other and should be specified. The details of what was on the apical compartment would be helpful, as the model is supposed to allow for only nutrients from the basal compartment (as indicated by the authors themselves). Is the apical compartment just exposed to air? How does this affect survival?

      Response: In organotypic retinal explant cultures the RPE rests on the permeable culturing membrane such that the basal side is in contact with the membrane and the medium below (for a schematic drawing see Figure S1B), while the apical side is covered by a thin film of medium created by the surface tension of water (Caffe et al., Curr Eye Res. 1989; Belhadj et al., JoVE, 2020). This thin liquid film ensures sufficient oxygenation and is an important factor that allows the retinal explant to remain viable for several weeks in culture. If the retinal cultures were submerged in the medium, their viability – especially that of the photoreceptors – would drop dramatically and would typically be below 3-5 days. Therefore, in the retinal organotypic explant cultures used here, the nutrients and the drugs applied do indeed reach the outer retina from the basal side, i.e. similar to how they would in vivo.

      To address this question from the reviewer, corresponding clarifications have been inserted into the SI Materials and Methods section (Lines 64-66).

      Rev#3.4. As the metabolomic data obtained was quantitative, several metabolites discussed should be analyzed in terms of ratios, for example, Glutathione and glutathione disulfide should be reported as a ratio. In addition as ATP, ADP, and AMP were measured, they can used to calculate the energy charge of the tissue.

      Response: We thank the reviewer for these suggestions and have created corresponding graphs for GSH / GSSG ratio and energy charge. These new graphs have now been added to the SI datasets, to the new Supplementary Figure 4. To accommodate other requests from the Reviewers, this new Figure also contains additional new datasets on glucose and lactate concentrations (see further comments above and below). Please note that all later SI Figures have been renumbered accordingly.

      In brief, the ratios for GSH/GSSG show no significant changes between control and the different experimental groups. Meanwhile, the adenylate energy charge of the retinal tissues shows a significant decrease for the Shikonin group and the FCCP group. Note that in the new Supplementary Figure 4A, the dotted lines indicate the energy charge window typical for most healthy cells (0.7 – 0.95).
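
      As an illustration of the two derived quantities discussed here, the following minimal sketch assumes the standard (Atkinson) definition of the adenylate energy charge, (ATP + 0.5 × ADP) / (ATP + ADP + AMP), and a simple GSH/GSSG concentration ratio. The function names and the example concentrations are hypothetical and are not taken from the manuscript; only the 0.7 – 0.95 window comes from the text above.

```python
def adenylate_energy_charge(atp, adp, amp):
    """Adenylate energy charge, assuming the standard Atkinson definition:
    (ATP + 0.5 * ADP) / (ATP + ADP + AMP). Inputs share one concentration unit."""
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


def gsh_gssg_ratio(gsh, gssg):
    """Ratio of reduced (GSH) to oxidized (GSSG) glutathione concentrations."""
    if gssg <= 0:
        raise ValueError("GSSG concentration must be positive")
    return gsh / gssg


def in_healthy_window(energy_charge, low=0.7, high=0.95):
    """True if the energy charge lies in the 0.7 - 0.95 window quoted in the text."""
    return low <= energy_charge <= high


# Hypothetical concentrations (arbitrary units), for illustration only.
example_ec = adenylate_energy_charge(atp=8.0, adp=1.5, amp=0.5)
print(example_ec, in_healthy_window(example_ec))
```

      Because both quantities are ratios, they are independent of the absolute concentration unit as long as all inputs use the same one.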

      Rev#3.5. I think a missed opportunity when discussing the possible taurine/hypotaurine shuttle would be the impact on the osmosis of the subretinal space as taurine has been hypothesized as a major osmolyte.

      Response: This is another interesting recommendation from the reviewer. To address this point, we have now introduced a corresponding paragraph and references in the discussion of the manuscript (Lines 503-504; 512-514).

      Rev#3.6. In Figure 3, the distribution of these enzymes should also be studied under the no RPE condition as the culture treatment took several days for these metabolic changes to occur.

      Response: The images shown in Figure 3D are from the in vivo retina. Since this may not have been very clear in the previous manuscript version, we have now added a corresponding explanation to the legend of Figure 3. As far as we can tell, the expression and localization of neuroretinal enzymes do not change in cultured retina during the culture period (compare Figure 1A with Supplementary Figure S1C). However, when it comes to the metabolite taurine, its production (localization) changes dramatically in the no RPE situation, where taurine is essentially undetectable by immunostaining (not shown, but see metabolite data in Figure 2A, Figure 3A).

      Rev#3.7. In Figures 4 and 5, it is unclear why the experimental groups were not compared to the control and requires further explanation. Furthermore, the authors should justify the concentrations of drugs used as the cell death could have risen from toxicity to the drugs and not due to disruption of metabolism.

      Response: The reviewer is right, the rationale for these comparisons may not have been laid out with sufficient clarity. In Figure 4 the no RPE and FCCP groups are compared because both groups showed similar metabolite changes relative to the control situation. The no RPE to FCCP comparison thus focussed on the details of the – at first seemingly minor – differences between these two groups. This has now been clarified in the corresponding part of the results (Lines 218-219).

      In Figure 5A, B we compare the no RPE and 1,9-DDF groups with each other, notably because the data obtained seemingly contradicted our initial expectation that these two groups should show similar metabolite patterns. Also here, we have now inserted an additional explanation for this choice of comparisons (Lines 252-253).

      In Figure 5C, D we compare the Shikonin and FCCP groups with each other. The idea behind this comparison was that in the 1st group glycolysis was blocked while in the 2nd group OXPHOS was inhibited, or in other words, here we compared what happened when the two opposing ends of energy metabolism were manipulated in opposite directions. This reasoning is now given in the results section (Lines 265-268).

      As far as the choice of drugs and concentrations is concerned, we used only compounds that have been extremely well validated through up to five decades of scientific research (see initial response to Reviewer #2 above). We therefore are confident that at the concentrations employed the results obtained stem from drug effects on metabolism and not from generic, off-target toxicity. Then again, as we show, prolonged (i.e. 4 days) block of energy metabolism pathways does cause cell death.

      Rev#3.8. In line 203, the authors discuss GTP as being primarily a mitochondrial metabolite, however, photoreceptors would require a localized source of GTP synthesis in the outer segments as part of phototransduction, and therefore GTP in photoreceptors cannot be a mitochondrial-specific reaction in photoreceptors. Furthermore, the authors mentioned NDK as being a possible source of GTP, but they do not show NDK localization despite it being reported in the literature to be localized in the OS.

      Response: The question as to the source of GTP in photoreceptor outer segments is indeed highly relevant. For GTP production in mitochondria see the answer to the next question below (Rev#3.9). An early study showed nucleoside-diphosphate kinases (NDK) to be expressed on the rod outer segments of bovine retina (Abdulaev et al., Biochemistry 37:13958-13967, 1998). More recently NDK-A was shown to be strongly expressed in photoreceptor inner segments (Rueda et al., Molecular Vision 22:847-885, 2016). We now refer to both studies in the results section of the manuscript (Lines 227-228).

      Rev#3.9. In the "Impact on glycolytic activity, serine synthesis pathway, and anaplerotic metabolism" section, the authors claim in the no RPE group glycolytic activity was higher due to a depressed GTP-to-lactate ratio. However, this reviewer is under the impression that GTP production in photoreceptors is not mitochondrial specific, so this ratio doesn't make sense (I could be mistaken, however). A better ratio would have been pyruvate/lactate or glucose/lactate when discussing increased glucose consumption.

      Response: We appreciate the reviewer’s comment, yet we do indeed believe we can show that GTP-production in our experimental context is mainly mitochondrial. As explained in the manuscript results section (“Photoreceptors use the Krebs-cycle to produce GTP”), there are essentially only two possibilities for a photoreceptor to produce sizeable amounts of GTP: in the mitochondria via SUCLG1 – i.e. an enzyme highly expressed in photoreceptor inner segments (Figure 5D) – and in the cytoplasm via NDK from excess ATP. The claim about the depressed GTP-to-lactate ratio in the no RPE situation takes this into account. Importantly, since in the no RPE situation ATP-levels are significantly lower than GTP, here GTP can only be produced via SUCLG1 and OXPHOS. Moreover, this contrasts with the FCCP group where mitochondrial OXPHOS is disrupted and both ATP and GTP are depleted.

      As far as ratios with pyruvate and glucose are concerned, we agree that these could potentially be very interesting to analyse. Unfortunately, in our retinal tissue 1H-NMR spectroscopy-based metabolomics analysis the levels of both pyruvate and glucose were below the detection limits, which likely reflects their rapid metabolic turnover (cf. table S1). While this might be attributable to the marked consumption of these metabolites within the tissue, it does not allow us to calculate the suggested ratios to lactate. Then again, in the supernatant medium, which was collected at the same time point as the retina tissue, we can readily detect glucose and lactate levels; for these data please see the new Supplementary Figure 4.

      Rev#3.10. Aspartate aminotransferase should be abbreviated as AST, as it is more commonly noted.

      Response: In response to this comment from the reviewer, we have changed the abbreviation for aspartate aminotransferase from AAT to AST throughout the manuscript.

      Rev#3.11. In the discussion the assumptions of the ex vivo culture systems should be clearly stated. One that was not mentioned, but affects the implications of the data, is that the retinas used in this study are from the developing mouse eye. Another important assumption that was made in this paper was that the changes in retinal metabolism were due to photoreceptors even though the whole neural retina was included.

      Response: The reviewer is correct; we have added these two points to the discussion section of the manuscript. Notably, we now included a new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395) to present different in vitro and in vivo test systems.

      Rev#3.12. Starting at line 347: As the authors know, the RPE has been shown to be highly reliant on mitochondrial function, and disruption of RPE mitochondrial metabolism leads to photoreceptor degeneration (numerous papers have shown this). Furthermore, the lower levels of lactate detected in their explants when RPE was present suggests that lactate is actively transported out of the neural retina by the RPE.

      Response: The reviewer is right about lactate being exported from the retina to the blood stream in vivo, or, in our in vitro study, to the culture medium. In the new dataset showing glucose and lactate concentrations in the culture medium (new Supplementary Figure 4C, D), we show that without RPE (no RPE group) the retina releases significantly more lactate into the medium than control retina with RPE. At the same time the no RPE retina consumes more glucose than control retina.

      Rev#3.13. Line 360: Again, in mouse photoreceptors (by bulk RNAseq and scRNAseq), there is no GLUT3 expression (encoded by slc2a3). It was also recently shown by Dr. Nancy Philp's lab that rod photoreceptors express GLUT1, encoded by slc2a1 (PMCID: PMC9438481). The differences reported in this study and previous studies should be discussed.

      Response: Although this comment may not make us very popular, we are somewhat sceptical of RNAseq data (especially single cell RNAseq) since the underlying methodology – at the current level of technological development – is notoriously unreliable when it comes to the assessment of low abundance transcripts and suffers from poor batch reproducibility, compared to NMR based metabolomics. Due to methodological constraints RNAseq data have a propensity to display erroneously high or low expression. Moreover, and perhaps even more important, dissociated cells in scRNAseq studies undergo rapid gene expression changes that can significantly falsify the image obtained (Rajala et al., PNAS Nexus 2:1-12, 2023). Finally, it cannot be emphasized enough that mRNA expression profiles DO NOT equate to protein expression and there are numerous examples of divergent expression profiles when mRNA and protein are compared.

      The Daniele et al. study (FASEB Journal 36:e22428, 2022; PMCID: PMC9438481) used in situ hybridization to study the mRNA expression of GLUT1 (slc2a1) and GLUT3 (slc2a3). In line with our comment just above, the Daniele et al. study may provide for an example of divergence between mRNA and protein expression, since it seemingly showed only minor expression of GLUT1/slc2a1 in the RPE, i.e. precisely in the one cell type that is well-known for its very strong GLUT1 protein expression.

      Furthermore, Daniele et al. used a conditional GLUT1 knock-out in photoreceptors induced by repeated Tamoxifen injections. The photoreceptor GLUT1 knock-out led to a relatively mild phenotype with only about 45% of the outer nuclear layer lost over a 4-month time-course. This is in stark contrast with the FCCP or the 1,9-DDF treatment, which would ablate nearly all rod photoreceptors in under one or two weeks, respectively.

      As a side note, Tamoxifen is an oestrogen receptor antagonist (with partial agonistic behaviour) with a long history of causing retinal and photoreceptor damage. Notably, oestrogen receptor signalling is important for maintaining photoreceptor viability (Nixon & Simpkins, IOVS 53:4739-47, 2012; Xiong et al., Neuroscience 452:280-294, 2021). Therefore, the relatively minor effects of the conditional GLUT1 KO in photoreceptors found in Daniele et al. may have been confounded by direct tamoxifen photoreceptor toxicity. On a wider level, this possible confounding factor related to the use of Tamoxifen points to general problems associated with certain forms of genetic manipulations.

      We now mention the controversy around the expression of glucose transporters in the retina, including the Daniele et al. study in a new subchapter of the discussion on "Expression of glucose transporters in the outer retina” (Lines 454-472).

      Rev#3.14. Lines 370-372: FCCP caused a strong cell death phenotype in rods, however under stress rods upregulate the secretion of RdCVF, which leads to cone photoreceptor survival by the upregulation of aerobic glycolysis in cones. The data should be re-interpreted in the context of this previous literature.

      Response: We thank the reviewer for this comment; however, we could not find a reference that would state that “…under stress rods upregulate the secretion of RdCVF”. What we did find was a reference stating that similar factors such as thioredoxins (TRX80) are secreted from blood monocytes under stress (Sahaf & Rosén, Antioxid Redox Signal 2:717-26, 2000). However, we consider these cells to be too dissimilar to rod photoreceptors to warrant a corresponding comment. Moreover, the research group who discovered RdCVF originally showed that rod-secreted RdCVF cannot prevent cone degeneration if the corresponding Nxnl1 gene is knocked-out in cones, arguing for a cell-autonomous mechanism of RdCVF-dependent cone protection (Mei et al., Antioxid Redox Signal. 24:909-23, 2016).

      Since it is very possible that we may have missed the correct reference(s), we would welcome further guidance by the reviewer.

      Rev#3.15. Line 374: 1,9-DDF caused a 90% loss of cones, however, previous studies by Dr. Nancy Philp have shown glucose deprivation in the outer retina affects primarily rod photoreceptors. The differences should be discussed.

      Response: We thank the reviewer for directing us to these studies. As mentioned above (Rev#3.13.) the Daniele et al. 2022 study yielded only relatively mild effects for a rod-specific conditional GLUT1 KO on photoreceptor viability. Similarly, in an earlier study (Swarup et al., Am J Physiol Cell Physiol. 316: C121–C133, 2019) the Philp group found that a GLUT1 KO in the RPE also caused only a minor phenotype in the photoreceptor layer. We would argue that if glucose, and by extension aerobic glycolysis, were indeed of major importance for (rod) photoreceptor survival, the degenerative effect of these genetic GLUT1 ablations should have been devastating and should have destroyed most of the outer retina in a matter of days. The fact that this was not seen in either study is another piece of independent evidence that rod photoreceptors do not rely to any major extent on glycolytic metabolism.

      The two studies from the Philp lab (Swarup et al., 2019; Daniele et al., 2022) are now cited in the discussion (Lines 417-419 and 458-460).

      Rev#3.16. Line 375: Yes Dr. Claudio Punzo and Dr. Leveillard Thierry along with other groups have shown glycolysis is required to maintain cone survival when under stress, however, the authors should emphasize that it is under stress that this is observed.

      Response: In response to this comment we have now specifically extended our corresponding remark in the discussion of the manuscript (Lines 446-447).

      Rev#3.17. The section "Cone photoreceptors use the Cahill-cycle". The presence of ALT in photoreceptors was surprising and suggests alternatives to the Cori reaction. However, previous measurements of glucose and lactate from localized in vivo cannulation of animal eyes suggest the majority of glucose taken up by the retina is released back to the blood as lactate. Again, this section should discuss this idea in terms of the previous literature.

      Response: Here, we believe the reviewer is referring to studies performed in the late 1990s where, in anaesthetized cats, the lactate concentration in blood samples obtained from choroidal vein cannulation was compared against that in blood samples obtained from femoral arteries (Wang et al., IOVS 38:48-55, 1997). We note that a more relevant in vivo measurement of retinal glucose consumption and lactate production would likely require the simultaneous cannulation of the central retinal artery (CRA) and the central retinal vein (CRV). This would need to be combined with repeated (online) blood sampling, drug applications, and subsequent metabolomic analysis. We are not aware of any in vivo studies where such procedures have been successfully performed and further miniaturization and increased sensitivity of metabolomic analytic equipment will likely be required before such an undertaking may become feasible. Even so, such studies may not be feasible in small rodents (mice, rats) and may instead require larger animal species (e.g. dog, monkey) to overcome limitations in eye and blood sample size.

      We have now extended the discussion of our manuscript with a new subchapter on “The retina as an experimental system for studies into neuronal energy metabolism”. Within this new subchapter we now present two different in vivo experimental approaches that addressed retinal energy metabolism (Lines 376-384). Moreover, we now present new data on retinal lactate release to the culture medium, showing, for instance, a strong increase in lactate release in the no RPE condition compared to control (new Supplementary Figure 4).

      Rev#3.18. Lines 431-433: The study cited suggested that the mitochondrial AST was detected in other cells, in agreement with the data shown. However, the authors' statements in this section are misleading as they do not take into consideration the contribution of AST from other cell types.

      Response: The reviewer is right, we found both AST1 and AST2 to be expressed not only in photoreceptor inner segments but also in the inner retina, especially in the inner plexiform layer (new Figure 6D). Since this might indicate mini-Krebs-cycle activity also in retinal synapses, we have added a corresponding comment to the discussion (Lines 540-543).

      Grammatical and wording fixes:

      Rev#3.19. Line 98 - "the recycling of the photopigment, retinal."

      Response: We have inserted a comma after “photopigment”.

      Rev#3.20. Results section and Figure 1 start without providing context for the model system where staining is being done.

      Response: We have added this information to the beginning of the results section (Lines 105-106).

      Rev#3.21. Supplementary Figure 2 is not mentioned in the main text - there is no context for this figure.

      Response: Supplementary Figure 2 was originally referenced in the legend to Figure 2. We now mention supplementary figure 2 (now renumbered to supplementary figure S3) also in the main text, in the results section under “Experimental retinal interventions produce characteristic metabolomic patterns” (Line 148).

      Rev#3.22. Volcano plot in Supplementary Figures 3, 5, 6, 7, and 8 don't indicate what Log2(FC) is in reference to.

      Response: The log2 fold change (FC) is calculated as follows: log2 (fold change) = log2 (mean metabolite concentration in condition A) - log2 (mean metabolite concentration in condition B) where condition A and condition B are two different experimental groups being compared. This is now explained in the SI Materials and Methods (Lines 145-147) and indicated in abbreviated form in the figure legends. Please note that supplemental figures have now been renumbered due to the insertion of an additional, new Figure.
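
      The definition above can be written out as a short sketch. It follows the stated formula (difference of the log2 group means, which equals the log2 of the ratio of the means); the function name and the example concentrations are hypothetical and do not come from the manuscript.

```python
import math
from statistics import mean


def log2_fold_change(condition_a, condition_b):
    """log2(mean of condition A) - log2(mean of condition B).

    A positive value means the metabolite is, on average, more abundant in
    condition A; a negative value means it is more abundant in condition B.
    """
    mean_a = mean(condition_a)
    mean_b = mean(condition_b)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("mean concentrations must be positive to take log2")
    return math.log2(mean_a) - math.log2(mean_b)


# Hypothetical replicate concentrations (arbitrary units), for illustration only.
group_a = [4.0, 4.0, 4.0]
group_b = [1.0, 1.0, 1.0]
print(log2_fold_change(group_a, group_b))
```

      Note that swapping the two groups only flips the sign, which is why a volcano plot needs to state which group is condition A.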

      Rev#3.23. Line 331 - "and allowed to analyze the..." is incorrect phrasing.

      Response: This phrasing was changed.

      Rev#3.24. Line 343 "cycled"

      Response: This phrasing was changed.

      Rev#3.25. Line 446 is misworded.

      Response: This phrasing was changed.

      Technical questions:

      Rev#3.26. At what point after explant was the IHC done in Supplemental Figure 1? If early, but experiments are done later, there's a chance things are more disorganized at the end of the experiment.

      Response: Staining and metabolomics analysis were both done at the end of each experiment, at the same time, at P15. This is now mentioned in the SI materials and methods section (Lines 67, 108-110).

      Rev#3.27. FCCP affects plasma membrane permeability, which is particularly critical in neurons that undergo repolarization and depolarization - how do we know FCCP acts on cell death via metabolism? See: https://www.sciencedirect.com/science/article/pii/S2212877813001233

      Response: The reviewer is correct, a significant permeabilization of cell membranes in general would likely cause extensive neuronal cell death, unrelated to a disruption of OXPHOS. However, the FCCP concentration used here (5 µM) is at the lower end of what was used in the mentioned Kenwood et al. study (Mol Metab. 3:114-123, 2014) and the effect on cell membrane permeability in tissue culture is likely to be rather small, as opposed to what was seen by Kenwood et al. in cultures of individual cells. This view is supported by the fact that in our FCCP treatments, we did not observe any significant increases of cell death in any retinal cell type (including RPE) other than in rod photoreceptors. Together with the fact that only photoreceptors strongly express Krebs-cycle/OXPHOS related enzymes, this strongly suggests that the FCCP effects seen by us were due to disruption of OXPHOS.

      Rev#3.28. Numerous metabolite comparisons are being made throughout the manuscript – what type of multiple hypothesis testing corrections are utilized? Only certain figures mention multiple hypothesis testing (e.g. Figure 6).

      Response: In general, in this manuscript we used two different statistical methods: 1) For two-group comparisons, we used an unpaired, two-tailed t-test, which reports a p-value with 95% confidence interval without additional multiple hypothesis testing (e.g. in Figure 2, Suppl. Figures 4, 6, 7, 8). 2) For multiple group comparisons we used a one-way ANOVA analysis with Tukey’s multiple comparisons post-hoc test (except suppl. Figure 9 where Fisher´s LSD post-hoc test was used). The information on which statistical test was used for what dataset is now given in the figure legends and in the SI Material and Methods section.
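
      For readers unfamiliar with the first of these tests, the sketch below computes the test statistic of an unpaired t-test. It assumes the classical equal-variance (pooled) form, which the response does not specify, and it stops at the t statistic and degrees of freedom; obtaining the two-tailed p-value additionally requires the t distribution (for example via `scipy.stats.ttest_ind`). The function name and example values are hypothetical.

```python
import math
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Student's unpaired t statistic with pooled variance (equal variances assumed).

    Returns (t, degrees_of_freedom). The two-tailed p-value is then the
    probability of |T| >= |t| under a t distribution with that many degrees
    of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    dof = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if standard_error == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    return (mean(group_a) - mean(group_b)) / standard_error, dof


# Hypothetical metabolite concentrations (arbitrary units), for illustration only.
t_value, dof = unpaired_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t_value, dof)
```

      Each such two-group test is evaluated on its own; when many metabolites are compared this way, no correction across tests is implied by the statistic itself.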

      Rev#3.29. For Figure 3, how do we know that the removal of RPE is causing the metabolite changes due to RPE-PR coupling? How do you rule out the fact that it isn’t just: I – a thicker physical barrier between media and the neural retina that is causing the changes, or II – removal of RPE from PR causes OS shearing and a stress response that alters metabolism?

      Response: We believe these concerns can be ruled out: The RPE cells are linked by tight junctions and are not “just a thicker barrier” but a barrier that is almost impermeable for most metabolites unless they are carried by specific transporters. Outer segment shearing via RPE removal would indeed be a concern if we had used adult retina. However, we explanted the retina at P9, when it does not yet possess any sizeable outer segments. As a matter of fact, photoreceptors grow outer segments only after P9.

      Rev#3.30. While 1,9-dideoxyforskolin blocks GLUT1, it is known to have other effects, including on potassium channels. How do we know the effects of 1,9-dideoxyforskolin are specific to GLUT1? Utilizing a GLUT1 KO and showing no additional effects when adding 1,9-dideoxyforskolin would be helpful as a control.

      Response: This is a good suggestion from the reviewer. We note that this is technically not easy to achieve as it would require an RPE-specific knock-out that should be inducible at a given experimental time-point, in a quantitative manner. The study by Swarup et al. (see above Rev#3.13.) used an RPE specific knock-out that was, however, not inducible. Moreover, if the corresponding inducible knock-out animals could be generated, then the stochastic nature of the inducing treatment would probably affect only a limited number of cells within a given cell population. In our experimental context, a less than quantitative knock-out would significantly complicate interpretation of results, even to the point that no additional insight might be gained.

      Rev#3.31. The analysis in Figure 6, even with attempts to control drug treatments, is highly speculative. One really needs animals with predominately cones vs. predominately rods to do this analysis (e.g. with NRL mice).

      Response: The reviewer is right, the analysis shown in Figure 6 was an explorative approach to try and deduce features of rod and cone metabolism. This is now mentioned in the results section (Lines 282-284). Since the experiments were not initially intended to address such questions, by necessity the interpretations remain speculative. The comparison of mouse mutants in which there are either no cones (e.g. cpfl1 mouse) or no rods (e.g. NRL knock-out mouse) may allow us to disentangle the metabolic contributions of rods and cones. We appreciate the suggestion from the reviewer and have now inserted a related suggestion for future studies into the discussion section (Lines 450-452).

      Rev#3.32. Overall, much of the paper suggests intriguing pathways, but without C13 tracing or relevant genetic knock-outs, the pathways would have to be speculative rather than definitive.

      Response: We agree with the reviewer that further research, including 13C and 15N-tracing studies, will be necessary to evaluate which pathway(s) are used by what retinal cell type under what condition. Still, the high robustness and quantitative nature of the NMR metabolomics data allow us to draw pathway conclusions based on metabolites that are unique to specific pathways/cell types or using ratios. We now refer to the advantages of such carbon-tracing studies in the discussion of the manuscript (Lines 545-549).

      Stylistic suggestions:

      Rev#3.33. This is a very dense paper to read. It would be helpful for each figure to have a summary diagram of the relevant metabolite changes and how they fit together. Further, for those not metabolism-inclined, defining the mini-Kreb’s, Cahill, and Cori cycles and their brief implications at some point early in the manuscript would be helpful.

      Response: We have thought a lot about how we could add the suggested summary diagrams to each figure. Unfortunately, whatever idea we contemplated would have significantly increased the complexity of the figures, while the actual benefit in terms of improved understandability was unclear.

      However, we did include the suggestion from the reviewer to present the terms Cori-, Cahill-, and mini-Krebs-cycle already in the introduction and we hope that this has improved the understandability of the manuscript overall (Lines 79-92).

      Rev#3.34. More discussion about the step-by-step ways that the mini-Kreb’s reaction “uncouples” glycolysis from the Kreb’s cycle would be helpful. What do you mean by “uncouple” in this context?

      Response: We thank the reviewer for this suggestion. Uncoupling in this context means that glycolysis and Krebs cycle are not metabolically coupled to each other via pyruvate. Instead both pathways can run independently from each other and in parallel, as long as the Krebs-cycle uses glutamate, BCAAs or other amino acids as fuels. We now also address this point already in the introduction of the manuscript (Lines 87-90).

      Conceptual questions:

      Rev#3.35. As the proposal that PR undergo heavy amounts of OXPHOS is controversial, it would be helpful for the authors to review the literature on lactate production by the retina and what studies have shown previously about retina use of lactate, specifically lactate making its way into TCA cycle intermediates, suggesting OXPHOS, in PRs.

      Response: In response to this question we have added several new references to the introduction and discussion of the manuscript. The question of lactate production (aerobic glycolysis) vs. the use of OXPHOS is now discussed in Lines 77-81, Lines 367-384.

      Rev#3.36. Why would cones die more in the no RPE condition? The authors suggest this has something to do with GLUT1 expression on RPE and the transport of glucose to cones. Even if we accept that cones are highly glycolytic, loss of RPE should expose the neural retina to even more glucose in your experimental set-up.

      Response: This is a very interesting question from the reviewer. Indeed, loss of the RPE and blood-retinal barrier function should increase photoreceptor access to glucose, even more so if they are expressing high affinity GLUT3. In the discussion (Lines 420-424), we speculate that this may trigger the Crabtree effect, shutting down OXPHOS and causing the cells to exclusively rely on glycolysis. This, however, will likely not yield sufficient ATP to maintain their viability, so that they “starve” to death even in the presence of ample glucose. Since cones require at least twice as much ATP as rods, they may be more sensitive to a Crabtree-dependent shut-down of OXPHOS. However, if this speculation was correct then the question remains why the FCCP treatment, which abolishes OXPHOS more directly, does not cause cone death. Here, we again can only speculate that high glucose may have additional toxic effects on cones that are independent of OXPHOS. We now try to present this reasoning in the discussion (Lines 426-429).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their comments, as well as for the time dedicated to making useful suggestions that have contributed to improving the manuscript. We have responded to the concerns raised by the reviewers, and after that, we have also responded to the different points highlighted in the Recommendations for the authors:

      Reviewer #1

      While in vivo injury was used to assess regeneration from subsets of PNS neurons, different in vitro neurite growth or explant assays were used for further assessments. However, the authors did not assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro. Such results will be important in interpreting the results.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Intriguingly, even in individual groups of PNS neurons, not all neurons regenerate to the same extent. It is known that the distance between the cell body and the lesion site affects neuronal injury responses. It would be interesting to test this in the observed regeneration.

      Although it is true that the distance can affect the outcome, here we used a physiological model where all neurons are lesioned at the same point in the nerve. Not only is the distance different for motoneurons, but so is the microenvironment surrounding their somas; therefore, the direct comparison of these neurons with sensory neurons is limited. We extended the discussion on this matter in the new manuscript.

      Fig 1: The authors quantified the number of regenerating axons at two different time points. However, the total numbers of neurons/axons in each subset are different. The authors should use these numbers to normalize their regenerative axons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. It would be informative for the authors to analyze/discuss this further.

      From our point of view, these two options can both be considered differential responses to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      Fig 6: Is it possible to assess the regenerative effects of knockdown Med12 after in vivo injury?

      It is possible, but it is out of the scope of this work. Here, we aimed to describe the regenerative response and validate our data by testing a potential target for specific regeneration. Future studies will focus on the modulation of this specific regeneration both in vitro and in vivo.

      Reviewer #2

      It seems that the most intriguing outcome of this paper revolves around the role of Med12 in nerve regeneration. The authors should prioritize this finding. Drawing a conclusion regarding Med12's role in proprioceptor regeneration based solely on this in vitro model may be insufficient. This noteworthy result requires further investigation using more animal models of nerve regeneration.

      The main goal of this work was to compare the regenerative responses of different neuron subpopulations. We modulated Med12 to validate our data and the potential of our findings. Unfortunately, investigating in depth the role of Med12 in regeneration is out of the scope of this paper. For this reason, we did not prioritise this finding here. As this finding was striking, we strongly agree that the next step should be studying how it modulates regeneration.

      One critique revolves around the authors' examination of only a single time point within the dynamic and continuously evolving process of regeneration/reinnervation. Given that this process is characterized by dynamic changes, some of which may not be directly associated with active axon growth during regeneration, and encompasses a wide range of molecular alterations throughout reinnervation, concentrating solely on a single time point could result in the omission of critical molecular events.

      We agree that this is probably the main limitation of this study, as we discussed in the text. We chose 7 days postinjury because it is a standard time point widely described in the literature and because it correlates with our histological data. Although the main aim was to compare populations, analyzing an additional time point after injury could add valuable information.

      Reviewer #3

      No concerns were expressed by that reviewer.

      Recommendations for the authors:

      The authors should assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Optional:

      It will be interesting to test if the distances between the cell body and the lesion site contribute to the observed differences in individual subsets of PNS neurons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. At least the authors should discuss these.

      From our point of view, both options can be considered a differential response to injury and could potentially be used to modulate regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time point chosen to study the response. Given the importance of this, we added some discussion about this topic.

      While the paper is technically well-executed, the conclusions and some of the findings appear to be incomplete and challenging to draw meaningful conclusions from. This manuscript presents some interesting findings, but the title is quite broad and may suggest that the authors have unveiled fundamental mechanisms explaining the varying regenerative abilities of peripheral axons. However, the results do not substantiate such a conclusion. Further comments and suggestions follow.

      We eliminated the word “regenerative (response)” from the title, as it could lead readers to think that all changes seen in these neurons are related only to regeneration. We think that “Neuron-specific RNA-sequencing reveals different responses in peripheral neurons after nerve injury” highlights the differences between neurons that we found without implying that we described regenerative mechanisms in all neurons.

      What's notably absent here is the validation of certain genes found with the ribosomes, especially those highlighted in the subsequent figures. The question arises as to whether the changes depicted in the figures align with changes in the DRGs in vivo. Is there concordance between the presence of these genes and their transcriptional changes? It would greatly enhance the study's value if the authors could show evidence of upregulation or downregulation of certain genes over time in tissue sections, utilizing techniques such as in situ hybridization or immunocytochemistry.

      We selected some factors that were specifically upregulated in subsets of neurons to corroborate these findings by immunohistochemistry. Changes in the immunofluorescence of P75 in motoneurons and ATF2 in cutaneous mechanoreceptors were evaluated in controls and in animals that had received a nerve crush one week earlier. Supplementary figures with the images have been added.

      The authors discovered intriguing distinctions, such as the presence of specific signaling pathways unique to neurons projecting to muscle as opposed to those projecting to the skin. Among these pathways were those associated with receptor tyrosine kinases like VEGF, erbB, and neurotrophin signaling among others. The question now arises: do these pathways play a role in natural peripheral regeneration processes? To answer this, it is imperative to conduct in vivo studies. However, the authors employed an in vitro DRG neurite outgrowth assay to demonstrate that various types of neurons exhibit different responses to the presence of different neurotrophins. This does not reflect what actually happens in vivo. While neurotrophins indeed play a role in neuron survival and axon extension during development, their role in postnatal periods changes over time, and it remains unclear whether they play any role in the natural regenerative processes of the peripheral nerve. Therefore, this experiment may not be directly relevant in this case, especially during the early axon extension period of the regenerating axons. If the authors aim to establish a causal link with neurotrophin signaling, it becomes crucial to conduct in vivo experiments by manipulating the expression of key molecules like the receptors.

      It has been widely described that different types of peripheral neurons express Trk receptors differentially, even in the adult, and that they respond differentially to neurotrophins. In our study, we do not establish a causal relationship between the expression of Trk and neurite extension; instead, we show (as many others have) that distinct neurons respond differentially to these neurotrophins. The fact that in vivo studies fail to show a clear effect does not necessarily mean that neurotrophins are not specific. It might mean that their effect is not strong enough to be a useful guide in the complex microenvironment found after an injury. For instance, NGF acts on TrkA (present in some neurons), but in vivo it has been shown to accelerate the clearance of myelin debris in Schwann cells (Li et al., 2020), which could facilitate regeneration of all types of axons, masking any potential specific effect on the subtypes of neurons expressing TrkA. In contrast, in an in vitro setting on neuronal cultures, the specific neuronal effect can be more evident.

      Additionally, it's worth noting that another paper utilizing the same methodology and experimental setup (PMID: 29756027, "Translatome Regulation in Neuronal Injury and Axon Regrowth" by Rozenbaum et al.) exists. Are there any significant differences or shared findings with that study?

      This study describes the transcriptomic response 4, 12 and 24 hours after an injury in a very similar experimental setup. The authors focus on comparing the neuronal vs the glial response to the injury, using a Ribotag line that tags ribosomes from all neurons in the DRG rather than specific neuron subtypes. As the time postinjury (24h vs 7 days) and the cell types studied are different, we could not directly compare our results. We did see an upregulation in both datasets of previously described growth-associated genes (Jun, Atf3, Sox11, Sprr1a, Gal…). We included the article in the references for its relevance to the topic.

      It would be helpful for readers to illustrate the finding of the fastest axon regeneration of nociceptors by showing fluorescence micrographs of the nerve samples in addition to the graphs shown in Fig. 1 C/D.

      In figure 1B, we show fluorescence micrographs of the nerves 7 days postinjury. As explained in the results, we counted the number of axons at 2 distances from the injury; we did not analyse the fastest axon. This is due to technical reasons: 7 days after the injury the fastest axon has surpassed our evaluation point, which was the furthest distance that we could assess consistently in our experimental setting. If the reviewer thinks that we need to include more images from our evaluations (from 9 dpi for example), we could prepare a new figure.

      The labeling in Fig. 2B is confusing. Is the CHAT immunoreactivity shown in the last panel illustrated by green or red signals? Is the red signal counterstaining with beta-tubulin?

      The labelling was changed in the figure to increase clarity.

      The references to the supplementary data throughout the manuscript are confusing. For example, where can the "Supp data 2" be found? (mention on p. 14 in the merged pdf file). Are they referring to the Excel spreadsheets?

      We divided the supplementary material into supplementary figures/table (found in the pdf) and supplementary data. Supplementary data refers to the Excel spreadsheets found outside the pdf file. We hope this will be clearer after the final formatting of the article.

      What does the following statement on p. 14 mean?: "The caveat in these analyses was that molecular classification by these approaches may be arbitrary, and not reflective of protein repurposing." This reviewer notes that these databases consider the fact that components participate in different pathways.

      Indeed, we aimed to explain that many proteins participate in different pathways, and this is a limitation of the enrichment analysis. We modified the sentence in the text.

      First paragraph on p. 15: The PPAR and AMPK pathways have much broader roles, and are not only "related to fatty acid metabolism". This factual inaccuracy should be corrected in the manuscript.

      The sentence has been corrected.

      The authors should consider showing increased TGF-beta signaling in their neurons after downregulation of Med12 given the previous implication of TGF-beta signaling in axon regeneration.

      We tried to demonstrate the effect of our knockdown in TGF-beta pathway by analyzing the expression of typical targets from this pathway by qPCR in our cultures. However, we could not detect any difference. We think that this can have two explanations: (1) as only a few cells upregulate Med12 whereas many cells downregulate it, the effect is masked (presumably only proprioceptors will have a significant difference in this pathway and, thus, it would be very difficult to see the effect), or (2) Med12 is not exerting its effect through this pathway. We added a supplementary figure with these data and discussed it in the manuscript.

      It would be helpful to eliminate typos and improve syntax/grammar/style.

      We revised the text to improve style.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the present manuscript, the authors analyzed diel oscillations in the brain and olfactory organs' transcriptome of Aedes aegypti and Anopheles culicifacies. The analysis of their RNAseq results showed an effect of time of day on the expression of detoxification genes involved in oxidoreductase and monooxygenase activity. Next, they investigated the effect of time of day on the olfactory sensitivity of Ae. aegypti and An. gambiae and identified the role of CYP450 in odor detection in these species using RNAi. In the last part of the study, they used RNAi to knock down the expression of one of the serine protease genes and observed a reduction in olfactory sensitivity. Overall, the experiments are well-designed and mostly robust (see comment regarding the sample size and data analysis of the EAG experiments) but do not always support the claims of the authors. For example, since no experiments were conducted under constant conditions, the circadian (i.e., driven by the internal clocks) effects are not being quantified here. In addition, knocking down the expression of a gene showing daily variations in its expression and observing an effect on olfactory sensitivity is not sufficient to show its role in the daily olfactory rhythms. Knowledge gaps are not well supported by the literature, and overstatements are made throughout the manuscript. Our detailed comments are listed below.

      We sincerely thank the reviewer for their time and consideration, and appreciate the thorough review of our manuscript. Their insightful comments have greatly enriched our work. We also apologize for instances of overinterpreting the data. Your feedback has helped us recognize areas where clarity and caution are needed, and we are committed to addressing these concerns in our revisions. Thank you for your valuable input and guidance.

      Major comments

      Introduction

      1. Several statements made in the introduction are misleading and suggest that authors are trying to exaggerate the impact of their work. For example, "Furthermore, different species of mosquitoes exhibit plasticity and distinct rhythms in their daily activity pattern, including locomotion, feeding, mating, blood-feeding, and oviposition, facilitating their adaptation into separate time-niches (7, 8), but the underlying molecular mechanism for the heterogenous temporal activity remains to be explored." is not accurate since daily rhythms in mosquitoes' transcriptomes, behavior, and olfactory sensitivity have been the object of several publications. Even though some of them are listed later in the introduction, they contradict the claim made about the knowledge gap. See:

      Rund, S. S., Gentile, J. E., & Duffield, G. E. (2013). Extensive circadian and light regulation of the transcriptome in the malaria mosquito Anopheles gambiae. BMC genomics, 14(1), 1-19

      Rund, S. S., Hou, T. Y., Ward, S. M., Collins, F. H., & Duffield, G. E. (2011). Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proceedings of the National Academy of Sciences, 108(32), E421-E430

      Rund, S. S., Bonar, N. A., Champion, M. M., Ghazi, J. P., Houk, C. M., Leming, M. T., ... & Duffield, G. E. (2013). Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae. Scientific reports, 3(1), 2494

      Rund, S. S., Lee, S. J., Bush, B. R., & Duffield, G. E. (2012). Strain-and sex-specific differences in daily flight activity and the circadian clock of Anopheles gambiae mosquitoes. Journal of insect physiology, 58(12), 1609-1619

      Leming, M. T., Rund, S. S., Behura, S. K., Duffield, G. E., & O'Tousa, J. E. (2014). A database of circadian and diel rhythmic gene expression in the yellow fever mosquito Aedes aegypti. BMC genomics, 15(1), 1-9

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      Rivas, G. B., Teles-de-Freitas, R., Pavan, M. G., Lima, J. B., Peixoto, A. A., & Bruno, R. V. (2018). Effects of light and temperature on daily activity and clock gene expression in two mosquito disease vectors. Journal of Biological Rhythms, 33(3), 272-288

      Response: We apologize for this oversight. In the revised manuscript, we have added these references and made changes to the text as suggested by the reviewer.

      The knowledge gap brought up in the next paragraph of the introduction doesn't reflect the questions asked by the experiments: "But, how the pacemaker differentially influences peripheral clock activity present in the olfactory system and modulates olfactory sensitivity has not been studied in detail." Specifically, the control of peripheral clocks by the central pacemaker has not been evaluated here.

      Response: This statement has been modified in the revised manuscript.

      "In vertebrates and invertebrates, it is well documented that circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" should also cite work on cockroaches and kissing bugs:

      Lubinski, A. J., & Page, T. L. (2016). The optic lobes regulate circadian rhythms of olfactory learning and memory in the cockroach. Journal of Biological Rhythms, 31(2), 161-169

      Page, T. L. (2009). Circadian regulation of olfaction and olfactory learning in the cockroach Leucophaea maderae. Sleep and Biological Rhythms, 7, 152-161

      Vinauger, C., & Lazzari, C. R. (2015). Circadian modulation of learning ability in a disease vector insect, Rhodnius prolixus. Journal of Experimental Biology, 218(19), 3110-3117

      Response: These references have been added in the revised manuscript as suggested by the reviewer.

      The sentence: "Previous studies showed that synaptic plasticity and memory are significantly influenced by the strength and number of synaptic connections (43, 44)." should be nuanced as the role of neuropeptides such as dopamine has also been shown to influence learning and memory in mosquitoes:

      Vinauger, C., Lahondère, C., Wolff, G. H., Locke, L. T., Liaw, J. E., Parrish, J. Z., ... & Riffell, J. A. (2018). Modulation of host learning in Aedes aegypti mosquitoes. Current Biology, 28(3), 333-344

      Wolff, G. H., Lahondère, C., Vinauger, C., Rylance, E., & Riffell, J. A. (2023). Neuromodulation and differential learning across mosquito species. Proceedings of the Royal Society B, 290(1990), 20222118

      Response: We agree with the reviewer. We have modified this statement and added the references in the revised manuscript.

      Overall, the paragraph dealing with the idea that "circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" is very confusing. This paragraph discusses mechanisms of learning-induced plasticity but seems to ignore the simplest (most parsimonious) explanations for the circadian regulation of learning (e.g., time-dependent expression of genes involved in memory consolidation). In addition, the sentence quoted above is a convoluted way of saying that training at different times of the day affects memory acquisition and consolidation. Although the authors did look at one gene involved in neural function, learning, memory, or circadian effects were not analysed in this study. Please reconsider the relevance of the paragraph.

      Response: We have modified this paragraph as per the suggestions of the reviewer in the revised manuscript.

      The sentence: "But, how the brain of mosquitoes entrains circadian inputs and modulates transcriptional responses that consequently contribute to remodel plastic memory, is unknown." should be rephrased. First, it should be "entrains TO circadian inputs", and second, it suggests that the study will be investigating circadian modulation of learning and memory, which is not the case. Furthermore, the term "remodel plastic memory" is unclear and doesn't seem to relate to any specific cellular or neural processes.

      Response: This statement has been removed from the revised manuscript.

      Given the differences in mosquito chronobiology observed even between strains, why perform the RNAi and EAGs on a different species of Anopheles than the one used for the RNAseq (or vice versa)?

      Response: We agree with the reviewer that mosquito chronobiology differs between strains and that species variation may therefore complicate data interpretation. Considering the strictly nocturnal behavior of An. culicifacies and the diurnal behavior of Ae. aegypti, we performed the RNA-Seq study with these two species. However, we opted for the current collaborative study for three reasons: 1) no EAG facility is available at ICMR-National Institute of Malaria Research, India, the only place where an An. culicifacies colony is available; 2) An. culicifacies is challenging to rear and adapt to a new environment/laboratory; and 3) we wanted to validate the proof-of-concept of CYP450 function in odorant detection and olfactory sensitivity. We are also aware that electroantennographic data from a different Anopheles species would be difficult to correlate with the molecular data on An. culicifacies. We therefore chose An. gambiae (rather than other Anopheles mosquitoes such as An. stephensi or An. coluzzii) because diel rhythm-associated molecular data are available for An. gambiae (68). For better interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found similar expression patterns for several CYP450 and OBP/CSP genes in the two species. Furthermore, please note that the primary focus of the current manuscript is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection, and, as a proof-of-concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanisms of odor detection and peri-receptor events are conserved from insects to higher vertebrates; therefore, the argument about species differences can be set aside.

      S. S. C. Rund, J. E. Gentile, G. E. Duffield, Extensive circadian and light regulation of the transcriptome in the malaria mosquito Anopheles gambiae. BMC Genomics. 14 (2013), doi:10.1186/1471-2164-14-218.

      S. S. C. Rund, T. Y. Hou, S. M. Ward, F. H. Collins, G. E. Duffield, Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proc. Natl. Acad. Sci. U. S. A. 108 (2011), doi:10.1073/pnas.1100584108.

      S. S. C. Rund, N. A. Bonar, M. M. Champion, J. P. Ghazi, C. M. Houk, M. T. Leming, Z. Syed, G. E. Duffield, Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae. Sci. Rep. 3, 2494 (2013).

      Results

      1. "As reported earlier, a significant upregulation of period and timeless during ZT12-ZT18 was observed in both species (Figure 1C)." Please provide effect size and summary statistics.

      Response: The statistics are provided in Figure S2 of the revised manuscript.

      "Next, the distribution of peak transcriptional changes in both An. culicifacies and Ae. aegypti was assessed through differential gene-expression analysis. Noticeably, An. culicifacies showed a higher abundance of differentially expressed olfactory genes (Figure 1D)" Please provide effect size and summary statistics.

      Response: The statistics are provided in Table 1 of the revised manuscript.

      "Taken together, the data suggests that the nocturnal An. culicifacies may possess a more stringent circadian molecular rhythm in peripheral olfactory and brain tissues." What do the authors mean by "stringent"? At this point, this should be stated as a working hypothesis, as the statement is not backed up by the data. It is possible that the fewer differentially expressed genes of Aedes aegypti are more central to regulatory networks and cascade into more "stringent" rhythmic control of activities and rhythms.

      Response: We thank the reviewer for this suggestion. We have modified this statement as suggested by the reviewer.

      The section title: "Circadian cycle differentially and predominantly expresses olfaction-associated detoxification genes in Anopheles and Aedes" doesn't make sense. The expression of genes can be modulated by circadian rhythms, but cycles don't express genes. Please rephrase. In addition, this whole section deals with "circadian rhythms" while no experiment has been conducted under constant conditions. The observed daily variations are therefore diel rhythms until their persistence under constant conditions is established.

      Response: We agree with the reviewer and changed the statement accordingly.

      "The downregulated genes of Ae. aegypti did not show any functional categories probably due to the limited transcriptional change." Could the authors explain if this is actually the phenomenon or due to a lack of temporal resolution in the study design (i.e., 4 time points)?

      Response: We do not agree with the reviewer’s comment about the lack of temporal resolution in the current study. The functional categories of differentially expressed genes are deduced by gene set enrichment analysis, which identifies the classes of genes that are overrepresented in a large set of genes. The statistical significance value depends on the abundance of query and background genes. In our experiments, because the number of queries (i.e., the number of downregulated genes) is limited, the enrichment tool (ShinyGO) was unable to show significant enrichment of downregulated genes when an FDR cut-off of 0.05 was applied and the top 10 pathways were selected. Although we selected 4 time points, a previous study by Rund et al. (BMC Genomics 2013) also showed that An. gambiae possesses a higher number of rhythmic genes than Ae. aegypti (2.6-fold higher). Therefore, the data we obtained are probably not due to pitfalls of the study design but to physiological differences between Anopheles and Aedes mosquitoes.
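
      The dependence of enrichment significance on query size can be illustrated with a minimal sketch. It assumes a plain upper-tail hypergeometric test, which may differ from the exact procedure used by the enrichment tool; the gene counts and the function name `hypergeom_enrichment_p` are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, in_category, query, hits):
    """Upper-tail hypergeometric p-value: probability of at least `hits`
    category genes in a random query of size `query`."""
    total = comb(background, query)
    upper = min(query, in_category)
    tail = sum(
        comb(in_category, i) * comb(background - in_category, query - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10,000 background genes, 100 of them in the category (1%).
# Both queries contain 10% category genes, i.e. the same 10-fold over-representation.
p_small_query = hypergeom_enrichment_p(10_000, 100, query=10, hits=1)
p_large_query = hypergeom_enrichment_p(10_000, 100, query=100, hits=10)

print(f"small query (1 of 10):    p = {p_small_query:.3g}")
print(f"large query (10 of 100):  p = {p_large_query:.3g}")
```

      With the same fold over-representation, the small query does not pass a 0.05 threshold whereas the large one does, even before any multiple-testing correction is applied.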

      "a GO-enrichment analysis was unable to track any change in the response-to-stimulus or odorant binding category of genes (including OBPs, CSPs, and olfactory receptors)." This finding doesn't corroborate the statements made previously and doesn't align with previously published studies. Is it due to pitfalls in the study design?

      Response: The functional categories of differentially expressed genes are deduced by gene set enrichment analysis, which identifies the classes of genes that are overrepresented in a large set of genes. The statistical significance value depends on the abundance of query and background genes. Although differential expression analysis revealed a significant upregulation of a subset of CSP (~5-fold) and OBP6 (~3.3-fold) transcripts in An. culicifacies mosquitoes at ZT12, the number of queries (i.e., the number of chemosensory genes) is limited (i.e., 3), so the enrichment tool (ShinyGO) was unable to show significant enrichment of these gene categories when an FDR cut-off of 0.05 was applied and the top 10 pathways were selected.

      Moreover, we do not agree with the reviewer’s comment on pitfalls of the study design, because our previous experiments with An. culicifacies across the diel rhythm, which considered more extended time points, also revealed a similar expression pattern of chemosensory genes (Das De et al., 2018).

      "In contrast, three different clusters of OBP genes in Ae. aegypti showed a time-of-day dependent distinct peak in expression starting from ZT0-ZT12 (Figure 2F)." Please provide summary statistics.

      Response: Please find the table for summary statistics in the supplemental file 1.

      "In the case of An. gambiae, the amplitudes of odor-evoked responses were significantly influenced by the doses of all the odorants tested (repeated measure ANOVA, p {less than or equal to} 2e-16) (Figure S4B)." Did the authors use a positive control for the EAGs? How did the authors normalize the responses across the two species? Given the way the data is presented, how were the data normalized to allow inter-species comparisons? In addition, It is highly unlikely that all the mosquito preps used in the EAG assay responded to all the odors tested. If that was the case, then the dataset includes missing data for certain odors and time points. We believe the authors have ensured there are at least a certain number of responses per odor and time point combinations. If this is true, repeated measures ANOVA is not suited for analyzing this data because this statistical technique requires all repeated measures within and across preps without missing values. Also, the authors need to correct the summary statistics for multiple comparisons within this framework to avoid inflating type-I errors. Has this been done?

      Response: In our study involving An. gambiae, we observed significant influences of odorant doses on the amplitudes of odor-evoked responses (repeated measure ANOVA, p ≤ 2e-16) (Figure S4B). It's important to note that we did not employ a separate positive control for the electroantennogram (EAG) assays, as the compounds utilized in our research are already known to be EAG active in at least one of the mosquito species under investigation (mentioned in supplementary file 3).

      Our primary objective in performing the EAG studies is to correlate the diel-rhythmic molecular data with the diel-rhythmic electroantennographic response in nocturnal and diurnal mosquitoes. To address the normalization of responses across the two species, we opted to control for dose and time rather than normalizing to one of the EAG-active compounds. Further, the EAG responses were measured relative to the solvent control. In our experimental design, we used different batches of mosquitoes from the same cohort to test each odorant at the various time points. EAG responses were acquired from the same mosquito across different dilutions of a single odor or volatile compound, rather than across time points. Hence, we did not end up with missing values.

      For the individual species analysis, we performed a repeated measures ANOVA on each compound's EAG response, considering dose and time as variables. This enabled us to select the compounds for which ‘Time’ or its interaction terms were significant. Subsequently, for these compounds, we conducted a one-way ANOVA using only time as a variable, segregating the data by individual dose. Post-hoc Tukey tests were then carried out to compare between time points. For the comparison between species, we generated a dataset combining both species and added species as a variable. A repeated measures ANOVA was applied to each compound's EAG response, considering species, dose, and time as variables. This again enabled us to select the compounds for which ‘Time’ or its interaction terms were significant. For these compounds, a two-way ANOVA was performed using time and species as variables. Data were segregated by individual dose, and post-hoc Tukey tests were employed to compare between time points. Our analysis thus aims to account for repeated measures within and across preparations. Additionally, the post-hoc Tukey tests correct for multiple comparisons within this framework, so that type-I errors are not inflated in our statistical interpretations.
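
      The per-dose step of this workflow can be sketched as follows. This is a minimal illustration that only computes the one-way ANOVA F statistic for the effect of time within each dose; the repeated measures ANOVA, the p-values, and the Tukey post-hoc tests are not implemented here. The amplitudes, dose labels, and time points are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of observations."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical EAG amplitudes for one compound: dose -> time point -> responses.
eag_responses = {
    "dose_low": {"ZT0": [1.0, 2.0, 3.0], "ZT6": [2.0, 3.0, 4.0], "ZT12": [5.0, 6.0, 7.0]},
    "dose_high": {"ZT0": [2.0, 3.0, 4.0], "ZT6": [2.0, 3.0, 4.0], "ZT12": [2.0, 3.0, 4.0]},
}

# Segregate by dose and test the effect of time within each dose.
f_by_dose = {
    dose: one_way_anova_f(list(by_time.values()))
    for dose, by_time in eag_responses.items()
}
for dose, f_stat in f_by_dose.items():
    print(f"{dose}: F = {f_stat:.2f}")
```

      In practice the F statistic would be converted to a p-value and followed by a post-hoc test in a statistics package; the sketch only shows how segregating the data by dose isolates the time effect.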

      "Ae. aegypti was found to be most sensitive to all the odorants (4-methylphenol, β-ocimine, E2-nonenal, benzaldehyde, nonanal, and 3-octanol) during ZT18-20 except sulcatone (Figure 3C - 3H)." Although some of these chemicals are associated with plants and Ae. aegypti is suspected to sugar feed at night, how do the authors explain that the peak olfactory sensitivity occurs at night for compounds such as nonanal? It would be interesting to discuss how these results compare to previous studies such as:

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      Response: The possible explanations have been added in the revised MS.

      "Additionally, our principal components analysis also illustrates that most loadings of relative EAG responses are higher towards the Anopheles observations (Figure S4C)." The meaning of this sentence is unclear? Please clarify.

      Response: Considering the limited clarity of the statement we have removed it from the revised manuscript.

      "Taken together these data indicate that An. gambiae may exhibit higher antennal sensitivity to at least five different odorants tested, as compared to Ae. aegypti." As mentioned above, how did the authors normalized across species to allow comparisons? If not normalized, how do you ensure that higher response magnitudes correlate with higher olfactory sensitivity, given potential differences in the morphology or size differences between the two species? Furthermore, An. gambiae has been exclusively used in the EAG assay. Besides the lack of a justification for using a species other than An. culicifacies, the authors have interpreted the EAG results under the assumption that the olfactory sensitivities of An. gambiae and An. culicifacies are comparable. This, however, is a major caveat in the experiment design, given previous studies (indicated below) have reported species-specific variations in olfactory sensitivity. In its present form, the EAG data from An. gambiae is not a piece of appropriate evidence that the authors could use to complement or substantiate the findings from other aspects of this study on An. culicifacies.

      Wheelwright, M., Whittle, C. R., & Riabinina, O. (2021). Olfactory systems across mosquito species. Cell and Tissue Research, 383(1), 75-90. Wooding, M., Naudé, Y., Rohwer, E., & Bouwer, M. (2020). Controlling mosquitoes with semiochemicals: a review. Parasites & Vectors, 13, 1-20.

      iii. Gupta, A., Singh, S. S., Mittal, A. M., Singh, P., Goyal, S., Kannan, K. R., ... & Gupta, N. (2022). Mosquito Olfactory Response Ensemble enables pattern discovery by curating a behavioral and electrophysiological response database. Iscience, 25(3).

      Response: The data are normalized as described above in point 15. Also, it is a technical limitation that we had to use multiple mosquito species for this study (please refer to point 7).

      The reviewer’s statement “Besides the lack of a justification for using a species other than An. culicifacies, the authors have interpreted the EAG results under the assumption that the olfactory sensitivities of An. gambiae and An. culicifacies are comparable” is not true, as we never assumed similar olfactory sensitivity between An. culicifacies and An. gambiae. We only considered the nocturnal activity of both mosquito species. Moreover, we are aware that species variation within Anopheles would make electroantennographic data difficult to correlate with the molecular data on An. culicifacies. Thus, we chose An. gambiae (and not other Anopheles mosquitoes such as An. stephensi, An. coluzzii etc.) because of the availability of diel rhythm associated molecular data for An. gambiae (68). For better interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found a similar expression pattern for several CYP450 and OBP/CSP genes between An. culicifacies and An. gambiae. Furthermore, we would like to emphasize that the primary focus of the current manuscript is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection. As a proof-of-concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanism of odor detection and peri-receptor events is similar/conserved from insects to higher vertebrates.

      "Similar to An. gambiae, a comparatively high amplitude response was also observed in An. stephensi (Figure S4D)." This is interesting but what would be even more relevant to the present study is to discuss how the time-dependent responses compare between the two Anopheles species.

      Response: We agree that it would be interesting to compare the time-dependent responses between the two Anopheles species. However, this is not our primary objective and is beyond the scope of the current manuscript. Thus, we have removed the data from the revised MS.

      The paragraph titled "Daily temporal modulation of neuronal serine protease impacts mosquito's olfactory sensitivity" is confusing because the authors move on to test the effect of knocking down a serine protease gene (found to be differentially expressed throughout the day) on olfactory sensitivity. While this is interesting in and of itself, the link between the role of this gene in learning-induced plasticity, the circadian modulation of "brain functions" and olfactory sensitivity is 1) unclear and 2) not explicitly tested. We agree with the authors that what has been tested is "the effect of neuronal serine protease on circadian-dependent olfactory responses," but the two paragraphs leading to it seem to be extrapolating functional links that have yet to be determined. In this context, their conclusions that "Our finding highlights that daily temporal modulation of neuronal serine-protease may have important functions in the maintenance of brain homeostasis and olfactory odor responses." is misleading because although they used the hypothetical "may", the link between the temporal modulation of one serine protease gene and the maintenance of brain homeostasis is not explicitly tested here.

      Response: Although we strongly believe that neuronal serine proteases are involved in remodelling of the extracellular matrix and the maintenance of brain homeostasis, the lack of experimental validation by neuroimaging (out of the scope of the current manuscript) restricts us from drawing that conclusion. Therefore, we have modified our conclusions based on the available data, as suggested by the reviewer.

      Discussion

      1. The first sentence of the discussion: "In this study, we provide initial evidence that the daily rhythmic change in the olfactory sensitivity of mosquitoes is tuned with the temporal modulation of molecular factors involved in the initial biochemical process of odor detection i.e., peri-receptor events" is not true since studies from Rund and Duffield previously revealed the daily modulation of OBP gene expression. It also contradicts the next sentence: "The findings of circadian-dependent elevation of xenobiotic metabolizing enzymes in the olfactory system of both Ae. aegypti and An. culicifacies are consistent with previous literature (26, 31), and we postulate that these proteins may contribute to the regulation of odorant detection in mosquitoes."

      Response: This statement is modified in the revised manuscript.

      The use of "circadian" in the discussion of the results is also misleading as only diel rhythms were evaluated in the present study.

      Response: This is changed in the revised manuscript.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58)." This is not really what these references show.

      Response: The statement and the references have been changed in the revised manuscript.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58), it can be hypothesized that detection of any specific signal in such a noisy environment, mosquitoes may have evolved a sophisticated mechanism for rapid (i) odor mobilization and (ii) odorant clearance, to prevent anosmia (24)." One could argue that this is a requirement for all insects, regardless of the size of their olfactory repertoire.

      Response: We agree with the reviewer and modified the text accordingly.

      "Taken together, we hypothesize that circadian-dependent activation of the peri-receptor events may modulate olfactory sensitivity and are key for the onset of peak navigation time in each mosquito species." This is not entirely accurate since spontaneous locomotor activity rhythms are also observed in the absence of olfactory stimulation. While "navigation" does imply olfactory-guided behaviors, "peak navigation time" appears to be driven by other processes. See, for example, all studies testing mosquito activity rhythms in locomotor activity monitors. Response: Considering the concern of the reviewer, we have modified the text.

      "Due to technical limitations, and considering the substantial data on the circadian-dependent molecular rhythmicity" please clarify what the technical limitations were. Is this something that prevented the authors specifically, or something tied to mosquito biology and would prevent anybody from doing it? Also, why couldn't the transcriptomic analysis be performed on An. gambiae?

      Response: As previously mentioned, the primary challenge in proving our hypothesis is the unavailability of an EAG facility at ICMR-National Institute of Malaria Research, India (the only place where an An. culicifacies colony is available). Secondly, transportation of An. culicifacies was not possible due to Govt. regulations, and establishing a colony of An. culicifacies takes a long time because the species does not adapt easily to a new environment/laboratory (Adak T, Kaur S, Singh OP. Comparative susceptibility of different members of the Anopheles culicifacies complex to Plasmodium vivax. Trans R Soc Trop Med Hyg. 1999;93:573–577). Thirdly, an An. culicifacies colony was not available at our collaborative laboratory. These are the major technical limitations.

      Therefore, to validate the hypothesis of CYP450 function in odorant detection and olfactory sensitivity, we opted for the current collaborative study. We are also aware that species variation within Anopheles would make electroantennographic data difficult to correlate with the molecular data on An. culicifacies. Thus, we chose An. gambiae (and not other Anopheles mosquitoes such as An. stephensi, An. coluzzii etc.) because of the availability of diel rhythm associated molecular data for An. gambiae (68). For better interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found a similar expression pattern for several CYP450 and OBP/CSP genes between An. culicifacies and An. gambiae. Performing another RNA-Seq study with An. gambiae would not be possible for the current MS. Furthermore, please note that the primary focus of the current MS is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection. As a proof-of-concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanism of odor detection and peri-receptor events is similar/conserved from insects to higher vertebrates.

      "In contrast to An. gambiae, the time-dose interactions had a higher significant impact on the antennal sensitivity of Ae. aegypti. An. gambiae showed a conserved pattern in the daily rhythm of olfactory sensitivity, peaking at ZT1-3 and ZT18-20." These two sentences are very confusing. Doesn't it simply mean that the co-variation is not linear or not the same across odors? In addition, what does it mean for a pattern to be more conserved? How can one conclude about the "conserved" nature of a pattern by looking at time-dependent variations in dose-response curves?

      Response: This section of discussion is re-written in the revised version of the manuscript.

      "Together these data, we interpret that mosquito's olfactory sensitivity possibly does not follow a fixed temporal trait" is unclear and suggests that the authors are discussing global versus odor-specific rhythms. Please rephrase.

      Response: This section of discussion is re-written in the revised version of the manuscript.

      "Moreover, we hypothesize that under standard insectary conditions, mosquitoes may not need to exhibit foraging flight activity either for nectar or blood, and during the time course, it may minimize their olfactory rhythm, which is obligately required for wild mosquitoes." This hypothesis is not supported by the results of the study and contradicts work by others (Rund et al., Eilerts et al., Gentile et., etc).

      Response: This section of discussion is re-written in the revised version of the manuscript.

      The same comment applies to "Therefore, it is reasonable to think that the mosquitoes used for EAG studies may have adapted well under insectary settings and, hence carry weak olfactory rhythm." as this statement is not supported by results of the present study or comparisons of the results to previous studies based on field-caught mosquitoes. Although it is an interesting question to ask in the future, it should be stated as a future research avenue rather than a working hypothesis that results from the present study.

      Response: This section of discussion is re-written in the revised version of the manuscript.

      "Aedes aegypti displayed a peak in antennal sensitivity at ZT18-20 to the higher concentrations of plant and vertebrate host-associated odorants tested. Given the time-of-day dependent multiple peaks (at ZT6-8 and ZT18-20 for benzaldehyde and at ZT12-14 and ZT18-20 for nonanal) in antennal sensitivity to different odorants, our data supports the previous observation of bimodal activity pattern of Ae. aegypti (50)." Rephrase by saying that results are "aligned with the previous observations of bimodal activity". Olfactory rhythms don't "support" the activity patterns because olfactory processes and spontaneous locomotor activity are independent processes.

      Response: We have made these changes in the revised manuscript as per the suggestions of the reviewer.

      "our preliminary data indicate that Anopheles spp. may possess comparatively higher olfactory sensitivity to a substantial number of odorants as compared to Aedes spp." Consider removing this sentence unless the way the data has been normalized to allow for comparisons between species is clarified.

      Response: This statement is removed from the revised manuscript.

      In "A significant decrease in odorant sensitivity for all the volatile odors tested in the CYP450-silenced Ae. aegypti," please change "silenced" to "reduced" because RNAi doesn't silence (i.e. knockout) gene expression.

      Response: It has been modified as per the suggestions of the reviewer.

      The title "Neuronal serine protease consolidates brain function and olfactory detection" is extremely misleading. Do the authors refer to memory consolidation, which has not been tested here? What is brain function consolidation??

      Response: We agree with the reviewer. The title has been modified in the revised manuscript.

      The reference used in "Despite their tiny brain size, mosquitoes, like other insects, have an incredible power to process and memorize circadian-guided olfactory information (7)." is not appropriate. Also, "circadian-guided" is unclear. Consider replacing it with "circadian-gated".

      Response: It has been modified as per the suggestions of the reviewer.

      What is the "the homeostatic process of the brain"?

      Response: The process of maintaining a stable state can be defined as homeostasis. Here, the statement "the homeostatic process of the brain" is used to convey that after the active host-seeking/olfaction phase of mosquitoes during which the co-ordinated and integrated functions of both olfactory and neuronal system is required for crucial decision-making events, brain may undergo a homeostatic process (comes down from excitatory state to stable state) during the resting period. However, in view of reviewer’s concern we have modified the statement.

      "the temporal oscillation of the sleep-wake cycle of any organism is managed by the encoding of experience during wake, and consolidation of synaptic change during inactive (sleep) phases, respectively (70)." By experience, do the authors refer to learning? This seems out of topic as this process has not been evaluated here.

      Response: It has been modified as per the suggestions of the reviewer.

      "We speculate that after the commencement of the active phase (ZT6-ZT12), the serine peptidase family of proteins in the brain of Ae. aegypti mosquitoes may play an important function in consolidating brain actions (after ZT12) and aid circadian-dependent memory formation." The value of this statement is unclear. Circadian-dependent memory formation is not being evaluated here, and the results from the present study do not directly support this speculation, also because other processes involved in memory formation are not evaluated here. This seems at odds with the literature on learning and memory.

      Response: We have modified these statements in the revised manuscript and mentioned it as future research hypothesis.

      "Subsequent work on electrophysiological and neuro-imaging studies are needed to demonstrate the role of neuronal-serine proteases in the reorganization of perisynaptic structure." Sure. But the link between "the role of neuronal-serine proteases in the reorganization of perisynaptic structure" and rhythms in olfactory sensitivity is unclear.

      Response: It has been modified as per the suggestions of the reviewer.

      As a general comment, EAGs seem inappropriate to evaluate the effect of the central-brain processing in the regulation of peripheral olfactory processes. This is a critical comment that needs to be considered by the authors and clarified in the manuscript. If rhythms of central brain processes are important for olfactory-guided behaviors, these should be evaluated at the level of the central brain or via behavioral metrics. The effect of the RNAi knockdowns on peripheral sensitivity is interesting, but its link with central processes is unclear and doesn't support the speculations made by the authors about learning and memory.

      Response: We agree with the reviewer that an EAG study is not sufficient or appropriate to comment on the effect of central-brain processing in the regulation of olfactory processes. Further validation by either neuroimaging or behavioral studies is needed to draw any conclusion. We clearly mention in the manuscript that our data only indirectly indicate this function of serine protease and that further confirmatory studies are needed to prove this hypothesis.

      Methods

      1. No explanations are provided for how the EAG data are normalized to allow comparisons between species.

      Response: Please refer to the response of the point no. 15 of the reviewer 1.

      Figures

      42. Figure 1: The daily rhythms depicted in A are not representative of the actual profiles. See: Benoit, J. B., & Vinauger, C. (2022). Chapter 32: Chronobiology of blood-feeding arthropods: influences on their role as disease vectors. In Sensory ecology of disease vectors (pp. 815-849). Wageningen Academic Publishers. Or any other paper on mosquito activity rhythms.

      Response: Considering the reviewer’s concern we have revised the figure.

      Figure 3 and 4: The EAG results are plotted twice. This is redundant and misleading as it makes the reader think there is more data than actually presented.

      Response: Considering the reviewer’s comment we shifted figure 4 into the supplemental file.

      Figure 5: Please clarify the sample size for each panel. In C - F, what would be used as a reference? In other words, what is a Relative EAG Response of 1? And if it is "relative", are the units really mV? In E and F, it would be great to show how the Ethanol control compares to the no solvent condition. This could be placed in supplementary materials.

      Response: The sample size was mentioned in the figure legends. However, for the reviewer’s clarification, the odor response was tested with 40 individual mosquitoes each from the control and dsRNA-treated groups. Therefore, the sample size is N=40 for Fig. 5C.

      The respective solvent control (hexane) was used as a reference to calculate the relative EAG response for both the dsrLacZ and dsrCYP450 groups. As it is a relative EAG amplitude, we have removed the unit in the revised MS.
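
      To illustrate why a relative response carries no unit, the sketch below assumes the relative EAG response is the ratio of the odor-evoked amplitude to the solvent-control amplitude. The response above does not state the exact formula, so the ratio form, the function name `relative_eag`, and the example values are assumptions for illustration only.

```python
def relative_eag(odor_amplitude_mv, solvent_amplitude_mv):
    """Return the odor response relative to the solvent control.

    Assumption: relative response = odor amplitude / solvent amplitude.
    Both inputs are amplitude magnitudes in mV, so the units cancel.
    """
    if solvent_amplitude_mv == 0:
        raise ValueError("solvent control amplitude must be non-zero")
    return odor_amplitude_mv / solvent_amplitude_mv


# Hypothetical amplitudes: odor stimulus 1.5 mV, hexane control 0.5 mV.
example_ratio = relative_eag(1.5, 0.5)
```

      Under this assumption, a relative EAG response of 1 would mean the odorant evoked the same amplitude as the solvent alone.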

      Figures 5 and 6, given the dispersion in the EAG data, the treatments where N=40 appear robust, but the interpretation of results from treatments where N=6 may be limited due to the low sample size. This limitation is visible in Figure 5F, for example, where ABT-Aceto is different from Cont-Aceta but not PBO-Aceto because one individual shows a higher response.

      Response: We agree that increasing the sample size for the inhibitor treatment experiment would probably decrease these inter-individual differences and increase the overall significance. However, our robust knock-down data showed significant results and complement the inhibitor study in Ae. aegypti, so we do not see any disparity in the data. Moreover, the EAG responses to human blend, nonanal and benzaldehyde showed similarly significant results in both the RNAi and inhibitor studies. To account for the variable knock-down efficiency in dsRNA-injected mosquitoes, the phenotypic assays (EAG recordings) were carried out with 40 control and 40 dsRNA-treated mosquitoes. In An. gambiae, we observed a significant reduction in EAG response following inhibitor treatment when we tested 6 ethanol-treated and 6 inhibitor-treated mosquitoes. Thus, we followed a similar protocol for Ae. aegypti. However, inter-individual differences in response affect the significance value.

      Figure S6: how does this support that synaptic plasticity is influenced by "Time-of-day dependent modulation of serine protease genes in the brain"?

      Response: We agree with the reviewer’s concern that it is not possible to comment on synaptic plasticity with EAG data alone. We apologize for this and have revised the statement in the MS.


      Minor comments

      What do the authors mean by "consolidation of brain functions"? Memory consolidation? Please clarify.

      Response: The consolidation of brain function, or memory consolidation, refers to the process of stabilizing the memory that an organism gains through experience or a training/learning phase. Memory consolidation begins with a rapid change in de novo gene expression regulated by several transcription factors, effector genes and non-coding RNAs, known as molecular consolidation. This is followed by cellular consolidation, which involves signal transmission within the neurons of the brain. Molecular and cellular consolidation are the basis for system-level consolidation, which is a slow process and involves communication among neurons located in different regions of the brain. System-level consolidation is very important for the reorganization of brain circuits to maintain long-term memory. The concept of system consolidation is well established in humans. Additionally, several studies in Drosophila have shown that fruit flies develop olfactory memories after classical conditioning or olfactory training through a system consolidation process.

      Moreover, accumulating data from humans suggest that sleep helps in memory consolidation. Sleep is a basic drive for all animals that helps to build memories. There are two hypotheses, each with compelling evidence. The first hypothesis, and the supporting molecular and electrophysiological data, holds that sleep facilitates the homeostatic processes of the brain, involving loosening of synaptic connections between overactive neurons and structural modification of synapses, which consequently help in memory formation. The second hypothesis states that sleep makes an important contribution to system consolidation and long-term memory potentiation. Studies of the electrical activity of the brain and recent advances in fMRI indicate a reorganization of neural activity between brain regions during sleep-related memory consolidation.

      There is experimental evidence in support of both theories in humans as well as in the fruit fly Drosophila melanogaster. In mosquitoes, studies of brain function are primarily restricted to the mechanism of odor coding, and memory formation has been correlated with dopamine neurotransmitter signalling. In view of the rapid adaptation potential, changes in host preference and the evolution of temporal host-seeking behaviour, it can be hypothesized that the mosquito brain also undergoes memory consolidation (following either of the two hypothesized paths, or both together) to learn new information in order to effectively shape future actions.

      Furthermore, according to a fundamental principle of modern neuroscience, learning and memory are achieved either by the formation of new synaptic connections or by changes in existing connections between neurons. The ability of synapses to strengthen or weaken their communication is called plasticity, which is influenced by learning and experience and facilitates an organism’s adaptation and survival.

      References:

      1. I. Cervantes-Sandoval, A. Martin-Peña, J. A. Berry, R. L. Davis, System-like consolidation of olfactory memories in Drosophila. J. Neurosci. 33, 9846–9854 (2013).
      2. L. Marshall, N. Cross, S. Binder, T. T. Dang-Vu, Brain rhythms during sleep and memory consolidation: Neurobiological insights. Physiology. 35, 4–15 (2020).
      3. B. O. Watson, G. Buzsáki, Sleep, Memory & Brain Rhythms. Daedalus 144(1), 67–82 (2015). doi:10.1162/DAED_a_00318

      In "Similar to previous studies (26), the expression of a limited number of rhythmic genes was visualized in Ae. aegypti" please replace "visualized" with "observed".

      Figure 2A, please clarify in the caption what FDR stands for.

      Response: FDR stands for “false discovery rate”. The FDR-adjusted p-value controls the expected proportion of false positives among the results called significant.
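
      For readers unfamiliar with the adjustment, the following is a minimal sketch of one common FDR procedure, the Benjamini-Hochberg method. The response does not state which FDR procedure was used in the study, so this choice, the function name `benjamini_hochberg`, and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      A gene would then be reported as differentially expressed when its adjusted value falls below the chosen FDR threshold.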

      In "To further establish this proof-of-concept in An. gambiae, three potent CYP450 inhibitors, aminobenzotriazole(52), piperonyl butoxide(53), and schinandrin A (54), was applied topically on the head capsule of 5-6-day-old female mosquitoes" replace "was applied" with "were applied".

      Response: These changes are made in the revised manuscript.

      "Interestingly, our species-time interaction studies revealed that An. gambiae exhibits time-of-day dependent significantly high antennal sensitivity to at least four chemical odorants compared to Ae. aegypti, except phenol." is unclear. Please reword.

      Response: The statement has been revised in the MS.

      In "Similar observations were also noticed with An. stephensi." replace "noticed" with "made".

      Response: We have modified the statement in the revised version of the manuscript.



      Reviewer #1 (Significance (Required)):

      Such a study has the potential to be valuable for the field, but its value and significance are hindered by an accumulation of overstatements, the fact that prior work in the field has been minimized or omitted, and a lack of support for the stated conclusions.

      In this context, the advances are only slightly incremental compared to the work produced by Rund et al., and the mechanistic hypotheses emitted to link the genes selected for knockdown experiments and olfactory sensitivity are not clearly supported by the evidence presented here. The main strength of the paper is to show the role of CYP450 in olfactory sensitivity.

      The audience is fairly broad and includes insect neuro-ethologists, molecular biologists, and chronobiologists.

      Our field of expertise:

      • Mosquito chemosensation

      • Learning and memory

      • Chronobiology

      • Electrophysiology

      • Medical entomology









      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This report combines an examination of peripheral transcriptomes and general olfactory sensitivity in an effort to underscore the importance of peri-receptor components in circadian-directed modulation of olfaction across both Aedine and Anopheline mosquitoes. While the authors do a nice job of raising the importance of the often-underappreciated spectrum of insect olfactory peri-receptor proteins, the impact of their study is undercut by technical concerns regarding methods and data presentation. That several of these concerns (detailed below) are explicitly acknowledged by the authors as limitations of this study does not mitigate their impact in eroding confidence in these data and this study.

      All in all, as a result of these concerns, I am unconvinced as to the overall merits of this somewhat interesting but generally uneven study.

      Response: We sincerely thank the reviewer for their time and consideration, and appreciate the thorough review of our manuscript. Their insightful comments have greatly enriched our work. We also apologize for instances of overinterpreting the data. The reviewer's feedback has helped us recognize areas where clarity and caution are needed, and we are committed to addressing these concerns in our revisions.

      Major concerns:

      1. That the authors use An. culicifacies for their transcriptome studies and An. gambiae (G3) for the olfactory physiology does not work. The 'technical limitations' (read studies done at two different locations) make this report an unwelcome melding of what should perhaps be two distinct studies. In order to maintain this forced marriage as a single report I would suggest the authors utilize An. culicifacies for both components. Alternatively, they can do both parts with An. gambiae but here I would strongly urge them to use any strain other than G3 which as a result of its now decades-long laboratory residence has long since lost its relevance to natural populations of Anopheline vectors.

      Response: We agree with the reviewer that there is significant species-specific variation in the olfactory sensitivity of mosquitoes. Considering the strictly nocturnal behavioral pattern of An. culicifacies and the diurnal behavior of Aedes aegypti, we performed the RNA-Seq study with these respective species. However, 1) an EAG facility is unavailable at ICMR-National Institute of Malaria Research, India (the only place where an An. culicifacies colony is available), 2) An. culicifacies is challenging to rear and takes a long time to adapt to a new environment/laboratory (Ref: Adak T, Kaur S, Singh OP. Comparative susceptibility of different members of the Anopheles culicifacies complex to Plasmodium vivax. Trans R Soc Trop Med Hyg. 1999;93:573–577), and 3) an An. culicifacies colony was not available at our collaborative laboratory. Therefore, to validate our hypothesis of CYP450 function in odorant detection and olfactory sensitivity of mosquitoes, we opted for the current collaborative study.

      We are also aware that species variation within Anopheles would make electroantennographic data difficult to correlate with the molecular data on An. culicifacies. Thus, we chose An. gambiae (and not other Anopheles mosquitoes such as An. stephensi, An. coluzzii etc.) because of the availability of diel rhythm associated molecular data for An. gambiae (68). For better interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found a similar expression pattern for several CYP450 and OBP/CSP genes between An. culicifacies and An. gambiae. Performing another RNA-Seq study with An. gambiae would not be possible for the current MS. Furthermore, please note that the primary focus of the current MS is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection. As a proof-of-concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanism of odor detection and peri-receptor events is similar/conserved from insects to higher vertebrates.

      The 70-80% alignment rate reported to the An. culicifacies reference genome significantly erodes this reader's confidence in the integrity of their analyses. That a low level of alignment can have dramatic impacts on the estimation of transcript abundance has been repeatedly demonstrated (see Srivastava, A., Malik, L., Sarkar, H. et al. Genome Biol 21, 239, 2020, https://doi.org/10.1186/s13059-020-02151-8). This may (in part) explain why olfactory receptors have been largely absent from this data set.

      Response: We agree with the reviewer that the alignment rate could have been better, but this should not affect the quantitative information we refer to in this manuscript. The alignment rate could have affected the qualitative information, which can vary for multiple reasons, including the quality of the reference genome. As is evident from the analysis, 90% of the reads in Ae. aegypti aligned to the reference genome, yet we still did not observe any difference in the abundance of olfactory receptor genes. A previous microarray analysis in An. gambiae by Rund et al. 2013 also did not show diel rhythmic expression of any OR genes.

      The issue of species choice is further complicated by questions regarding the An. culicifacies species complex, which contains 5 cryptic species. How did the authors confirm they are indeed working with An. culicifacies species A? There is no mention of molecular identification.

      Response: The An. culicifacies species A colony has been maintained at NIMR since 1999, with routine checks of species purity performed by analyzing chromosomal inversion genotypes for the presence of sibling species (see the references). At that time, we had three sibling species (A, B, and C); we subsequently lost B and C. Citing the old references will not serve the purpose. Later, we verified sibling species A by chromosomal inversion genotyping and molecular tools. However, we do not have a published reference for that verification data.

      The species can be identified by performing 28S rDNA-based PCR (Singh et al. 2004) and cytochrome oxidase II-based PCR (Goswami et al. 2006). Sequencing can also serve the purpose.


      Singh OP, Goswami G, Nanda N, Raghavendra K, Chandra D, Subbarao SK. An allele-specific polymerase chain reaction assay for the identification of members of Anopheles culicifacies complex. J Biosci. 2004; 29: 275-280. doi: 10.1007/bf02702609

      Goswami G, Singh OP, Nanda N, Raghavendra K, Gakhar SK, Subbarao SK. Identification of all members of the Anopheles culicifacies complex using allele-specific polymerase chain reaction assays. Am J Trop Med Hyg. 2006; 75: 454-460. doi: 10.4269/ajtmh.2006.75.454

      Adak T, Kaur S, Singh OP. Comparative susceptibility of different members of the Anopheles culicifacies complex to Plasmodium vivax. Trans R Soc Trop Med Hyg. 1999;93:573–577

      The switch from dsRNAi studies in Aedes to protease inhibitor studies in Anopheles adds to the interspecies confusion.

      Response: Our main goal in this study was to evaluate the function of CYP450 in mosquito odor detection and olfactory sensitivity. Our data, as well as previous data (Rund et al. 2011, Rund et al. 2013), suggest that the basic mechanisms of odor detection and peri-receptor events are similar in An. gambiae, An. culicifacies, and Ae. aegypti, and the role of detoxification genes is well evidenced by these data. Based on our RNA-Seq data on Ae. aegypti, we shortlisted one CYP450 gene for functional knock-down assays. For Anopheles, however, we used An. gambiae for functional validation, so it was not possible for us to select an appropriate CYP450 gene from An. gambiae. That is why we used CYP450 protein inhibitors, which block the function of all CYP450s expressed in the olfactory system of mosquitoes. As expected, we observed a much more pronounced reduction in olfactory sensitivity when inhibitors were applied than with dsRNAi-mediated knock-down of only one CYP450 protein. These data indicate that Anopheles also possesses a similar mechanism of peri-receptor events for odor detection and that CYP450 plays an important role in it.

      The olfactory shifts presented in Fig 3 are somewhat underwhelming. In An. gambiae this is mostly seen at very high (to my eyes, non-biologically relevant) 10^-1 dilutions. In Aedes, while statistically significant, the EAG values (especially for 4MePhenol) are very low and therefore suspect and unconvincing. It is also unclear how 'Relative EAG Responses' were derived. Does this mean relative to solvent-alone controls?

      Response: Yes, relative EAG response means relative to the respective solvent control. We have also made the necessary changes in the text and in the figures for better understanding and representation.
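
      For readers unfamiliar with the term, the normalization described in this response can be sketched as follows. This is a minimal illustration only: the response does not state whether "relative" means a ratio or a difference, so the ratio form used here is an assumption, and the function name and the amplitude values are hypothetical rather than taken from the manuscript.

```python
def relative_eag(odor_response_mv, solvent_response_mv):
    """Express an odor-evoked EAG amplitude relative to its solvent control.

    Assumption: "relative" is taken to mean the ratio odor / solvent.
    Both amplitudes are in millivolts and share the same sign convention.
    """
    if solvent_response_mv == 0:
        raise ValueError("solvent control response must be non-zero")
    return odor_response_mv / solvent_response_mv


if __name__ == "__main__":
    # Hypothetical amplitudes: odor puff -1.5 mV, solvent-only puff -0.5 mV.
    print(relative_eag(-1.5, -0.5))
```

      Under this assumption, a value of 1.0 means the odor evoked no more response than the solvent alone.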

      The same data set seems to have been presented in Figures 3 and 4, with the latter lacking salient details (e.g. haphazard odor concentrations, which are seen only when the legend is examined). These factors make the inclusion of Figure 4 less obvious.

      Response: In response to the reviewer’s concern, we have moved Figure 4 to the supplemental data, and we apologize for the miscommunication.

      I am concerned that the data in Figure 5B is derived from only those samples with altered EAGs. I believe that all injected mosquitoes should be assayed in order to better understand the actual efficacy of the treatment. The cherry picking of samples is troubling.

      Response: We pooled five heads for each replicate and performed the assay with three replicates. That means we took heads from 15 mosquitoes for each experimental setup (control vs knock-down). It is true that we did not consider all 40 mosquitoes that we used for EAG recordings. However, we believe that 15 mosquitoes are a good representation of the population. The error bars among replicates of the knock-down mosquitoes, compared to the dsLacZ group, clearly indicate the disparity in knock-down efficiency among individuals.

      As is true for earlier figures, Figure 5c-f is lacking critical information about concentration (also not presented in figure legend) and should be done within the context of a multi-point dose response study. The data in its current form is not acceptable.

      Response: We apologize for not mentioning the concentration of the inhibitors. We have now added this information to the revised manuscript.

      The same data concerns apply to Figure 6d-g.

      Response: We apologize for not mentioning the concentration of the inhibitors. We have now added this information to the revised manuscript.

      The inclusion of An. stephensi data Figure S4D seems thrown in as an after-thought and without good reason.

      Response: Our RNA-Seq data on An. culicifacies and Aedes aegypti revealed a similar abundance and expression pattern of rhythmic transcripts, specifically peri-receptor transcripts, as reported before by Rund et al. 2011 & 2013 for Aedes aegypti and Anopheles gambiae. Moreover, because we observed a significant difference in EAG response between Aedes aegypti and Anopheles gambiae, we hypothesized that the higher abundance of rhythmic peri-receptor transcripts may correlate with the higher EAG response in Anopheles. Therefore, to get an idea of the EAG response in another Anopheles species, we used An. stephensi and observed a similar difference in EAG response. Although it would be interesting to compare time-dependent responses between the two Anopheles species, this is not our primary objective and is beyond the scope of the current MS; it can be explored in future work.

      I am unsure how shifts in CNS levels of P450 or serine proteases impact peripheral EAG recordings? This is especially so given that any effects on synaptic plasticity/efficacy that might occur are expected to be downstream of the peripheral antennae being recorded in EAGs. The authors do not do a great job explaining away that paradox even though that section in the discussion seems overly speculative.

      Response: We agree with the reviewer that an EAG study is not sufficient or appropriate for commenting on the effect of central-brain processing on the regulation of olfactory processes. Further validation by either neuroimaging or behavioral studies is needed to draw any conclusion. We clearly mention in the MS that our data only indirectly indicate this function of serine protease and that further confirmatory studies are needed to prove this hypothesis. However, it is not possible for us to perform all the experiments now, due to technical and infrastructural limitations. We therefore propose it as a future research endeavour. Moreover, considering the reviewer’s concern, we have modified the text and removed the overstatements and speculations.

      The authors' discussion on peri-receptor protein oscillation seems premature given that the data presented (regardless of the caveats discussed above) center on transcript abundance. There is no data on protein abundance, which while related, is an entirely different question/issue.

      Response: Yes, we agree that our hypothesis of peri-receptor protein oscillation is based on our RNA-Seq data. However, we later tested this hypothesis with knock-down studies in mosquitoes and with CYP450 protein inhibitors, and in both cases we observed a significant decrease in olfactory sensitivity. It is true that we do not have any data on protein abundance, but several previous studies, along with our data, show similar expression profiles of peri-receptor genes, which indicates that the rhythmic expression pattern of these genes is conserved among mosquitoes. None of the previous studies addresses the hypothesis regarding peri-receptor events and the possible function of XMEs in odorant detection, which is the uniqueness of our study. Therefore, we believe that the functional validation by dsRNAi and inhibitor studies supports our hypothesis. While CYP450 has been reported to have a crucial role in xenobiotic detoxification, its role in odor detection has not yet been explored. We agree that further biochemical validation is required to examine the interaction between CYP450 and odor molecules, and how CYP450 modifies odorant chemicals either for their detection or for their inactivation. Such a study is outside the scope of this MS and will be our future research endeavour. However, our current data and the MS will have a large impact on the design of insecticide application strategies, as overlap between the timing of insecticide application and the rhythmic expression/natural upregulation of XMEs could accelerate the inactivation of insecticides and the rapid emergence of resistant mosquitoes. Thus, we believe that the revised MS contains valuable data and merits publication.

      Minor concerns:

      1. The authors routinely confuse transcript abundance derived from their RNAseq data with gene expression. The former reflects the steady-state snapshot levels of transcripts encompassing synthesis, use and decay while the latter is limited to the rate of transcription requiring nuclear run on or single-nucleus RNAseq approaches.

      Response: Thank you for your insightful comment. We appreciate your clarification regarding the distinction between transcript abundance and gene expression. In the revised manuscript, we have included a clarification stating that "transcript abundance is referred to as gene expression, unless explicitly stated otherwise".

      There are numerous typos, spelling errors and other grammatical mistakes; a copy editor is needed.

      Response: In the revised manuscript, we have carefully corrected the spelling errors and other grammatical mistakes.

      Many of the supplemental figures are error filled, lacking sufficient details and otherwise difficult to parse/understand. I recommend revisiting/removing many of these.

      Response: We have improved the supplementary figures in the revised manuscript as suggested by the reviewer.

      __Reviewer #2 (Significance (Required)):__

      In light of the serious concerns described above, there is limited significance to this study. Similarly, these concerns erode almost all of any advance to the field this study might have offered. The audience of interest would be highly specialized.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In the present manuscript, the authors analyzed diel oscillations in the brain and olfactory organs' transcriptome of Aedes aegypti and Anopheles culicifacies. The analysis of their RNAseq results showed an effect of time of day on the expression of detoxification genes involved in oxidoreductase and monooxygenase activity. Next, they investigated the effect of time of day on the olfactory sensitivity of Ae. aegypti and An. gambiae and identified the role of CYP450 in odor detection in these species using RNAi. In the last part of the study, they used RNAi to knock down the expression of one of the serine protease genes and observed a reduction in olfactory sensitivity. Overall, the experiments are well-designed and mostly robust (see comment regarding the sample size and data analysis of the EAG experiments) but do not always support the claims of the authors. For example, since no experiments were conducted under constant conditions, the circadian (i.e., driven by the internal clocks) effects are not being quantified here. In addition, knocking down the expression of a gene showing daily variations in its expression and observing an effect on olfactory sensitivity is not sufficient to show its role in the daily olfactory rhythms. Knowledge gaps are not well supported by the literature, and overstatements are made throughout the manuscript. Our detailed comments are listed below.

      Major comments

      Introduction

      Several statements made in the introduction are misleading and suggest that authors are trying to exaggerate the impact of their work. For example, "Furthermore, different species of mosquitoes exhibit plasticity and distinct rhythms in their daily activity pattern, including locomotion, feeding, mating, blood-feeding, and oviposition, facilitating their adaptation into separate time-niches (7, 8), but the underlying molecular mechanism for the heterogenous temporal activity remains to be explored." is not accurate since daily rhythms in mosquitoes' transcriptomes, behavior, and olfactory sensitivity have been the object of several publications. Even though some of them are listed later in the introduction, they contradict the claim made about the knowledge gap. See:

      Rund, S. S., Gentile, J. E., & Duffield, G. E. (2013). Extensive circadian and light regulation of the transcriptome in the malaria mosquito Anopheles gambiae. BMC genomics, 14(1), 1-19

      Rund, S. S., Hou, T. Y., Ward, S. M., Collins, F. H., & Duffield, G. E. (2011). Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proceedings of the National Academy of Sciences, 108(32), E421-E430

      Rund, S. S., Bonar, N. A., Champion, M. M., Ghazi, J. P., Houk, C. M., Leming, M. T., ... & Duffield, G. E. (2013). Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae. Scientific reports, 3(1), 2494

      Rund, S. S., Lee, S. J., Bush, B. R., & Duffield, G. E. (2012). Strain-and sex-specific differences in daily flight activity and the circadian clock of Anopheles gambiae mosquitoes. Journal of insect physiology, 58(12), 1609-1619

      Leming, M. T., Rund, S. S., Behura, S. K., Duffield, G. E., & O'Tousa, J. E. (2014). A database of circadian and diel rhythmic gene expression in the yellow fever mosquito Aedes aegypti. BMC genomics, 15(1), 1-9

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      Rivas, G. B., Teles-de-Freitas, R., Pavan, M. G., Lima, J. B., Peixoto, A. A., & Bruno, R. V. (2018). Effects of light and temperature on daily activity and clock gene expression in two mosquito disease vectors. Journal of Biological Rhythms, 33(3), 272-288

      The knowledge gap brought up in the next paragraph of the introduction doesn't reflect the questions asked by the experiments: "But, how the pacemaker differentially influences peripheral clock activity present in the olfactory system and modulates olfactory sensitivity has not been studied in detail." Specifically, the control of peripheral clocks by the central pacemaker has not been evaluated here.

      "In vertebrates and invertebrates, it is well documented that circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" should also cite work on cockroaches and kissing bugs:

      Lubinski, A. J., & Page, T. L. (2016). The optic lobes regulate circadian rhythms of olfactory learning and memory in the cockroach. Journal of Biological Rhythms, 31(2), 161-169

      Page, T. L. (2009). Circadian regulation of olfaction and olfactory learning in the cockroach Leucophaea maderae. Sleep and Biological Rhythms, 7, 152-161

      Vinauger, C., & Lazzari, C. R. (2015). Circadian modulation of learning ability in a disease vector insect, Rhodnius prolixus. Journal of Experimental Biology, 218(19), 3110-3117

      The sentence: "Previous studies showed that synaptic plasticity and memory are significantly influenced by the strength and number of synaptic connections (43, 44)." should be nuanced as the role of neuromodulators such as dopamine has also been shown to influence learning and memory in mosquitoes:

      Vinauger, C., Lahondère, C., Wolff, G. H., Locke, L. T., Liaw, J. E., Parrish, J. Z., ... & Riffell, J. A. (2018). Modulation of host learning in Aedes aegypti mosquitoes. Current Biology, 28(3), 333-344

      Wolff, G. H., Lahondère, C., Vinauger, C., Rylance, E., & Riffell, J. A. (2023). Neuromodulation and differential learning across mosquito species. Proceedings of the Royal Society B, 290(1990), 20222118

      Overall, the paragraph dealing with the idea that "circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" is very confusing. This paragraph discusses mechanisms of learning-induced plasticity but seems to ignore the simplest (most parsimonious) explanations for the circadian regulation of learning (e.g., time-dependent expression of genes involved in memory consolidation). In addition, the sentence quoted above is a convoluted way of saying that training at different times of the day affects memory acquisition and consolidation. Although the authors did look at one gene involved in neural function, learning, memory, or circadian effects were not analyzed in this study. Please reconsider the relevance of the paragraph.

      The sentence: "But, how the brain of mosquitoes entrains circadian inputs and modulates transcriptional responses that consequently contribute to remodel plastic memory, is unknown." should be rephrased. First, it should be "entrains TO circadian inputs", and second, it suggests that the study will be investigating circadian modulation of learning and memory, which is not the case. Furthermore, the term "remodel plastic memory" is unclear and doesn't seem to relate to any specific cellular or neural processes.

      Given the differences in mosquito chronobiology observed even between strains, why perform the RNAi and EAGs on a different species of Anopheles than the one used for the RNAseq (or vice versa)?

      Results

      "As reported earlier, a significant upregulation of period and timeless during ZT12-ZT18 was observed in both species (Figure 1C)." Please provide effect size and summary statistics.

      "Next, the distribution of peak transcriptional changes in both An. culicifacies and Ae. aegypti was assessed through differential gene-expression analysis. Noticeably, An. culicifacies showed a higher abundance of differentially expressed olfactory genes (Figure 1D)" Please provide effect size and summary statistics.

      "Taken together, the data suggests that the nocturnal An. culicifacies may possess a more stringent circadian molecular rhythm in peripheral olfactory and brain tissues." What do the authors mean by "stringent"? At this point, this should be stated as a working hypothesis, as the statement is not backed up by the data. It is possible that the fewer differentially expressed genes of Aedes aegypti are more central to regulatory networks and cascade into more "stringent" rhythmic control of activities and rhythms.

      The section title: "Circadian cycle differentially and predominantly expresses olfaction-associated detoxification genes in Anopheles and Aedes" doesn't make sense. The expression of genes can be modulated by circadian rhythms, but cycles don't express genes. Please rephrase. In addition, this whole section deals with "circadian rhythms" while no experiment has been conducted under constant conditions. The observed daily variations are therefore diel rhythms until their persistence under constant conditions is established.

      "The downregulated genes of Ae. aegypti did not show any functional categories probably due to the limited transcriptional change." Could the authors explain if this is actually the phenomenon or due to a lack of temporal resolution in the study design (i.e., 4 time points)?

      "a GO-enrichment analysis was unable to track any change in the response-to-stimulus or odorant binding category of genes (including OBPs, CSPs, and olfactory receptors)." This finding doesn't corroborate the statements made previously and doesn't align with previously published studies. Is it due to pitfalls in the study design?

      "In contrast, three different clusters of OBP genes in Ae. aegypti showed a time-of-day dependent distinct peak in expression starting from ZT0-ZT12 (Figure 2F)." Please provide summary statistics.

      "In the case of An. gambiae, the amplitudes of odor-evoked responses were significantly influenced by the doses of all the odorants tested (repeated measure ANOVA, p ≤ 2e-16) (Figure S4B)." Did the authors use a positive control for the EAGs? How did the authors normalize the responses across the two species? Given the way the data is presented, how were the data normalized to allow inter-species comparisons? In addition, it is highly unlikely that all the mosquito preps used in the EAG assay responded to all the odors tested. If that was the case, then the dataset includes missing data for certain odors and time points. We believe the authors have ensured there are at least a certain number of responses per odor and time point combinations. If this is true, repeated measures ANOVA is not suited for analyzing this data because this statistical technique requires all repeated measures within and across preps without missing values. Also, the authors need to correct the summary statistics for multiple comparisons within this framework to avoid inflating type-I errors. Has this been done?
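
      To make the multiple-comparison point above concrete, the following is a minimal Python sketch of one common correction, the Holm-Bonferroni step-down adjustment. It is offered only as an illustration: the comment does not name a specific method, Holm is an assumption chosen here for simplicity, and the p-values in the example are hypothetical rather than taken from the manuscript.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The k-th smallest raw p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1.0, and forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    # Indices of the raw p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values, e.g. one per odor-by-time-point comparison.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

      A comparison is then declared significant only if its adjusted p-value falls below the chosen alpha, which controls the family-wise type-I error rate across all odor and time-point combinations.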

      "Ae. aegypti was found to be most sensitive to all the odorants (4-methylphenol, β-ocimine, E2-nonenal, benzaldehyde, nonanal, and 3-octanol) during ZT18-20 except sulcatone (Figure 3C - 3H)." Although some of these chemicals are associated with plants and Ae. aegypti is suspected to sugar feed at night, how do the authors explain that the peak olfactory sensitivity occurs at night for compounds such as nonanal? It would be interesting to discuss how these results compare to previous studies such as:

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      "Additionally, our principal components analysis also illustrates that most loadings of relative EAG responses are higher towards the Anopheles observations (Figure S4C)." The meaning of this sentence is unclear. Please clarify.

      "Taken together these data indicate that An. gambiae may exhibit higher antennal sensitivity to at least five different odorants tested, as compared to Ae. aegypti." As mentioned above, how did the authors normalize across species to allow comparisons? If not normalized, how do you ensure that higher response magnitudes correlate with higher olfactory sensitivity, given potential differences in morphology or size between the two species? Furthermore, An. gambiae has been exclusively used in the EAG assay. Besides the lack of a justification for using a species other than An. culicifacies, the authors have interpreted the EAG results under the assumption that the olfactory sensitivities of An. gambiae and An. culicifacies are comparable. This, however, is a major caveat in the experimental design, given that previous studies (indicated below) have reported species-specific variations in olfactory sensitivity. In its present form, the EAG data from An. gambiae is not a piece of appropriate evidence that the authors could use to complement or substantiate the findings from other aspects of this study on An. culicifacies.

      i. Wheelwright, M., Whittle, C. R., & Riabinina, O. (2021). Olfactory systems across mosquito species. Cell and Tissue Research, 383(1), 75-90.

      ii. Wooding, M., Naudé, Y., Rohwer, E., & Bouwer, M. (2020). Controlling mosquitoes with semiochemicals: a review. Parasites & Vectors, 13, 1-20.

      iii. Gupta, A., Singh, S. S., Mittal, A. M., Singh, P., Goyal, S., Kannan, K. R., ... & Gupta, N. (2022). Mosquito Olfactory Response Ensemble enables pattern discovery by curating a behavioral and electrophysiological response database. Iscience, 25(3).

      "Similar to An. gambiae, a comparatively high amplitude response was also observed in An. stephensi (Figure S4D)." This is interesting but what would be even more relevant to the present study is to discuss how the time-dependent responses compare between the two Anopheles species.

      The paragraph titled "Daily temporal modulation of neuronal serine protease impacts mosquito's olfactory sensitivity" is confusing because the authors move on to test the effect of knocking down a serine protease gene (found to be differentially expressed throughout the day) on olfactory sensitivity. While this is interesting in and of itself, the link between the role of this gene in learning-induced plasticity, the circadian modulation of "brain functions" and olfactory sensitivity is 1) unclear and 2) not explicitly tested. We agree with the authors that what has been tested is "the effect of neuronal serine protease on circadian-dependent olfactory responses," but the two paragraphs leading to it seem to be extrapolating functional links that have yet to be determined. In this context, their conclusions that "Our finding highlights that daily temporal modulation of neuronal serine-protease may have important functions in the maintenance of brain homeostasis and olfactory odor responses." is misleading because although they used the hypothetical "may", the link between the temporal modulation of one serine protease gene and the maintenance of brain homeostasis is not explicitly tested here.

      Discussion

      The first sentence of the discussion: "In this study, we provide initial evidence that the daily rhythmic change in the olfactory sensitivity of mosquitoes is tuned with the temporal modulation of molecular factors involved in the initial biochemical process of odor detection i.e., peri-receptor events" is not true since studies from Rund and Duffield previously revealed the daily modulation of OBP gene expression. It also contradicts the next sentence: "The findings of circadian-dependent elevation of xenobiotic metabolizing enzymes in the olfactory system of both Ae. aegypti and An. culicifacies are consistent with previous literature (26, 31), and we postulate that these proteins may contribute to the regulation of odorant detection in mosquitoes."

      The use of "circadian" in the discussion of the results is also misleading as only diel rhythms were evaluated in the present study.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58)." This is not really what these references show.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58), it can be hypothesized that detection of any specific signal in such a noisy environment, mosquitoes may have evolved a sophisticated mechanism for rapid (i) odor mobilization and (ii) odorant clearance, to prevent anosmia (24)." One could argue that this is a requirement for all insects, regardless of the size of their olfactory repertoire.

      "Taken together, we hypothesize that circadian-dependent activation of the peri-receptor events may modulate olfactory sensitivity and are key for the onset of peak navigation time in each mosquito species." This is not entirely accurate since spontaneous locomotor activity rhythms are also observed in the absence of olfactory stimulation. While "navigation" does imply olfactory-guided behaviors, "peak navigation time" appears to be driven by other processes. See, for example, all studies testing mosquito activity rhythms in locomotor activity monitors.

      "Due to technical limitations, and considering the substantial data on the circadian-dependent molecular rhythmicity" please clarify what the technical limitations were. Is this something that prevented the authors specifically, or something tied to mosquito biology and would prevent anybody from doing it? Also, why couldn't the transcriptomic analysis be performed on An. gambiae?

      "In contrast to An. gambiae, the time-dose interactions had a higher significant impact on the antennal sensitivity of Ae. aegypti. An. gambiae showed a conserved pattern in the daily rhythm of olfactory sensitivity, peaking at ZT1-3 and ZT18-20." These two sentences are very confusing. Doesn't it simply mean that the co-variation is not linear or not the same across odors? In addition, what does it mean for a pattern to be more conserved? How can one conclude about the "conserved" nature of a pattern by looking at time-dependent variations in dose-response curves?

      "Together these data, we interpret that mosquito's olfactory sensitivity possibly does not follow a fixed temporal trait" is unclear and suggests that the authors are discussing global versus odor-specific rhythms. Please rephrase.

      "Moreover, we hypothesize that under standard insectary conditions, mosquitoes may not need to exhibit foraging flight activity either for nectar or blood, and during the time course, it may minimize their olfactory rhythm, which is obligately required for wild mosquitoes." This hypothesis is not supported by the results of the study and contradicts work by others (Rund et al., Eilerts et al., Gentile et al., etc.).

      The same comment applies to "Therefore, it is reasonable to think that the mosquitoes used for EAG studies may have adapted well under insectary settings and, hence carry weak olfactory rhythm." as this statement is not supported by results of the present study or comparisons of the results to previous studies based on field-caught mosquitoes. Although it is an interesting question to ask in the future, it should be stated as a future research avenue rather than a working hypothesis that results from the present study.

      "Aedes aegypti displayed a peak in antennal sensitivity at ZT18-20 to the higher concentrations of plant and vertebrate host-associated odorants tested. Given the time-of-day dependent multiple peaks (at ZT6-8 and ZT18-20 for benzaldehyde and at ZT12-14 and ZT18-20 for nonanal) in antennal sensitivity to different odorants, our data supports the previous observation of bimodal activity pattern of Ae. aegypti (50)." Rephrase by saying that results are "aligned with the previous observations of bimodal activity". Olfactory rhythms don't "support" the activity patterns because olfactory processes and spontaneous locomotor activity are independent processes.

      "our preliminary data indicate that Anopheles spp. may possess comparatively higher olfactory sensitivity to a substantial number of odorants as compared to Aedes spp." Consider removing this sentence unless the way the data has been normalized to allow for comparisons between species is clarified.

      In "A significant decrease in odorant sensitivity for all the volatile odors tested in the CYP450-silenced Ae. aegypti," please change "silenced" to "reduced" because RNAi doesn't silence (i.e. knockout) gene expression.

      The title "Neuronal serine protease consolidates brain function and olfactory detection" is extremely misleading. Do the authors refer to memory consolidation, which has not been tested here? What is brain function consolidation?

      The reference used in "Despite their tiny brain size, mosquitoes, like other insects, have an incredible power to process and memorize circadian-guided olfactory information (7)." is not appropriate. Also, "circadian-guided" is unclear. Consider replacing it with "circadian-gated".

      What is the "the homeostatic process of the brain"?

      "the temporal oscillation of the sleep-wake cycle of any organism is managed by the encoding of experience during wake, and consolidation of synaptic change during inactive (sleep) phases, respectively (70)." By experience, do the authors refer to learning? This seems out of topic as this process has not been evaluated here.

      "We speculate that after the commencement of the active phase (ZT6-ZT12), the serine peptidase family of proteins in the brain of Ae. aegypti mosquitoes may play an important function in consolidating brain actions (after ZT12) and aid circadian-dependent memory formation." The value of this statement is unclear. Circadian-dependent memory formation is not being evaluated here, and the results from the present study do not directly support this speculation, also because other processes involved in memory formation are not evaluated here. This seems at odds with the literature on learning and memory.

      "Subsequent work on electrophysiological and neuro-imaging studies are needed to demonstrate the role of neuronal-serine proteases in the reorganization of perisynaptic structure." Sure. But the link between "the role of neuronal-serine proteases in the reorganization of perisynaptic structure" and rhythms in olfactory sensitivity is unclear.

      As a general comment, EAGs seem inappropriate to evaluate the effect of the central-brain processing in the regulation of peripheral olfactory processes. This is a critical comment that needs to be considered by the authors and clarified in the manuscript. If rhythms of central brain processes are important for olfactory-guided behaviors, these should be evaluated at the level of the central brain or via behavioral metrics. The effect of the RNAi knockdowns on peripheral sensitivity is interesting, but its link with central processes is unclear and doesn't support the speculations made by the authors about learning and memory.

      Methods

      No explanations are provided for how the EAG data are normalized to allow comparisons between species.

      Figures

      Figure 1: The daily rhythms depicted in A are not representative of the actual profiles. See: Benoit, J. B., & Vinauger, C. (2022). Chapter 32: Chronobiology of blood-feeding arthropods: influences on their role as disease vectors. In Sensory ecology of disease vectors (pp. 815-849). Wageningen Academic Publishers. Or any other paper on mosquito activity rhythms.

      Figures 3 and 4: The EAG results are plotted twice. This is redundant and misleading, as it makes the reader think there is more data than actually presented.

      Figure 5: Please clarify the sample size for each panel. In C - F, what would be used as a reference? In other words, what is a Relative EAG Response of 1? And if it is "relative", are the units really mV? In E and F, it would be great to show how the Ethanol control compares to the no solvent condition. This could be placed in supplementary materials.

      Figures 5 and 6: Given the dispersion in the EAG data, the treatments where N=40 appear robust, but the interpretation of results from treatments where N=6 may be limited due to the low sample size. This limitation is visible in Figure 5F, for example, where ABT-Aceto is different from Cont-Aceta but not PBO-Aceto because one individual shows a higher response.

      Figure S6: how does this support that synaptic plasticity is influenced by "Time-of-day dependent modulation of serine protease genes in the brain"?

      Minor comments

      What do the authors mean by "consolidation of brain functions"? Memory consolidation? Please clarify.

      In "Similar to previous studies (26), the expression of a limited number of rhythmic genes was visualized in Ae. aegypti" please replace "visualized" with "observed".

      Figure 2A, please clarify in the caption what FDR stands for.

      In "To further establish this proof-of-concept in An. gambiae, three potent CYP450 inhibitors, aminobenzotriazole(52), piperonyl butoxide(53), and schinandrin A (54), was applied topically on the head capsule of 5-6-day-old female mosquitoes" replace "was applied" with "were applied".

      "Interestingly, our species-time interaction studies revealed that An. gambiae exhibits time-of-day dependent significantly high antennal sensitivity to at least four chemical odorants compared to Ae. aegypti, except phenol." is unclear. Please reword.

      In "Similar observations were also noticed with An. stephensi." replace "noticed" with "made".

      Significance

      Such a study has the potential to be valuable for the field, but its value and significance are hindered by an accumulation of overstatements, the fact that prior work in the field has been minimized or omitted, and a lack of support for the stated conclusions.

      In this context, the advances are only slightly incremental compared to the work produced by Rund et al., and the mechanistic hypotheses proposed to link the genes selected for knockdown experiments and olfactory sensitivity are not clearly supported by the evidence presented here. The main strength of the paper is to show the role of CYP450 in olfactory sensitivity.

      The audience is fairly broad and includes insect neuro-ethologists, molecular biologists, and chronobiologists.

      Our field of expertise:

      • Mosquito chemosensation
      • Learning and memory
      • Chronobiology
      • Electrophysiology
      • Medical entomology
    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their constructive comments. Our point-by-point responses follow.

      Reviewer #1 (Recommendations For The Authors):

      Point 1- Abstract: advanced morning peak « opposite » to pdf/pdfr mutants. To my knowledge, the alteration of PDF/PDFR suppresses the morning peak. I am not sure that an advance of the peak is « opposite » to its inhibition?

      Mutants with disruptions in CNMa or CNMaR display advanced morning activity, indicating an enhanced state. Mutants with disruptions in Pdf or Pdfr exhibit no morning anticipation, suggesting a promoting role of these genes in morning anticipation. Therefore, our revised version is: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-51)

      Point 2- Fig 1K-L: the authors should show the sleep phenotype of the homozygous nAChRbeta2 mutant (if not lethal) for a direct comparison with the FRT/FLP genotype and thus evaluate the efficiency of the system.

      We have incorporated sleep profiles of the nAChRbeta2 mutant and w1118 into Fig 1K-L. nAChRbeta2 mutants (red) exhibited a sleep amount comparable to that of pan-neural nAChRbeta2 knockout flies (dark red), as shown below.

      Author response image 1.

      Point 3- Dh31-EGFP-FRT expression patterns look different in Fig. S1A (or Fig. 1H) and J. Why is that?

      We re-examined the original data. Both samples displayed below (with R57C10-GAL4; Fig. S1A, right, and Fig. S1J, left) are Dh31EGFP.FRT, and they show consistent primary expression subsets. Any disparities observed in region "e" may be attributable to variation during dissection.

      Author response image 2.

      Point 4- The knockdown experiments with the elav-switch (RU486) system (fig S2) do not seem to be as efficient as the HS-FLP system (fig 1H-J). The conclusions on the efficiency should be toned down.

      We have revised accordingly: "Near Complete Disruption of Target Genes by GFPi and Flp-out Based cCCTomics" (Line 130): "Knocking out at the adult stage using either hsFLP driven Flp-out (Golic and Lindquist, 1989) (Fig. 1H-1J) or neural (elav-Switch) driven shRNAGFP (Nicholson et al., 2008; Osterwalder et al., 2001) (Fig. S2A-S2I), also resulted in the elimination of most, though not all, GFP signals." (Line 145-149)

      Point 5- Fig 2H-J: the LD behavioral phenotype of pdfr pan-neuronal CRISPR does not seem to correspond to what is described in the literature for the pdfr mutant (han), see Hyun et al., 2005 (no morning anticipation and advanced evening peak). I understand that the activity index is lower than controls but Fig. 2H shows a large anticipatory activity that seems really unusual, and no advanced evening peak is observed. I think that the authors should show the CRISPR flies and pdfr mutants together, to better compare the phenotypes.

      Thank you for pointing this out. With pan-neuronal knockout of PDFR by unmodified Cas9 (Fig. 2H-2I of the previous version), morning anticipation still exists (Fig. 2H of the previous manuscript), although the morning anticipation index is significantly decreased (Fig. 2I of the previous manuscript). Neither this decrease nor the advanced evening activity is as pronounced as that observed in han5304 (Fig. 3C in Hyun et al., 2005).

      First, we have separated the activity plots of Fig. 2H of the previous manuscript, as shown below. In Pdfr knockout flies, activity tends to decrease from ZT18 to ZT21 and to increase from ZT21 to ZT24. The lowest activity before dawn occurs at about ZT21, and the activity at ZT18 is comparable to that at ZT24. This is significantly different from the two control groups, whose activity tends to increase from ZT18 to ZT24, with a peak at ZT24.

      Activity from ZT6 to ZT12 increased much faster in Pdfr knockout flies and reached a plateau at about ZT11, whereas in the two control groups activity increased more slowly from ZT6 to ZT12, with no plateau but a peak at ZT12.

      Author response image 3.

      Second, we have compared the phenotype of the Pdfr mutants we previously generated (Pdfr-attpKO; Deng et al., 2019) with that of Pdfr pan-neuronal knockout by Cas9.HC. This mutant lacks all seven transmembrane regions of Pdfr (a). The phenotypes of Pdfr-attpKO flies and Pdfr pan-neuronal knockout flies are very similar. In this experimental repeat, we observed a much more obvious advanced evening activity peak in both pan-neuronal knockout flies and Pdfr-attpKO flies.

      To further analyze the phenotypes of Pdfr pan-neuronal knockout flies by Cas9.HC, we referred to the literature. The activity pattern at ZT18 to ZT24 (activity tends to decrease from ZT18 to ZT21 and tends to increase from ZT21 to ZT24, with the lowest activity before dawn occurring at about ZT21, and activity at ZT18 comparable to activity at ZT24) is also reported in Pdfr knockout flies, such as in Fig. 3C and 3H in Hyun et al., 2005, Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, Fig. 5A in Guo et al., 2014, and Fig. 5B in Goda et al., 2019. Additionally, the less pronounced advanced evening activity peak compared to han5304 (Fig. 3C in Hyun et al., 2005) is also reported in Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, and Fig. 5B in Goda et al., 2019. We consider that this difference is more likely to be caused by environmental conditions or recording strategies (DAM system vs. video tracing).

      Therefore, we revised the text to: “Pan-neuronal knockout of Pdfr resulted in a tendency towards advanced evening activity and weaker morning anticipation compared to control flies (Fig. 2H-2I), which is similar to Pdfr-attpKO flies. These phenotypes were not as pronounced as those reported previously, when han5304 mutants exhibited a more obvious advanced evening peak and no morning anticipation (Hyun et al., 2005)”.

      Author response image 4.

      Point 6-The authors should provide more information about the DD behavior (power is low, but how about the period of rhythmic flies, which is shortened in pdf (Renn et al.) and pdfr (Hyun et al.) mutants).

      We have incorporated period data into Fig. 2I. Indeed, conditional knockout of Pdfr by Cas9.HC driven by R57C10-GAL4 shortens the period length, as shown below (previous data) and in Fig. 2I of the revised version.

      For the revised Fig. 2I, we tested 45 Pdfr-attpKO flies under DD conditions (3 out of 48 flies died during video tracing in DD), and only one fly was rhythmic. In contrast, 9 out of 48 Pdfr pan-neuronal knockout flies were rhythmic.

      Author response image 5.

      Point 7- P15 and fig6. The authors indicate that type II CNMa neurons do not show advanced morning activity as type I do, but Figs 6 I and K seem to show some advance although less important than type I. I am not sure that this supports the claim that type I is the main subset for the control of morning activity. This should be toned down.

      We have re-organized Fig. 6 and revised the summary of these results as: “However, Type II neurons-specific CNMa knockout (CNMa ∩ GMR91F02) showed weaker advanced morning activity without advanced morning peak (Fig. 6N), while Type I neurons-specific CNMa knockout did (Fig. 6J), indicating a possibility that these two type I CNMa neurons constitute the main functional subset regulating the morning anticipation activity of fruit fly”. (Line 400-405)

      Point 8- Figs 6M and N: is power determined from DD data? if yes, how about the period and arrhythmicity? Please also provide the LD activity profiles for the mutants and rescued pdfr genotypes.

      Yes, the power was determined from the DD data. In the new version of the manuscript, we have included the activity plots for the LD phase in supplementary Fig S13, as well as shown below (A, B), and the period and arrhythmicity data for the DD phase in Fig. 6S and Table S7. We have also refined the related description as follows: “Moreover, knocking out Pdfr by GMR51H05, GMR79A11 and CNMa GAL4, which cover type I CNMa neurons, decreased morning anticipation of flies (Fig. 6T, Fig. S13B). However, the decrease in morning anticipation observed in the Pdfr knockout by CNMa-GAL4 was not as pronounced as with the other two drivers. Because the presumptive main subset of functional CNMa is also PDFR-positive, there is a possibility that CNMa secretion is regulated by PDF/PDFR signal”. (Line 413-419)

      Author response image 6.

      Point 9- Fig 7: does CNMaR affect DD behavior? This should be tested.

      We analyzed CNMaR-/- activity in constant darkness (DD) over a span of six days. The results revealed a higher power in CNMaR mutants than in control flies (Power: 93.5±41.9 (CNMaR-/-, n=48) vs 47.3±31.6 (w1118, n=47); Period: 23.7±0.3 h (CNMaR-/-, n=46) vs 23.7±0.3 h (w1118, n=47); arrhythmic rate 2/48 (CNMaR-/-) vs 0/47 (w1118)). Because mutating CNMa had no obvious effect on DD behavior, any effect of CNMaR on DD behavior cannot be attributed to the CNMa signal, so we did not repeat or further analyze the DD behavior of CNMaR mutants. We believe this raises a separate question beyond the scope of our current discussion.

      Reviewer #2 (Recommendations For The Authors):

      Point 1-One major concern is the apparent discrepancies in clock network gene expression using the Flp-Out and split-LexA approaches compared to what is known about the expression of several transmitter and peptide-related genes. For example, it is well established that the 5th-sLNv expresses ChAT (along with a single LNd), yet there appears to be no choline acetyltransferase (ChAT) signal in the 5th-sLNv as assayed by the Split-LexA approach (Fig. 4). This approach also suggests that DH31 is expressed in the s-LNvs, which, as one of the most intensely studied groups of clock neurons, are known to express PDF and sNPF, but not DH31. The results also suggest that the sLNvs express ChAT, which they do not. Remarkably, PDF is not included in the expression analysis. This peptide is well known to be expressed in only two subgroups of clock neurons, and would therefore be an excellent test case for the expression analysis in Fig. 4. PDF should therefore be added to the analysis shown in Fig. 4. Another discrepancy is PdfR, which split LexA suggests is expressed in the Large LNvs but not the small LNvs, the opposite of what has been shown using both reporter expression and physiology. The authors do acknowledge that discrepancies exist between their data and previous work on expression within the clock network (lines 237 and 238). However, the extent of these discrepancies is not made clear and calls into question the accuracy of Flp-Out and Split LexA approaches.

      The concerns mentioned above are:

      (1) sLNvs express PDF and sNPF but not Dh31;

      (2) ChAT is present in the 5th-sLNv and one LNd but not in the other sLNvs;

      (3) PDFR is present in sLNvs but not l-LNvs;

      (4) PDF is not included in the analysis.

      To verify the accuracy of these intersection analyses, all of which concern PDF-positive neurons (except the 5th-sLNv and LNds), we stained for PDF and examined the co-localization between PDF-positive LNvs and the respective drivers ChAT-KI-LexA, Pdfr-KI-LexA, Dh31-KI-LexA, and Pdf-KI-LexA.

      First, Dh31-KI-LexA labeled four s-LNvs, as shown below (also in Fig. S9A). Therefore, the results of the intersection analysis of Dh31-KI-LexA with Clk856-GAL4 are correct. We attribute the difference from the previous literature to Dh31-KI-LexA labeling different neurons than the previous driver or antibody.

      Second, no s-LNv was labeled by ChAT-KI-LexA, as shown below. We rechecked our intersection data and found that, of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains we analyzed, only two showed positive s-LNvs. To enhance the accuracy of the intersection analysis results, we now mark every positive-signal record in which the positive subset was found in fewer than 1/3 of the total analyzed brains (Table S4).

      Third, one l-LNv and at least two s-LNvs were labeled by Pdfr-KI-LexA, as shown below (also in Fig. S9B). Fourth, Pdf-KI-LexA labels all PDF-positive neurons, but the intersection analysis with Pdf-KI-LexA and Clk856-GAL4 showed only scattered signals, as shown below (D, also in Fig. S9C). In these cases, some expected positive signals were not observed in our dissection. A possible reason is the inefficiency of LexAop-FRT-myr::GFP driven by LexA. Our intersection results must therefore miss some positive signals.
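
      The one-third marking rule described above for the ChAT-KI-LexA∩Clk856-GAL4 records can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name and the integer comparison are assumptions, and only the threshold (fewer than 1/3 of the analyzed brains) and the 2-of-10 example come from the response.

```python
def is_low_frequency_signal(positive_brains: int, total_brains: int) -> bool:
    """Return True when a positive subset should be marked as infrequent.

    A record is marked when the subset was positive in at least one brain
    but in fewer than one-third of all analyzed brains.
    """
    if total_brains <= 0 or not 0 <= positive_brains <= total_brains:
        raise ValueError("need total > 0 and 0 <= positive <= total")
    # Integer form of positive/total < 1/3, which avoids floating-point rounding.
    return positive_brains > 0 and 3 * positive_brains < total_brains


# s-LNvs were positive in 2 of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains.
print(is_low_frequency_signal(2, 10))  # prints True
```

      A subset seen in exactly one-third of the brains (for example 1 of 3) is not marked under this reading of "less than 1/3".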

      Author response image 7.

      Finally, we revised the text to (Line 286-317):

      To assess the accuracy of expression profiles using CCT drivers, we compared our dissection results with previous reports. Initially, we confirmed the expression of CCHa1 in two DN1s (Fujiwara et al., 2018), sNPF in four s-LNvs and two LNds (Johard et al., 2009), and Trissin in two LNds (Ma et al., 2021), aligning with previous findings. Additionally, we identified the expression of nAChRα1, nAChRα2, nAChRβ2, GABA-B-R2, CCHa1-R, and Dh31-R in all or subsets of LNvs, consistent with suggestions from studies using ligands or agonists in LNvs (Duhart et al., 2020; Fujiwara et al., 2018; Lelito and Shafer, 2012; Shafer et al., 2008) (Table S4).

      Regarding previously reported Nplp1 in two DN1as (Shafer et al., 2006), we found approximately five DN1s positive for Nplp-KI-LexA, indicating a broader expression than previously reported. A similar pattern emerged in our analysis of Dh31-KI-LexA, where four DN1s, four s-LNvs, and two LNds were identified, contrasting with the two DN1s found in immunocytochemical analysis (Goda et al., 2016). Colocalization analysis of Dh31-KI-LexA and anti-PDF revealed labeling of all PDF-positive s-LNvs but not l-LNvs (Fig S9A), suggesting that the differences may arise from the broader labeling of 3' end knock-in LexA drivers or the amplitude effect of the binary expression system. The low protein levels might go undetected in immunocytochemical analysis. This aligns with transcriptome analysis findings showing Nplp1 positive in DN1as, a cluster of CNMa-positive DN1ps, and a cluster of DN3s (Ma et al., 2021), which is more consistent with our dissection.

      Despite the well-known expression of PDF in LNvs and PDFR in s-LNvs (Renn et al., 1999; Shafer et al., 2008), we did not observe stable positive signals for both in Flp-out intersection experiments, although both Pdf-KI-LexA and Pdfr-KI-LexA label LNvs as expected (Fig S9B-S9C). We also noted fewer positive neurons in certain clock neuron subsets compared to previous reports, such as NPF in three LNds and some LNvs (Erion et al., 2016; He et al., 2013; Hermann et al., 2012; Johard et al., 2009; Lee et al., 2006) and ChAT in four LNds and the 5th s-LNv (Johard et al., 2009; Duhart et al., 2020) (Table S4). We attribute this limitation to the inefficiency of LexAop-FRT-myr::GFP driven by LexA, acknowledging that our intersection results may miss some positive signals.

      Point 2-Related to this, the authors rather inaccurately suggest that the field's understanding of PdfR expression within the clock neuron network is "inconsistent" and "variable" (lines 368-377). This is not accurate. It is true that the first attempts to map PdfR expression with antisera and GAL4s were inaccurate. However, subsequent work by several groups has produced strong convergent evidence that, with the exception of the l-LNvs after several days post-eclosion, PdfR is expressed in the Cryptochrome-expressing subset of the clock neuron network. This section of the study should be revised.

      We thank the reviewer for pointing this out. As we have already addressed and revised the related part in the RESULTS section (Line 308-317), we have now removed this part from the DISCUSSION section of the revised version.

      Point 3-One minor issue that would avoid unnecessary confusion by readers familiar with the circadian literature is the way that activity profiles are plotted in the study. The authors have centered their averaged activity profiles on the 12h of darkness. This is the opposite of the practice of the field, and it leads to some initial confusion in the examination of the morning and evening peak data. The authors may wish to avoid this by centering their activity plots on the 12h light phase, which would put the morning peak on the left and the evening peak on the right. This is the way the field is accustomed to examining locomotor activity profiles.

      We centered the averaged activity profiles on the 12 h of darkness to highlight the advanced morning activity phenotype. To prevent confusion among readers, we have added a sentence to the figure legend explaining how our activity profiles differ from those in the previous literature: "Activity profiles were centered of the 12 h darkness in all figures with evening activity on the left and morning activity on the right, which is different from general circadian literatures." (Fig. 2H legend; Line 957-959)

      Point 4-The authors conclude that the loss of PDF and CNMa have opposite effects on the morning peak of locomotor activity (line 392). But they also acknowledge, briefly, that things are not that simple: loss of CNMa causes a phase advance, but loss of PDF causes a loss or reduction in the anticipatory peak. It is still significant to find a peptide transmitter within the clock neuron network that regulates morning activity, but the authors should revise their conclusion regarding the opposing actions of PDF and CNMa, which is not well supported by the data.

      We have revised the relevant parts.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Point 5-The authors should acknowledge, cite, and incorporate the substantive discussion of CNMa peptide and the DN1p neuronal class in Reinhard et al. 2022 (Front Physiol. 13: 886432).

      We have revised the text accordingly and cited this paper: “Type I with two neurons whose branches projecting to the anterior region, as in CNMa∩GMR51H05, CNMa∩Pdfr, and CNMa∩GMR79A11 (Fig. 6E, 5G, 6H), and type II with four neurons branching on the posterior side with few projections to the anterior region, as in CNMa∩GMR91F02 (Fig. 6F). These two types of DN1ps’ subsets were also reported and profound discussed previously (Lamaze et al., 2018; Reinhard et al., 2022)”. (Line 393-397)

      Reviewer #3 (Recommendations For The Authors):

      Point 1-Throughout the manuscript figure legends (axis, genotypes, etc) are too small to be appreciated. Fig. 1. Panel A. The labels are very difficult to read.

      We have attempted to enlarge the font as much as possible in the revised version.

      Point 2-Fig. 1. H-J Why is efficiency not mentioned in all the examples?

      The results of Fig. 1H-1J are now discussed in the revised manuscript (Line 145-147). We did not calculate an exact efficiency because GFP intensity is not stable enough: it can change with dissection, mounting, or laser intensity during our experimental process. Therefore, for all results related to GFP signal (Fig. 1B-1J, Fig. S1, Fig. S2, Fig. 2B-2D), we relied on qualitative rather than quantitative judgment, unless the GFP signal was easily quantifiable (such as in cases with limited cells or no GFP signal in the experimental group).

      Point 3-Fig. 1. Panel L, left (light phase): the statistical comparisons are not clearly indicated (the same happens in Figs 3Q and 3R).

      We have now re-arranged Fig. 1L and Fig. 3Q-3R to make the statistical comparisons clear in the new version.

      Point 4-Line 792. Could induced be introduced?

      Yes, we have now corrected this typo.

      Point 5-Fig. S1. Check labels for consistency. GMR57C10 Gal4 driver is most likely R57C10.

      We have now revised the labels (Fig. S1).

      Point 6-Fig. S2. If the experiments were repeated and several brains were observed, the authors should include the efficiency and the number of flies as reported in Fig. S1.

      We have now added the number of flies in Fig. S2, as reported in Fig. S1. As mentioned in our response to Point 2, the instability of the GFP signal prevents us from providing a quantitative efficiency in this context.

      Point 7-Fig S4. The fig legend describes panels I-J which are not shown in the current version of the manuscript.

      We have now deleted them.

      Point 8-Fig 2I. Surprising values for morning anticipation indexes even for controls (0.5 would indicate "no anticipation"; in controls, the expected values would be >>0.5, as most of the activity is concentrated right before the transition). Could the authors explain this unexpected result?

      We have revised the description of the calculation in the methods section (Line 612). After calculating the ratio of the activity in the last three hours to the total activity in the six hours, we subtracted 0.5 from the result. Therefore, the index should be ≤0.5. An index of 0 indicates no morning anticipation.
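
      The calculation described above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the function name, the hourly binning, the assumption that the six hours are those immediately before lights-on, and the handling of a fly with no recorded activity are all assumptions made for the example.

```python
import math


def morning_anticipation_index(hourly_activity):
    """Index = (activity in last 3 h) / (activity in all 6 h) - 0.5.

    hourly_activity: six activity totals, one per hour, ordered in time and
    assumed here to cover the 6 h before lights-on.
    """
    if len(hourly_activity) != 6:
        raise ValueError("expected exactly six hourly activity values")
    total = sum(hourly_activity)
    if total == 0:
        # Assumption: the index is undefined for a fly with no activity.
        return math.nan
    last_three_hours = sum(hourly_activity[3:])
    return last_three_hours / total - 0.5


# Evenly spread activity: no anticipation.
print(morning_anticipation_index([1, 1, 1, 1, 1, 1]))  # prints 0.0
# All activity in the final three hours: maximal anticipation.
print(morning_anticipation_index([0, 0, 0, 2, 2, 2]))  # prints 0.5
```

      Under this definition the index lies between -0.5 and 0.5, so a control value near 0.5 corresponds to activity concentrated right before the transition.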

      Point 9-Fig 2K/L. The authors mention that not all genes are effectively knocked out with their strategy. Could this be accounted for by the specific KD strategy, its duration, or the promoter strength? It is surprising no explanation is provided in the text (page 9 line 179).

      We aimed to establish a broadly effective method for gene editing, and Fig. 2H-2L and Fig. 2D revealed that previous attempts fell short of this objective. The observed inefficiency may be attributable to promoter strength, resulting in inadequate expression. Alternatively, an insufficient duration of the manipulation may contribute. However, in sleep and rhythm research, the age at which flies are tested is typically fixed, which limits the potential to enhance efficiency by extending the manipulation time. Moreover, increasing the expression level may cause cytotoxicity, as reported in previous studies (Port et al., 2014). We refrain from offering specific explanations because we lack a definitive plan and cannot provide additional robust evidence to support the above speculations. Consequently, in our ongoing efforts, we aim to enhance the efficiency of the tool system while operating within the current constraints.

      Point 10-Page 9, line 179. Can the authors include a brief description of the reason for the different modifications? Only one was referenced.

      We have revised the related part of the manuscript (Line 223-231):

      Cas9.M9: We fused a chromatin-modulating peptide (Ding et al., 2019), HMGN1 (High mobility group nucleosome binding domain 1), at the N-terminus of Cas9 and HMGB1 (High mobility group protein B1) at its C-terminus with a GGSGP linker, termed Cas9.M9.

      Cas9.M6: We also obtained a modified Cas9.M6 with HMGN1 at the N-terminus and an undefined peptide (UDP) at the C-terminus. (Note: UDP was obtained by accident.)

      Cas9.M0: We replaced the STARD linker between Cas9 and NLS in Cas9.HC with the GGSGP linker (Zhao et al., 2016), termed Cas9.M0.

      Point 11-The authors tested the impact of KO nAChR2 across the different versions of conditional disruption (Fig 1K-L, Fig 2L, Fig 3R). It is surprising they observe a difference in daytime sleep upon knocking down with Cas9.HC (2L) but not with Cas9.M9 (3R) and the reverse is seen for night-time sleep. Could the authors provide an explanation? Efficiency is not the issue at stake, is it?

      In Fig. 2K, the day sleep of flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; UAS-Cas9/+) was significantly decreased compared to flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; +/+), but not when compared to flies (R57C10-GAL4/+; UAS-Cas9/+). Our criterion for asserting a difference is that the experimental group must show a significant distinction from both control groups. Therefore, we concluded that there was no significant difference between the experimental group and the control groups in Fig. 2K.
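The decision rule described above (a phenotype is asserted only when the experimental group differs significantly from both control groups) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the argument names, and the significance threshold of 0.05 are assumptions, and the p-values in the usage lines are made up.

```python
def differs_from_both_controls(p_vs_no_cas9_control, p_vs_no_sgrna_control, alpha=0.05):
    """Return True only if the experimental group is significantly different
    from BOTH parental controls.

    Hypothetical helper: the p-values are assumed to come from whatever
    pairwise test the analysis uses; alpha=0.05 is an assumed threshold.
    """
    return p_vs_no_cas9_control < alpha and p_vs_no_sgrna_control < alpha


# Illustrative p-values only (not data from the study).
only_one_control = differs_from_both_controls(0.01, 0.20)  # significant vs. one control only
both_controls = differs_from_both_controls(0.01, 0.03)     # significant vs. both controls
print(only_one_control, both_controls)
```

Under this rule, a result like the one described for Fig. 2K (significant against one control but not the other) is reported as no difference.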

      Point 12-Fig. 4. Which of the two strategies described in A-B was employed to assemble the expression profile of CCT genes in clock neurons shown in C? This information should be part of the fig legend.

      We have now revised the legend as follows: “(A-B) Schematic of intersection strategies used in Clk856 labelled clock neurons dissection, Flp-out strategy (A) and split-LexA strategy (B). The exact strategy used for each gene is annotated in Table S5.”

      Point 13-Similarly, how many brains were analyzed to give rise to the table shown in C?

      We have now revised the legend of Table S4 to address this concern, as follows: “The largest N# for each gene in Table S4 is the brain number analyzed for each gene”.

      Point 14-Finally, the sentence “The figure is...” requires revision.

      We have now revised it: “The exact cell number for each subset is annotated in Table S4”.

      Point 15-Legend to Table S3. The authors have done an incredible job testing many gRNAs for each gene potentially relevant for communication. However, there is very little information to make the most out of it; for instance, the legend does not inform why many of the targeted genes do not appear to have been tested any further. It would be useful to the reader to discern whether despite being the 3 most efficient gRNAs, they were still not effective in targeting the gene of interest, or whether they showed off-targets, or it was simply a matter of testing the educated guesses. This information would be invaluable for the reader.

      First, we designed and generated transgenic UAS-sgRNA fly lines for all these sgRNAs. We randomly selected 14 receptor genes, known for their difficulty in editing based on our experience, to assess the efficiency of our strategy, as depicted in Fig. 3M-3P, Fig. S5, and Fig. S6. We believe these results are representative and indicative of the efficiency of sgRNAs designed using our process and applied with the modified Cas9.

      Secondly, we acknowledge your valid concern. While we selected sgRNAs with no predicted off-target effects through various prediction models (outlined in the Methods under C-cCCTomics sgRNA design), we did not conduct whole-genome sequencing. Consequently, we can only assert that the off-target possibility is relatively low. To address potential misleading effects arising from off-target concerns, it is essential to validate these results through mutants, RNAi, or alternative UAS-sgRNAs targeting the same gene.

      Point 16-Table S4. Some of the data presented derives from observations made in 1-2 brains for a specific cluster; isn't it too little to base a decision on whether a certain gene is (or not) expressed? It is surprising since the same CCT line was observed/analysed in more brains for other clusters. Can the authors explain the rationale?

      The N# number represents the GFP positive number, and we have revised the legend of Table S4. The largest N# number denotes the total number of brains analyzed for a specific CCT line. It's possible that, due to variations in our dissection or mounting process, some clusters were only observed in 1-2 brains out of the total brains analyzed. To enhance the accuracy of intersection analysis results, we marked all positive signal records when positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).

      Point 17-The paragraph describing this data in the results section needs revising (lines 233-243).

      We have now revised this (Lines 286-317).

      Point 18-While it is customary for authors to attempt to improve the description of the activity patterns by introducing new parameters (i.e. MAPI and EAPI, lines 253-258), it would be interesting to understand the difference between the proposed method and the one already in use (which compares the same parameter, i.e., the slope, defined as “the slope of the best-fitting linear regression line over a period of 6 h prior to the transition”; e.g., Lamaze et al. 2020 and many others). Is there a need to introduce yet another one?

      This approach is necessary. The slope defined by Lamaze et al. utilizes data from only 2 time points, which may not accurately capture the pattern within a period before light on or off. Linear regression is not well-suited for a single fly due to the high variability in activity at each time point, making it challenging to fit the model at the individual level. The parameters we have introduced (MAPI and EAPI) in this paper are concise and can be applied at the individual level, effectively reflecting the morning or evening anticipation characteristics of each fly.

      As an alternative, the activity plot of a certain fly line could be represented by an average of all flies' activity in one experiment. This would make linear regression easier to fit. However, several independent experiments are required for statistical robustness, necessitating the inclusion of hundreds of flies for each strain in a single analysis.
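The regression-based slope quoted in Point 18 ("the slope of the best-fitting linear regression line over a period of 6 h prior to the transition") can be sketched as below, for a single fly and for the population average mentioned above. This is only an illustration under stated assumptions: the function names, the 30-minute bin size, and the synthetic Poisson activity counts are hypothetical and are not taken from the manuscript. It does not implement MAPI or EAPI, whose definitions are given in the manuscript rather than in this response.

```python
import numpy as np


def anticipation_slope(activity, bin_minutes, transition_index, window_hours=6.0):
    """Least-squares slope (activity units per hour) over the bins in the
    `window_hours` immediately before `transition_index` (hypothetical helper)."""
    n_bins = int(round(window_hours * 60 / bin_minutes))
    start = transition_index - n_bins
    if start < 0:
        raise ValueError("not enough data before the transition")
    window = np.asarray(activity[start:transition_index], dtype=float)
    hours = np.arange(n_bins) * bin_minutes / 60.0
    slope, _intercept = np.polyfit(hours, window, 1)  # degree-1 fit: slope first
    return slope


def population_slope(activity_matrix, bin_minutes, transition_index, window_hours=6.0):
    """Same slope, fitted to the mean trace across flies (rows = flies)."""
    mean_trace = np.asarray(activity_matrix, dtype=float).mean(axis=0)
    return anticipation_slope(mean_trace, bin_minutes, transition_index, window_hours)


# Synthetic example: 32 flies, 24 half-hour bins, activity ramping up before
# the transition at bin 24. Counts are simulated, not experimental data.
rng = np.random.default_rng(0)
ramp = 1.0 + 2.0 * np.arange(24) * 0.5          # expected counts per bin
flies = rng.poisson(lam=ramp, size=(32, 24))    # noisy individual traces

single_fly = anticipation_slope(flies[0], bin_minutes=30, transition_index=24)
averaged = population_slope(flies, bin_minutes=30, transition_index=24)
print(single_fly, averaged)
```

Individual traces are noisy, so single-fly fits tend to scatter around the underlying ramp, whereas the fit to the averaged trace is typically closer to it. That is the trade-off the response describes between individual-level parameters and population-level regression.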

      Point 19-In general, the legends of supplementary figures are a bit too brief. S7 and S8: it is not clear which of the two intersectional strategies were used (it would benefit whoever is interested in replicating the experiments). Legend to Fig S8 should read “similar to Fig S7”.

      We have now revised the legend and included “The exact strategy used for each gene is annotated in Table S5” in the legend.

      Point 20-The legend in Table S6 should clearly state the genotypes examined. What does the marking in bold refer to?

      We have now revised the annotation of Table S6. Values marked in bold are results that fall outside one SD of the control group.

      Point 21-Line 314. The sentence needs revision.

      We have revised these sentences.

      Point 22-Line 391 (and also in the results section). The authors attempt to describe the CNMa phenotype as the opposite of pdf/pdfr mutant phenotypes. However, no morning anticipation/advanced morning anticipation are not necessarily opposite phenotypes.

      We have revised the related descriptions:

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Reference

      Deng, B., Li, Q., Liu, X., Cao, Y., Li, B., Qian, Y., Xu, R., Mao, R., Zhou, E., Zhang, W., et al. (2019). Chemoconnectomics: mapping chemical transmission in Drosophila. Neuron 101, 876-893.e874.

      Ding, X., Seebeck, T., Feng, Y., Jiang, Y., Davis, G.D., and Chen, F. (2019). Improving CRISPR-Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J 2, 51-63.

      Duhart, J.M., Herrero, A., de la Cruz, G., Ispizua, J.I., Pírez, N., and Ceriani, M.F. (2020). Circadian Structural Plasticity Drives Remodeling of E Cell Output. Curr Biol 30, 5040-5048.e5045.

      Erion, R., King, A.N., Wu, G., Hogenesch, J.B., and Sehgal, A. (2016). Neural clocks and Neuropeptide F/Y regulate circadian gene expression in a peripheral metabolic tissue. eLife 5, e13552.

      Fujiwara, Y., Hermann-Luibl, C., Katsura, M., Sekiguchi, M., Ida, T., Helfrich-Förster, C., and Yoshii, T. (2018). The CCHamide1 neuropeptide expressed in the anterior dorsal neuron 1 conveys a circadian signal to the ventral lateral neurons in Drosophila melanogaster. Front Physiol 9, 1276.

      Goda, T., Tang, X., Umezaki, Y., Chu, M.L., Kunst, M., Nitabach, M.N.N., and Hamada, F.N. (2016). Drosophila DH31 neuropeptide and PDF receptor regulate night-onset temperature preference. J Neurosci 36, 11739-11754.

      Goda, T., Umezaki, Y., Alwattari, F., Seo, H.W., and Hamada, F.N. (2019). Neuropeptides PDF and DH31 hierarchically regulate free-running rhythmicity in Drosophila circadian locomotor activity. Sci Rep 9, 838.

      Guo, F., Cerullo, I., Chen, X., and Rosbash, M. (2014). PDF neuron firing phase-shifts key circadian activity neurons in Drosophila. eLife 3.

      He, C., Cong, X., Zhang, R., Wu, D., An, C., and Zhao, Z. (2013). Regulation of circadian locomotor rhythm by neuropeptide Y-like system in Drosophila melanogaster. Insect Mol Biol 22, 376-388.

      Hermann, C., Yoshii, T., Dusik, V., and Helfrich-Förster, C. (2012). Neuropeptide F immunoreactive clock neurons modify evening locomotor activity and free-running period in Drosophila melanogaster. J Comp Neurol 520, 970-987.

      Hyun, S., Lee, Y., Hong, S.T., Bang, S., Paik, D., Kang, J., Shin, J., Lee, J., Jeon, K., Hwang, S., et al. (2005). Drosophila GPCR Han is a receptor for the circadian clock neuropeptide PDF. Neuron 48, 267-278.

      Johard, H.A., Yoshii, T., Dircksen, H., Cusumano, P., Rouyer, F., Helfrich-Förster, C., and Nässel, D.R. (2009). Peptidergic clock neurons in Drosophila: ion transport peptide and short neuropeptide F in subsets of dorsal and ventral lateral neurons. J Comp Neurol 516, 59-73.

      Lamaze, A., Krätschmer, P., Chen, K.F., Lowe, S., and Jepson, J.E.C. (2018). A Wake-Promoting Circadian Output Circuit in Drosophila. Curr Biol 28, 3098-3105.e3093.

      Lear, B.C., Zhang, L., and Allada, R. (2009). The neuropeptide PDF acts directly on evening pacemaker neurons to regulate multiple features of circadian behavior. PLoS Biol 7, e1000154.

      Lee, G., Bahn, J.H., and Park, J.H. (2006). Sex- and clock-controlled expression of the neuropeptide F gene in Drosophila. Proc Natl Acad Sci USA 103, 12580-12585.

      Lelito, K.R., and Shafer, O.T. (2012). Reciprocal cholinergic and GABAergic modulation of the small ventrolateral pacemaker neurons of Drosophila's circadian clock neuron network. J Neurophysiol 107, 2096-2108.

      Ma, D., Przybylski, D., Abruzzi, K.C., Schlichting, M., Li, Q., Long, X., and Rosbash, M. (2021). A transcriptomic taxonomy of Drosophila circadian neurons around the clock. eLife 10.

      Port, F., Chen, H.M., Lee, T., and Bullock, S.L. (2014). Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc Natl Acad Sci USA 111, E2967-2976.

      Reinhard, N., Schubert, F.K., Bertolini, E., Hagedorn, N., Manoli, G., Sekiguchi, M., Yoshii, T., Rieger, D., and Helfrich-Förster, C. (2022). The Neuronal Circuit of the Dorsal Circadian Clock Neurons in Drosophila melanogaster. Front Physiol 13, 886432.

      Renn, S.C., Park, J.H., Rosbash, M., Hall, J.C., and Taghert, P.H. (1999). A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell 99, 791-802.

      Shafer, O.T., Helfrich-Förster, C., Renn, S.C., and Taghert, P.H. (2006). Reevaluation of Drosophila melanogaster's neuronal circadian pacemakers reveals new neuronal classes. J Comp Neurol 498, 180-193.

      Shafer, O.T., Kim, D.J., Dunbar-Yaffe, R., Nikolaev, V.O., Lohse, M.J., and Taghert, P.H. (2008). Widespread receptivity to neuropeptide PDF throughout the neuronal circadian clock network of Drosophila revealed by real-time cyclic AMP imaging. Neuron 58, 223-237.

      Zhang, L., Chung, B.Y., Lear, B.C., Kilman, V.L., Liu, Y., Mahesh, G., Meissner, R.A., Hardin, P.E., and Allada, R. (2010). DN1(p) circadian neurons coordinate acute light and PDF inputs to produce robust daily behavior in Drosophila. Curr Biol 20, 591-599.

      Zhao, P., Zhang, Z., Lv, X., Zhao, X., Suehiro, Y., Jiang, Y., Wang, X., Mitani, S., Gong, H., and Xue, D. (2016). One-step homozygosity in precise gene editing by an improved CRISPR/Cas9 system. Cell Res 26, 633-636.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study provides an unprecedented understanding of the roles of different combinations of NaV channel isoforms in nociceptors' excitability, with relevance for the design of better strategies targeting NaV channels to treat pain. Although the experimental combination of electrophysiological, modeling, imaging, molecular biology, and behavioral data is convincing and supports the major claims of the work, some conclusions need to be strengthened by further evidence or discussion. The work may be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Xie, Prescott, and colleagues have reevaluated the role of Nav1.7 in nociceptive sensory neuron excitability. They find that nociceptors can make use of different sodium channel subtypes to reach equivalent excitability. The existence of this degeneracy is critical to understanding neuronal physiology under normal and pathological conditions and could explain why Nav subtype-selective drugs have failed in clinical trials. More concretely, nociceptor repetitive spiking relies on Nav1.8 at DIV0 (and probably under normal conditions in vivo), but on Nav1.7 and Nav1.3 at DIV4-7 (and after inflammation in vivo).

      The conclusions of this paper are mostly well supported by data, and these findings should be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Strengths:

      (1.1) The authors have employed elegant electrophysiology experiments (including specific pharmacology and dynamic clamp) and computational simulations to study the excitability of a subpopulation of DRGs that would very likely match with nociceptors (they take advantage of using transgenic mice to detect Nav1.8-expressing neurons). They make a strong point showing the degeneracy that occurs at the ion channel expression level in nociceptors, adding this new data to previous observations in other neuronal types. They also demonstrate that the different Nav subtypes functionally overlap and are able to interchange their "typical" roles in action potential generation. As Xie, Prescott, and colleagues argue, the functional implications of the degenerate character of nociceptive sensory neuron excitability need to be seriously taken into account regarding drug development and clinical trials with Nav subtype-selective inhibitors.

      Weaknesses:

      (1.2) The next comments are minor criticisms, as the major conclusions of the paper are well substantiated. Most of the results presented in the article have been obtained from experiments with DRG neuron cultures, and surely there is a greater degree of complexity and heterogeneity about the degeneracy of nociceptor excitability in the "in vivo" condition. Indeed, the authors show in Figures 7 and 8 data that support their hypothesis and an increased influence of Nav1.7 on nociceptor excitability after inflammation, but also a higher variability in the nociceptors' spiking responses. On the other hand, DRG neurons targeted in this study (YFP (+) after crossing with Nav1.8-Cre mice) are >90% nociceptors, but not all nociceptors express Nav1.8 in vivo. As shown by Li et al., 2016 ("Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity"), there is a high heterogeneity of neuron subtypes within sensory neurons. Therefore, some caution should be taken when translating the results obtained with the DRG neuron cultures to the more complex "in vivo" panorama.

      We agree that most but not all Nav1.8+ DRG cells are nociceptors and that not all nociceptors express Nav1.8. We targeted small neurons that also express (or at some point expressed) Nav1.8, thus excluding larger neurons that express Nav1.8. This allowed us to home in on a relatively homogeneous set of neurons, which is crucial when testing different neurons to compare between conditions (as opposed to testing longitudinally in the same neuron, which is not feasible). We expect all neurons are degenerate but likely on the basis of different ion channel combinations. Indeed, even within small Nav1.8+ neurons, other channels that we did not consider likely contribute to the degenerate regulation (as now better reflected in the revised Discussion).

      That said, there are multiple sources of heterogeneity. We suspect that heterogeneity increases more after inflammation than after axotomy because all DRG neurons experience axotomy when cultured whereas neurons experience inflammation differently in vivo depending on whether their axon innervates the inflamed area (now explained on lines 214-215). This is not so much about whether the insult occurs in vivo or in vitro, but about how homogeneously neurons are affected by the insult. Granted, neurons are indeed more likely to be heterogeneously affected in vivo since conditions are more complex. But our goal in testing PF-71 in behavioral tests (Fig. 8) was to show that changes observed in nociceptor excitability in Figure 7, despite heterogeneity, were predictive of changes in drug efficacy. In short, we establish Nav interchangeability by comparing neurons in culture (Figs 1-6), but we then show that similar Nav shifts can develop in vivo (Fig 7) with implications for drug efficacy (Fig 8). Such results should alert readers to the importance of degeneracy for drug efficacy (which is our main goal) even without a complete picture of nociceptor degeneracy or DRG neuron heterogeneity. Additions to the Discussion (lines 248-259, 304-308) are intended to highlight these considerations.

      (1.3) Although the authors have focused their attention on Nav channels, it should be noted that degeneracy concerning other ion channels (such as potassium ion channels) could also impact the nociceptor excitability. The action potential AHP in Figure 1, panel A is very different comparing the DIV0 (blue) and DIV4-7 examples. Indeed, the conductance density values for the AHP current are higher at DIV0 than at DIV7 in the computational model (supplementary table 5). The role of other ion channels in order to obtain equivalent excitability should not be underestimated.

      We completely agree. We focused on Nav channels because of our initial observation with TTX and because of industry’s efforts to develop Nav subtype-selective inhibitors, whose likelihood of success is affected by the changes we report. But other channels are presumably changing, especially given observed changes in the AHP shape (now mentioned on lines 304-308). Investigation should be expanded to include these other channels in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The authors have noted in preliminary work that tetrodotoxin (TTX), which inhibits NaV1.7 and several other TTX-sensitive sodium channels, has differential effects on nociceptors, dramatically reducing their excitability under certain conditions but not under others. Partly because of this coincidental observation, the aim of the present work was to re-examine or characterize the role of NaV1.7 in nociceptor excitability and its effects on drug efficacy. The manuscript demonstrates that a NaV1.7-selective inhibitor produces analgesia only when nociceptor excitability is based on NaV1.7. More generally and comprehensively, the results show that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes (NaV 1.3/1.7 and 1.8). This can cause widespread changes in the role of a particular subtype over time. The degenerate nature of nociceptor excitability shows functional implications that make the assignment of pathological changes to a particular NaV subtype difficult or even impossible.

      Thus, the analgesic efficacy of NaV1.7- or NaV1.8-selective agents depends essentially on which NaV subtype controls excitability at a given time point. These results explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      Strengths:

      (2.1) The above results are clearly and impressively supported by the experiments and data shown. All methods are described in detail, presumably allow good reproducibility, and were suitable to address the corresponding question. The only exception is the description of the computer model, which should be described in more detail.

      We failed to report basic information such as the software, integration method and time step in the original text. This information is now provided on lines 476-477. Notably, the full code is available on ModelDB plus all equations including the values for all gating parameters are provided in Supplementary Table 5 and values for maximal conductance densities for DIV0 and DIV7 models are provided in Supplementary Table 6. Changes in conductance densities to simulate different pharmacological conditions are reported in the relevant figure legends (now shown in red). We did not include model details in the main text to avoid disrupting the flow of the presentation, but all the model details are reported in the Methods, tables and/or figure legends.

      (2.2) The results showing that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and expression of different NaV subtypes are of great importance in the fields of basic and clinical pain research and sodium channel physiology and pharmacology, but also for a broad readership and community. The degenerate nature of nociceptor excitability, which is clearly shown and well supported by data has large functional implications. The results are of great importance because they may explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      In summary, the authors achieved their overall aim to enlighten the role of NaV1.7 in nociceptor excitability and the effects on drug efficacy. The data support the conclusions, although the clinical implications could be highlighted in a more detailed manner.

      Weaknesses:

      As mentioned before, the results that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes are impressive. However, there is some "gap" between the DRG culture experiments and acutely dissociated DRGs from mice after CFA injection. In the extensive experiments with cultured DRG neurons, different time points after dissociation were compared. Although it would have been difficult for functional testing to examine additional time points (besides DIV0 and DIV4-7), at least mRNA and protein levels should have been determined at additional time points (DIV) to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly.

      Characterizing the time course of NaV expression changes is worthwhile but, insofar as such details are not necessary to establish that excitability is degenerate, it was not included in the current study. Furthermore, since mRNA levels do not parallel the functional changes in Nav1.7 (Figure 6A), we do not think it would be helpful to measure mRNA levels at intermediate time points. Measuring protein levels would be more informative; however, as now explained on lines 362-369, neurons were recorded at intermediate time points in initial experiments and showed a lot of variability. Methods that could track fluorescently-tagged NaV channels longitudinally (i.e. at different time points in the same cell) would be well suited for this sort of characterization, but will invariably lead to more questions about membrane trafficking, phosphorylation, etc. We agree that a thorough characterization would be interesting but we think it is best left for a future study.

      It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. This would better link the following data demonstrating that in acutely dissociated nociceptors after CFA injection, the inflammation-induced increase in NaV1.7 membrane expression enhances the effect of (or more neurons respond to) the NaV1.7 inhibitor PF-71, whereas fewer CFA neurons respond to the NaV1.8 inhibitor PF-24.

      These are some of the many good questions that emerge from our results. We are not particularly keen to investigate what happens over several days in culture, since this is not so clinically relevant, but it would be interesting to compare changes induced by nerve injury in vivo (which usually involves neuroinflammatory changes) and changes induced by inflammation. Many previous studies have touched on such issues but we are cautious about interpreting transcriptional changes, and of course all of these changes need to be considered in the context of cellular heterogeneity. It would be interesting to decipher if changes in NaV1.7 and NaV1.8 are directly linked so that an increase in one triggers a decrease in the other, and vice versa. But of course many other channels are also likely to change (as discussed above), and they too warrant attention, which makes the problem quite difficult. We look forward to tackling this in future work.

      The results shown explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have important implications for the future development of Nav-selective analgesics. However, this point, which is also evident from the title of the manuscript, is discussed only superficially with respect to clinical outcomes. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how?

      We wish to avoid speculating on which particular clinical results are better explained because our study was not designed for that. Instead, our take-home message (which is well supported; see Discussion on lines 309-321) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. At the end of the results (line 235), which is, we think, what prompted the reviewer’s comment, we point to the Discussion. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is difficult. We certainly don’t have all the answers, but we hope our results will point readers in a new direction to help answer such questions.

      Another point directly related to the previous one, which should at least be discussed, is that all the data are from rodents, or in this case from mice, and this should explain the clinical data in humans. Even if "impediment to translation" is briefly mentioned in a slightly different context, one could (as mentioned above) discuss in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans.

      We are not aware of human data that speak directly to nociceptor degeneracy but degeneracy has been observed in diverse species; if anything, human neurons are probably even more degenerate based on progressive expansion of ion channel types, splice variants, etc. over evolution. Of course species differences extend beyond degeneracy and are always a concern for translation, because of a species difference in the drug target itself or because preclinical pain testing fails to capture the most clinically important aspects of pain (which we mention on line 35). Line 39 now reiterates that these explanations for translational difficulties are not mutually exclusive, but that degeneracy deserves greater consideration than it has hitherto received. Indeed, throughout our paper we imply that degeneracy may contribute to the clinical failure of Nav subtype-specific drugs, but those failures are certainly not evidence of degeneracy. In the Discussion (lines 320-321), we now cite a recent review article on degeneracy in the context of epilepsy, and point out how parallels might help inform pain research. We wish we had a more direct answer to the reviewer’s request; in the absence of this, we hope our results motivate readers to seek out these answers in future research.

      Although speculative, it would be interesting for readers to know whether a treatment regimen based on "time since injury" with NaV1.7 and NaV1.8 inhibitors might offer benefits. Based on the data, could one hypothesize that NaV1.7 inhibitors are more likely to benefit (albeit in the short term) in patients with neuropathic pain with better patient selection (e.g., defined interval between injury and treatment)?

      We like that our data prompt this sort of prediction. However, this is potentially complicated since the injury may be subtle, which is to say that the exact timing may not be known. There are scenarios (e.g. postoperative pain) where the timing of the insult is known, but in other cases (e.g. diabetic neuropathy) the disease process is quite insidious, and different neurons might have progressed through different stages depending on how they were exposed to the insult. Our own experiments with CFA are a case in point. Notwithstanding the potential difficulties about gauging the time course, any way of predicting which Nav subtype is dominant could help more strategically choose which drug to use.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors used patch-clamp to characterize the implication of various voltage-gated Na+ channels in the firing properties of mouse nociceptive sensory neurons. They report that depending on the culture conditions NaV1.3, NaV1.7, and NaV1.8 have distinct contributions to action potential firing and that similar firing patterns can result from distinct relative roles of these channels. The findings may be relevant for the design of better strategies targeting NaV channels to treat pain.

      Strengths:

      The paper addresses the important issue of understanding, from an interesting perspective, the lack of success of therapeutic strategies targeting NaV channels in the context of pain. Specifically, the authors test the hypothesis that different NaV channels contribute in a plastic manner to action potential firing, which may be the reason why it is difficult to target pain by inhibiting these channels. The experiments seem to have been properly performed and most conclusions are justified. The paper is concisely written and easy to follow.

      Weaknesses:

      (1) The most critical issue I find in the manuscript is the claim that different combinations of NaV channels result in equivalent excitability. For example, in the Abstract it is stated that: "...we show that nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8". The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same. I think that the culprit of this issue is that the authors reach their conclusion from the comparison of the (average) firing rate determined over 1 s current stimulation in distinct conditions. However, this is not the only parameter that determines how sensory neurons convey information. For instance, the time dependence of the instantaneous frequency, the actual firing pattern, may be important too. Moreover, the use of 1 s of current stimulation might not be sufficient to characterize the firing pattern if one wants to obtain conclusions that could translate to clinical settings (i.e., sustained pain). A neuron in which NaV1.7 is the main contributor is expected to have a damping firing pattern due to cumulative channel inactivation, whereas another depending mainly on NaV1.8 is expected to display more sustained firing. This is actually seen in the results of the modelling.

      This concern seems to boil down to how equivalent is equivalent? The spike shape or the full input-output curve for a DIV0 neuron (Nav1.8-dominant) is never equivalent to what’s seen in a DIV4-7 neuron (Nav1.7-dominant), but nor are any two DIV0 neurons strictly equivalent, and likewise for any two DIV4-7 neurons. Our point is that DIV0 and DIV4-7 neurons are far more similar (less discriminable) in their excitability than expected from the qualitative difference in their TTX sensitivity (and from repeated claims in the literature that Nav1.7 is necessary for spike generation in nociceptors). Nav isoforms need not be identical to operate similarly; for instance, Nav1.8 tends to activate at “suprathreshold” voltages, but this depends on the value of threshold; if threshold increases, Nav1.8 can activate at subthreshold voltages (see Fig 5). We have modified lines 155-175 to help clarify this.

      We completely agree that firing rate is not the only way to convey sensory information, and of course injecting current directly into the cell body via a patch pipette is not a natural stimulus. These are all factors to keep in mind when interpreting our data. Nonetheless, our data show that excitability is similar between DIV0 and DIV 4-7, so much so that data from any one neuron (without pharmacological tests or capacitance measurements) would likely not reveal if that cell is DIV0 or DIV4-7; this “indiscriminability” qualifies as “equivalent” for our purposes, and is consistent with phrasing used by other authors studying degeneracy. Notably, not every DIV4-7 neuron exhibits spike height attenuation (see Fig. 1A), likely because of concomitant changes in the AHP that were not captured in our computer model or directly tested in our experiments. This highlights that other channel changes may also contribute to degeneracy and the maintenance of repetitive spiking.

      (2) In Fig. 1, is 100 nM TTX sufficient to inhibit all TTX-sensitive NaV currents? Values more commonly used in the literature to fully inhibit these currents are between 300 and 500 nM. The currents shown as TTX-sensitive in Fig. 1D look very strange (not like the ones at Baseline DIV4-7). It seems that 100 nM TTX was not enough, leading to an underestimation of the amplitude of the TTX-sensitive currents.

      As now summarized in Supplementary Table 3 (which is newly added), 100 nM TTX is >20x the EC50 for Nav1.3 and Nav1.7 (but is still far below the EC50 for Nav1.8). Based on this, TTX-sensitive channels are definitely blocked in our TTX experiments.
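
      A rough way to see what ">20x the EC50" implies is to assume simple one-to-one binding, i.e. a Hill equation with a Hill coefficient of 1. That coefficient is an assumption made here for illustration, not a value taken from the manuscript, and the function name below is hypothetical. Under that assumption, the fraction of channels blocked depends only on the ratio of drug concentration to EC50:

```python
def fraction_blocked(conc_over_ec50: float, hill: float = 1.0) -> float:
    """Fraction of channels blocked when concentration is a multiple of EC50.

    Assumes a Hill-type concentration-response curve; hill=1.0 (one-to-one
    binding) is an illustrative assumption, not a measured value.
    """
    if conc_over_ec50 < 0:
        raise ValueError("concentration ratio must be non-negative")
    x = conc_over_ec50 ** hill
    return x / (1.0 + x)


if __name__ == "__main__":
    # At the EC50 itself, half of the channels are blocked by definition.
    print(fraction_blocked(1.0))
    # At 20x the EC50 the blocked fraction is 20/21, a little over 95%.
    print(round(fraction_blocked(20.0), 3))
```

      Under this simplified picture, a concentration more than 20x the EC50 leaves less than about 5% of the channels unblocked, whereas a concentration far below the EC50 blocks only a small fraction.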

      (3) Page 8, the authors conclude that "Inflammation caused nociceptors to become much more variable in their reliance of specific NaV subtypes". However, how did the authors ensure that all neurons tested were affected by the CFA model? It could be that the heterogeneity in neuron properties results from distinct levels of effects of CFA.

      We agree with the reviewer. We also believe that variable exposure to CFA is the most likely explanation for the heightened variability in TTX-sensitivity reported in Figure 7 (now more clearly explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the CFA and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But, our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing), and that demonstration is true even if some neurons do not switch. Subsequent testing in Figure 8 shows that enough neurons switch to have a meaningful effect in terms of the behavioral pharmacology. So, notwithstanding tangential concerns, we think our CFA experiments succeeded in showing that Nav channels can switch in vivo and that this impacts drug efficacy.

      Recommendations for the authors:

      All reviewers agreed that these results are solid and interesting. However, the reviewers also raised several concerns that should be addressed by the authors to improve the strength of the evidence presented. Revisions considered to be essential include:

      (1) Discuss how degeneracy concerning other ion channels (such as potassium ion channels) could also impact nociceptor excitability (reviewer #1). Additionally, the translation of results from DRG neuron cultures to "in vivo" nociceptors should be better discussed.

      We have added a new paragraph to the Discussion (line 248-259) to remind readers that despite our focus on Nav channels, other ion channels likely also change (and that these changes involve diverse regulatory mechanisms that require further investigation). Likewise, despite our focus on the changes caused by culturing neurons, we remind readers that subtler, more clinically relevant in vivo perturbations can likewise cause a multitude of changes. We end that paragraph by emphasizing that although accounting for all the contributing components is required to fully understand a degenerate system, meaningful progress can be made by studying a subset of the components. We want to emphasize this because there is some middle ground between focusing on one component at a time (which is the norm) vs. trying to account for everything (which is an infeasible ideal). Additional text on lines 304-308 also addresses related points.

      (2) Discuss how different combinations of NaV channels result in equivalent excitability, in the context of the experimental conditions used (see main comment by reviewer #3). It should also be discussed in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans (reviewer #2).

      Regarding the first part of this comment, reviewer 3 wrote in the public review that “The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same.” Differences in gating properties are commonly used to argue that different Nav subtypes mediate different phases of the spike, for example, that Nav1.7 initiates the spike whereas Nav1.8 mediates subsequent depolarization because Nav1.7 and Nav1.8 activate at perithreshold and suprathreshold voltages, respectively (see lines 134-135, now shown in red). But such comparison is overly simplistic insofar as it neglects the context in which ion channels operate. For instance, if Nav1.7 is not expressed or fully inactivates, voltage threshold will be less negative, enabling Nav1.8 to contribute to spike initiation; in other words, previously “suprathreshold” voltages become “perithreshold”. Figure 5 is dedicated to explaining this context-sensitivity; specifically, we demonstrate with simulations how Nav1.8 takes over responsibility for initiating a spike when Nav1.7 is absent or inactivated. Text on lines 155-184 has been edited to help clarify this. Regarding the second part of this comment, we are not aware of any direct evidence from human sensory neurons that different sodium channels produce equivalent excitability, but that is certainly what we expect. We suggest that failure of Nav subtype-specific drugs is, at least in part, because of degeneracy, but such failures do not demonstrate degeneracy unless other contributing factors can be excluded (which they can’t). Recognizing degeneracy is difficult, and so variability that might be explained by degeneracy will go unexplained or attributed to other factors unless, by design or serendipity, experiments quantify the effects of degeneracy (as we have attempted to do here). We now cite a recent review article on degeneracy and epilepsy (line 320), which addresses relevant themes that might help inform pain research; for instance, most existing antiseizure medications act on multiple targets whereas more recently developed single-target drugs have proven largely ineffective. This is similar to but better documented than for analgesics. With this in mind, we revised the text to emphasize the circumstantial nature of existing evidence and the need to test more directly for degeneracy (lines 320-323).

      (3) Extend the discussion about the poor clinical outcomes with the use of subtype-selective NaV inhibitors. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how? (reviewer #2)

      As discussed above, we are cautious to avoid speculating on which clinical results are attributable to degeneracy. Instead, our take-home message (see Discussion, lines 309-323) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is not trivial. Interpreting clinical data is also complicated by the fact that we are either dealing with genetic mutations (with unclear compensatory changes) or pharmacological results (where NaV1.7-selective drugs have a multitude of problems that might contribute to their lack of efficacy, separate from effects of degeneracy). We have striven to contextualize our results (e.g. last paragraph of results, lines 222-235). We think this is the most we can reasonably say based on the limitations of existing clinical data.

      (4) Provide a clearer and more detailed description of the computational model (reviewers #2 and #3).

      We added important details on line 476-477 but, in our honest opinion, we think our computational model is thoroughly explained. The issue seems to boil down to whether details are included in the Results vs. being left for the Methods, tables and figure legends. We prefer the latter.

      (5) Better clarify the effects of the CFA model, to provide further evidence relating inflammation with nociceptors variability (reviewers #2 and #3)

      As explained in response to a specific point by reviewer #3, we believe that variable exposure to CFA explains the heightened variability in TTX-sensitivity reported in Figure 7 (now explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the inflammation and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But, our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing); that demonstration holds true even if some neurons do not switch. Subsequent testing (Fig 8) shows that enough neurons switch to affect drug efficacy assessed behaviorally. This is emphasized with new text on lines 225-227. Overall, we think our CFA experiments succeed in showing that Nav channels can switch in vivo and, despite variability, that this occurs in enough neurons to impact drug efficacy.

      (6) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews.

      Detailed responses are provided below for all feedback and changes to the text were made whenever necessary, as identified in our responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor points/recommendations:

      Protein synthesis inhibition by cercosporamide could be the direct cause of a smaller-than-expected increase in Nav1.7 levels at DIV5. But for Nav1.8, there is a mitigation in the increased levels at DIV5, that only could be explained by several indirect mechanisms, including membrane trafficking and posttranslational modifications (phosphorylation, SUMOylation, etc.) on Nav1.8 or protein regulators of Nav1.8 channels. The authors suggest that "translational regulation is crucial", but also insinuate that other processes (membrane trafficking, etc.) could contribute to the observed outcome. It is difficult to assess the relative importance of these different explanations without knowing the exact mechanisms that are acting here.

      We agree. We relied on electrophysiology (and pharmacology) to measure functional changes, but we wanted to verify those data with another method. We expected mRNA levels to parallel the functional changes but, when that did not pan out, we proceeded to look at protein levels. Perhaps we should have stopped there, but by blocking protein translation, we show that there is not enough Nav1.7 protein already available that can be trafficked to the membrane. That does not explain why Nav1.8 levels drop. Our immunohistochemistry could not tease apart membrane expression from overall expression, which limits interpretation. We have enhanced the text to discuss this (lines 200-204), but further experiments are needed. Though admittedly incomplete, our initial findings help set the stage for future experiments on this matter.

      Page 15, typo: "contamination from genomic RNA" -> "contamination from genomic DNA" (appears twice).

      This has been corrected on lines 420 and 421.

      Page 17: I could not find the computer code at ModelDB (http://modeldb.yale.edu/267560). It seems to be an old web link. It should be available at some web repository.

      We confirmed that the link works. Entry is password-protected (password = excitability; see line 476). Password protection will be removed once the paper is officially published.

      Page 19, reference 36, typo: "Inhibitio of" -> "Inhibition of".

      This has been corrected (line 557).

      Page 33, typo: "are significantly larger than differences at DIV1" -> "are significantly larger than differences at DIV0".

      This has been corrected (line 796).

      Page 35, figure 6 legend. The number of experiments (n) is not indicated for panel C data.

      N = 3 is now reported (line 828).

      Reviewer #2 (Recommendations For The Authors):

      p. 3/4 and Data of Fig. 6: It should be commented on why days 1-3 were not investigated. An investigation of the time course (by higher frequency testing) would certainly have an added value because it would be possible to deduce whether the changes develop slowly and gradually, or whether the excitability induced by different NaVs changes suddenly. At least mRNA and protein levels should be determined at additional time points to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly. It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. Or is the latter question clear in the literature?

      We now explain (lines 362-369) that intermediate time points (DIV1-3) were tested in initial current clamp recordings. Those data showed that TTX-sensitivity stabilized by DIV4 and differed from the TTX-insensitivity observed at DIV0. TTX-sensitivity was mixed at DIV1-3 and cross-cell variability complicated interpretation. Subsequent experiments were prioritized to clarify why NaV1.7 is not always critical for nociceptor excitability, contrary to past studies. Our efforts to measure mRNA and protein levels were primarily to validate our electrophysiological findings; we are also interested in deciphering the underlying regulatory processes but this is an entire study on its own. Unfortunately, the existing literature does not help or point to an explanation for the Nav1.7/1.8 shift we observed.

      Our evidence that mRNA levels do not parallel functional changes argues against pursuing transcriptional changes in Nav1.7, though transcriptional changes in other factors might be important. Interpretation of immuno quantification would be complicated by the high variability we observed with the physiology at intermediate time points and, furthermore, we cannot resolve surface expression from overall expression based on available antibodies. Methods conducive to longitudinal measurements would be more appropriate (as now mentioned on lines 367-369). In short, a lot more work is required to understand the mechanisms involved in the switch, but we think the existing demonstration suffices to show that NaV1.7 and NaV1.8 protein levels vary, with crucial implications for which Nav subtype controls nociceptor excitability, and important implications for drug efficacy. Explaining why and how quickly those protein levels change will be no small feat and is best left for a future study.

      p. 4 and following: In order to enable the interpretation of the used concentration of PF-24, PF-71, and ICA, the respective IC50 should be indicated.

      A table (now Supplementary Table 3; line 861) has been added to report EC50 values for all drugs for blocking NaV1.7, NaV1.8 and NaV1.3. The concentrations we used are included on that table for easy comparison.

      p. 5, end of the middle paragraph: Here it should be briefly explained -for less familiar readers- why NaV1.1 cannot be causative (ICA inhibits NaV1.1 and 1.3).

      We now explain (lines 117-120) that NaV1.1 is expressed almost exclusively in medium-diameter (A-delta) neurons whereas NaV1.3 is known to be upregulated in small-diameter neurons, and so the effect we observe in small neurons is most likely via blockade of NaV1.3.

      p. 6, lines 4/5: At least once it should read computer model instead of model.

      “Computer” has been added the first time we refer to DIV0 or DIV4-7 computer models (lines 138-139)

      p. 6: the difference between Fig. 4B and Fig. 4 - Figure suppl. 1 should be mentioned briefly.

      We now explain (lines 150-154) that Fig. 4B involves replacing a native channel with a different virtual channel (to demonstrate their interchangeability) whereas Fig. 4 - Figure supplement 1 involves replacing a native channel with the equivalent virtual channel (as a positive control).

      p. 6/7: the text and the conclusions regarding Figure 5 are difficult to follow. Somewhat more detailed explanations of why which data demonstrate or prove something would be helpful.

      The text describing Figure 5 (lines 155-175) has been revised to provide more detail.

      p. 7, last sentence of the first paragraph: How is this supported by the data? Or should this sentence be better moved to the discussion?

      This sentence (now lines 182-184) is designed as a transition. The first half – “a subtype’s contribution shifts rapidly (because of channel inactivation)” – summarizes the immediately preceding data (Figure 5). The second half – “or slowly (because of [changes in conductance density])” – introduces the next section. The text shown in square brackets has been revised. We hope this will be clearer based on revisions to the associated text.

      p. 7, second paragraph, line 3: Please delete one "at both".

      Corrected

      p. 7, second paragraph: Please explain why different time points (DIV4-7, DIV5, or DIV7) were used or studied.

      Initial electrophysiological experiments determined that TTX sensitivity stabilized by DIV 4 (see response to opening point) and we did not maintain neurons longer than 7 days, and so neurons recorded between DIV4 and 7 were pooled. If non-electrophysiological tests were conducted on a specific day within that range, we report the specific day, but any day within the DIV4-7 range is expected to give comparable results. This is now explained on lines 365-367.

      p. 8: the text regarding Fig. 7 should also include the important data (e.g. percentage of neurons showing repetitive spiking) mentioned in the legend.

      This text (lines 216-220) has been revised to include the proportion of neurons converted by PF-71 and PF-24 and the associated statistical results.

      Fig. 1: third panel (TTX-sensitive current...) of D & Fig. 2 subpanel of A (Nav1.8 current...). These panels should be explained or mentioned in the text and/or legends.

      We now explain in the figure legends (lines 708-710; 714-715; 736-738) how those currents are found through subtraction.

      Fig. 2 - figure supplement 2. One might consider taking Panel A to Fig. 2 so that the comparison to DIV0 is apparent without switching to Suppl. Figs.

      We left this unchanged so that Figures 2 and 3 are equivalently organized, with negative control data left to the supplemental figures. eLife formatting makes it easy to reach the supplementary figure from the main figure, so we hope this won’t be an impediment to readers.

      Fig. 6 C, middle graph (graph of Nav1.7): Please re-check, whether DIV5 none vs. 24 h and none vs. 120 h are really significantly different with such a low p-value.

      We re-checked the statistics and the difference pointed out by the reviewer is significant at p=0.007. We mistakenly reported p<0.001 for all comparisons, and so this p value has been corrected; all the other p values are indeed <0.001. Notably, the data are summarized as median ± quartile because of their non-Gaussian distribution; this is now explained on line 827 (as a reminder to the statement on lines 461-462). Quartiles are more comparable to SD than to SEM (in that quartiles and SD represent the distribution rather than confidence in estimating the mean, like SEM), and so medians can differ very significantly even if quartiles overlap, as in this case.
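
      The point about overlapping quartiles can be illustrated with a small synthetic example. The data below are made up, and the rank-sum (Mann-Whitney) test with a normal approximation is used purely for illustration; it is not claimed to be the test used in the manuscript. The sketch builds two groups whose interquartile ranges overlap and then shows that the difference between them is still highly significant:

```python
import math
import statistics


def rank_sum_p(a, b):
    """Return (U, two-sided p) from a normal approximation to the
    Mann-Whitney U test. No tie correction is applied, so this is only
    appropriate when the two samples share no values."""
    # U counts the pairs in which the value from `a` exceeds the value from `b`.
    u = sum(1.0 for x in a for y in b if x > y)
    u += 0.5 * sum(1.0 for x in a for y in b if x == y)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal variable.
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic, illustrative data: two evenly spread groups offset by 15.5 units.
group_a = [float(x) for x in range(40)]
group_b = [x + 15.5 for x in range(40)]

q1_a, med_a, q3_a = statistics.quantiles(group_a, n=4)
q1_b, med_b, q3_b = statistics.quantiles(group_b, n=4)
u_stat, p_value = rank_sum_p(group_a, group_b)

if __name__ == "__main__":
    print("A: median", med_a, "quartiles", (q1_a, q3_a))
    print("B: median", med_b, "quartiles", (q1_b, q3_b))
    print("interquartile ranges overlap:", q3_a > q1_b)
    print("p < 0.001:", p_value < 0.001)
```

      Quartiles describe how widely the individual values are spread, so they can overlap even when the sample size is large enough for the shift in the median to be detected with high confidence.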

      Reviewer #3 (Recommendations For The Authors):

      (1) A critical issue in the manuscript is the use of teleological language. It is likely that this is not the intention, but careful revision of the language should be done to avoid the use of expressions that confer purpose to a biological process. Please, find below a list of statements that I consider require correction.

      • In the Abstract, the first sentence: "Nociceptive sensory neurons convey pain signals to the CNS using action potentials". Neurons do not really "use" action potentials, they have no will or purpose to do so. Action potentials are not tools or means to be "used" by neurons. Other examples of misuse of the verb "use" are found in several other sentences:

      "...nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8"

      "Flexible use of different NaV subtypes - an example of degeneracy - compromises..."

      "Nociceptors can achieve equivalent excitability using different sodium channel subtypes" "...degeneracy - the ability of a biological system to achieve equivalent function using different components..."

      "...nociceptors can achieve equivalent excitability using different sodium channel subtypes..."

      "Our results show that nociceptors can achieve similar excitability using different NaV channels" "...the spinal dorsal horn circuit can achieve similar output using different synaptic weight combinations..."

      "Contrary to the view that certain ion channels are uniquely responsible for certain aspects of neuronal function, neurons use diverse ion channel combinations to achieve similar function" "In summary, our results show that nociceptors can achieve equivalent excitability using different NaV subtypes"

      “Use” can mean to put into action (without necessarily implying intention). Based on definitions of the word in various dictionaries, we feel we are well within the realm of normal usage of this term. In trying to achieve a clear and succinct writing style, we have stuck with our original word choice.

      • At the end of page 5 and in the legend of Fig. 7, the word "encourage" is not properly used in the sentence "The ability of NaV1.3, NaV1.7 and NaV1.8 to each encourage repetitive spiking is seemingly inconsistent with the common view...". Encouraging is really an action of humans or animals on other humans or animals.

      Like for “use”, we verified our usage in various dictionaries and we do not think that most readers will be confused or disturbed by our word choice. We use “encourage” to explain that increasing NaV1.3, NaV1.7 or NaV1.8 can increase the likelihood of repetitive spiking; we avoided “cause” because the probability of repetitive spiking is not raised to 100%, since other factors must always be considered.

      • In the Abstract and other places in the manuscript, the word "responsibility" seems to be wrongly employed. It is true that one can say, for instance, on page 4 last paragraph "we sought to identify the NaV subtype responsible for repetitive spiking at each time point". However, to confer channels with the human quality of having "responsibility" for something does not seem appropriate. See also page 8 last paragraph, the first paragraph of the Discussion, and the three paragraphs of page 11.

      Again, we must respectfully disagree with the reviewer. We appreciate that this reviewer does not like our writing style but we do not believe that our style violates English norms.

      (2) In the first sentence of the Abstract, nociceptive sensory neurons do not convey "pain signals". Pain is a sensation that is generated in the brain.

      “Pain” is used as an adjective for “signal” and is used to help identify the type of signal. Nonetheless, since the word count allowed for it, we now refer to “pain-related signals” (line 10).

      (3) I do not see the point of plotting the firing rate as a function of relative stimulus amplitude (normalized to the rheobase, e.g., Fig. 1A bottom panels, Fig. 2B, bottom-right, Fig. 2 Supp2A right, Fig. 3 B bottom panels, etc) instead of as a function of the actual stimulus amplitude. I have the impression that this maneuver hides information. This is equivalent to plotting the current amplitudes as a function of the voltage normalized by the voltage threshold for current activation, which is obviously not done.

      This is how the experiments were performed, so it would be impossible to perform the statistical analysis using the absolute amplitudes post-hoc; specifically, stimulus intensities were tested at increments defined relative to rheobase rather than in absolute terms. There are pros and cons to each approach, and both approaches are commonly used. Notably, we report the value of rheobase on the figures so that readers can, with minimal arithmetic, convert to absolute stimulus intensities. No information is hidden by our approach.
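
      The "minimal arithmetic" mentioned above can be sketched as follows, assuming that each relative stimulus intensity is expressed as a multiple of rheobase. The function name and the example values are hypothetical and are not taken from the figures:

```python
def to_absolute_pa(rheobase_pa, relative_intensities):
    """Convert stimulus intensities given as multiples of rheobase
    into absolute current amplitudes in pA."""
    return [rheobase_pa * r for r in relative_intensities]


if __name__ == "__main__":
    # Hypothetical neuron with a rheobase of 100 pA, tested at 1x, 1.5x and 2x.
    print(to_absolute_pa(100.0, [1.0, 1.5, 2.0]))
```

      Because rheobase differs between neurons, the same relative step corresponds to a different absolute current in each cell, which is why the rheobase value is needed for the conversion.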

      (4) On page 4 it is stated that "We show later that similar changes develop in vivo following inflammation with consequences for drug efficacy assessed behaviourally (see Fig. 8), meaning the NaV channel reconfiguration described above is not a trivial epiphenomenon of culturing". However, what happens in culture may have nothing in common with what happens in vivo during inflammation. Thus, the latter data may not serve to answer whether the culture conditions induce artifacts or not. I suggest tuning down this statement by changing "meaning" to "suggesting".

      On line 97, we now write “suggesting”.

      (5) Page 5, first paragraph, I miss a clear description of the mathematical models. Having to skip to the Methods section to look for the details of the models as the artifices introduced to simulate different conditions is rather inconvenient.

      So as not to disrupt the flow of the presentation with methodological details, we only provide a short description of the model in the Results. We have slightly expanded this to point out that the conductance-based model is also single-compartment (line 111). We provide a very thorough description of our model in the Methods, especially considering all the details provided in Supplementary Tables 1, 5 and 6. We also report conductance densities and % changes in figure legends (lines 722, 747-748; now shown in red). This is also true for Figure 3-figure supplement 2 (lines 756-759). We tried very hard to find a good balance that we hope most readers will appreciate.

      (6) Page 6, second paragraph, simulations do not serve to "measure" currents.

      The sentence has been revised to indicate that simulations were used to “infer” currents during different phases of the spike (line 155).

      (7) Page 7, regarding the title of the subsection "Control of changes in NaV subtype expression between DIV0 and DIV4-7", the authors measured the levels of expression, but not really the mechanisms "controlling" them. I suggest writing "changes in NaV subtype expression between DIV0 and DIV4-7"

      We have removed “control of” from the section title (line 185)

      (8) What was the reason for adding a noise contribution in the model?

      We now explain that noise was added to reintroduce the voltage noise that is otherwise missing from simulations (line 474). For instance, in the absence of noise, membrane potential can approach voltage threshold very slowly without triggering a spike, which does not happen under realistically noisy conditions. Of course membrane potential fluctuates noisily because of stochastic channel opening and a multitude of other reasons. This is not a major issue for this study, and so we think our short explanation should suffice.
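
      The effect of voltage noise near threshold can be illustrated with a toy simulation. The sketch below is a leaky integrate-and-fire neuron with additive Gaussian noise, not the conductance-based model used in the study, and every parameter value is an arbitrary assumption chosen for illustration. The constant drive holds the membrane potential 1 mV below threshold, so the noiseless neuron never spikes, whereas the noisy one does:

```python
import math
import random


def count_spikes(noise_sd, seed=0, t_stop_ms=1000.0, dt_ms=0.1):
    """Count threshold crossings of a leaky integrate-and-fire neuron.

    noise_sd scales the Gaussian voltage noise (mV per sqrt(ms)).
    All parameters are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    tau_ms = 10.0      # membrane time constant
    v_rest = -65.0     # resting potential (mV)
    v_thresh = -50.0   # spike threshold (mV)
    v_reset = -65.0    # reset potential after a spike (mV)
    drive_mv = 14.0    # steady-state depolarization: settles 1 mV below threshold
    v = v_rest
    spikes = 0
    for _ in range(int(t_stop_ms / dt_ms)):
        # Deterministic relaxation toward (v_rest + drive_mv).
        dv = (-(v - v_rest) + drive_mv) / tau_ms * dt_ms
        # Stochastic term (Euler-Maruyama step).
        dv += noise_sd * math.sqrt(dt_ms) * rng.gauss(0.0, 1.0)
        v += dv
        if v >= v_thresh:
            spikes += 1
            v = v_reset
    return spikes


if __name__ == "__main__":
    print("spikes without noise:", count_spikes(0.0))
    print("spikes with noise:", count_spikes(1.0))
```

      In this toy case the noiseless membrane potential creeps toward a value just below threshold and stays there, while fluctuations of a few millivolts are enough to carry it across.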

      (9) Please, define the concept of degeneracy upon first mention.

      Degeneracy is now succinctly defined in the abstract (line 20).

    1. And if someone is in the bathroom, they’re 10-100 (or 10-200 as the case may be), but they’re definitely not “in the can”, which is what you say when a scene is completed.

      In cinema, "in the can" doesn't mean what we think it does. However, the people working on the set know the lingo and understand that it means the scene is complete.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study reports a novel mechanism linking DHODH inhibition-mediated pyrimidine nucleotide depletion to antigen presentation. Alternative means of inducing antigen presentation provide therapeutic opportunities to augment immune checkpoint blockade for cancer treatment. While the solid mechanistic data in vitro are compelling, in vivo assessments of the functional relevance of this mechanism are still incomplete.

      Public Reviews:

      We thank all Reviewers for their insightful comments and excellent suggestions.

      Reviewer #1 (Public Review):

      The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, which is mechanistically mediated by the CDK9/PTEFb pathway triggered by pyrimidine nucleotide depletion.

      No comment from authors

      The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.

      No comment from authors

      The interesting findings in the present study include demonstrating a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it does not directly examine whether the increased antigen presentation induced by DHODH inhibition actually contributed to the potentiation of the efficacy of immune checkpoint blockade (ICB).

      No comment from authors for preceding text, comment addresses the following text

      Moreover, the mechanism by which pyrimidine depletion increases the antigen presentation pathway via CDK9/PTEFb was not validated by genetic KD or KO targeting the CDK9/PTEFb pathway.

      We appreciate this comment, and we would like to explain why we did not pursue these approaches. According to DepMap, CRISPR/Cas9-mediated knockout of CDK9 in cancer cell lines is almost universally deleterious, scoring as “essential” in 99.8% (1093/1095) of all cell lines tested (see Author response image 1 below). This makes sense, as P-TEFb is required for productive RNA polymerase II elongation of most mammalian genes. As such, it was not feasible to generate cell lines with stable genetic knockout of CDK9 to test our hypothesis.

      While knockdown of CDK9 by RNA interference could support our results, DepMap data seems to indicate that RNAi-mediated knockdown of CDK9 is generally ineffective in silencing its activity, as this perturbation scored as “essential” in only 6.2% (44/710) of tested cell lines. This suggests that incomplete depletion of CDK9 will likely not be sufficient to block APP induction downstream of nucleotide depletion. Furthermore, RNAi-mediated depletion of CDK9 may trigger transcriptional changes in the cell by virtue of its many documented protein-protein interactions, and it would be difficult to establish a consistent “time zero” at which point CDK9 protein depletion is substantial but secondary effects of this have not yet occurred to a significant degree. These factors constitute major limitations of experiments using RNAi-mediated knockdown of CDK9.

      Author response image 1.

      Essentiality score from CRISPR and RNAi perturbation of CDK9 in cancer cell lines https://depmap.org/portal/gene/CDK9?tab=overview&dependency=RNAi_merged

      At any rate, we provide evidence that three different inhibitors of CDK9 (flavopiridol, dinaciclib, and AT7519) all inhibit our effect of interest (Fig 4B). The same results were observed using a previously validated CDK9-directed proteolysis targeting chimera (PROTAC2), and this was reversed by addition of excess pomalidomide (Fig 4C), which correlated with the presence/absence of CDK9 on western blot under the exact same conditions (Fig 4D).

      It is formally possible that all CDK9 inhibitors we tested are blocking BQ-mediated APP induction by some shared off-target mechanism (or perhaps by two or more different off-target mechanisms) AND this CDK9-independent target also happens to be degraded by PROTAC2. However, this would be an extraordinarily non-parsimonious explanation for our results, and so we contend that we have provided compelling evidence for the requirement of CDK9 for BQ-mediated APP induction.

      Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.

      We are intrigued by the results shown to us by Reviewer #1 in the linked preprint (Mishima et al 2022, https://doi.org/10.21203/rs.3.rs-2190326/v1). We have also observed in our unpublished data that very high concentrations of BQ (>150µM) cause loss of cell viability that is not rescued by uridine supplementation and that occurs even in DHODH knockout cells. This effect of high-dose BQ must be DHODH-independent. We also agree that Mishima et al provide compelling evidence that the ferroptosis-sensitizing effect of high-dose BQ treatment is due (at least in large part) to inhibition of FSP1.

      Although we showed that DHODH is strongly inhibited in tumor cells in vivo (Fig 5C), we did not directly measure the concentration of BQ in the tumor or plasma. Sykes et al (PMID: 27641501) found that the maximum plasma concentration (Cmax) for [BQ]free following a single IP administration in C57Bl6/J mice (15mg/kg) is approximately 3µM, while the Cmax for [BQ]total was around 215µM. Because polar drug molecules bound to serum proteins (predominantly albumin) are not available to bind other targets, [BQ]free is the relevant parameter.

      Given a Cmax for [BQ]free of 3µM and half-life of 12.0 hours, we estimate that the steady-state [BQ]free with daily IP injections at this dose is around 4µM. Since we used an administration schedule of 10mg/kg every 24 hours, we estimate that the steady-state plasma [BQ]free in our system was 2.67µM (assuming initial Cmax of 2µM and half-life of 12.0 hours).

      To derive an upper-bound estimate for the Cmax of [BQ]free over the 12-day treatment period (Fig 5A-D), we will use the observed data for 15mg/kg dose, and we will assume that 1) there is no clearance of BQ whatsoever and 2) that [BQ]free increases linearly with increasing [BQ]total. This yields a maximum free BQ concentration of 12 x 3 = 36µM.

      Therefore, we consider it very unlikely that plasma concentrations of free BQ in our experiment exceeded the lower limit of the ferroptosis-sensitizing dose range reported by Mishima et al. However, without direct pharmacokinetic analysis, we cannot say for sure what the maximal [BQ]free was under our experimental conditions.
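
      The steady-state and upper-bound estimates above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' own analysis: it assumes a one-compartment model with first-order elimination, simple superposition of doses, and that each dose adds the single-dose Cmax instantaneously. The function names are hypothetical and chosen only for illustration. The numerical inputs (Cmax of 3 µM at 15 mg/kg, an assumed Cmax of 2 µM at 10 mg/kg, a 12.0-hour half-life, 24-hour dosing, and 12 daily doses) are taken from the text.

```python
def accumulation_factor(half_life_h: float, interval_h: float) -> float:
    """Ratio of steady-state peak to first-dose peak for repeated dosing.

    Assumes first-order elimination and superposition of identical doses,
    so the peaks form a geometric series with ratio 0.5 ** (tau / t_half).
    """
    fraction_remaining = 0.5 ** (interval_h / half_life_h)
    return 1.0 / (1.0 - fraction_remaining)


def steady_state_peak(cmax_single_um: float, half_life_h: float, interval_h: float) -> float:
    """Steady-state peak concentration (µM) given the single-dose Cmax (µM)."""
    return cmax_single_um * accumulation_factor(half_life_h, interval_h)


def no_clearance_upper_bound(cmax_single_um: float, n_doses: int) -> float:
    """Upper bound (µM) if no drug is cleared: every dose simply adds its Cmax."""
    return cmax_single_um * n_doses


HALF_LIFE_H = 12.0   # half-life quoted in the text
INTERVAL_H = 24.0    # once-daily IP dosing

peak_15mgkg = steady_state_peak(3.0, HALF_LIFE_H, INTERVAL_H)  # 15 mg/kg case
peak_10mgkg = steady_state_peak(2.0, HALF_LIFE_H, INTERVAL_H)  # 10 mg/kg case, assumed Cmax of 2 µM
upper_bound = no_clearance_upper_bound(3.0, 12)                # 12 daily doses, no clearance

print(f"steady-state peak, 15 mg/kg: {peak_15mgkg:.2f} µM")
print(f"steady-state peak, 10 mg/kg: {peak_10mgkg:.2f} µM")
print(f"no-clearance upper bound:    {upper_bound:.0f} µM")
```

      With a 12-hour half-life and a 24-hour interval, one quarter of each dose remains at the next injection, giving an accumulation factor of 1/(1 − 0.25) ≈ 1.33. This is consistent with the estimates of roughly 4 µM and 2.67 µM given above, and the no-clearance bound matches the 12 x 3 = 36 µM figure.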

      Reviewer #2 (Public Review):

      In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.

      The combination of DHODH inhibition with ICB reflects more of an additive response instead of a synergistic combination. Moreover, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. To confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition, the authors should examine whether depletion of immune cells can reduce the therapeutic efficacy of BQ in vivo.

      We concur with this assessment.

      Moreover, they should examine whether BQ treatment induces antigen presentation in non-malignant cells and APCs to determine the cancer specificity.

      Although we showed that this occurs in HEK-293T cells, we appreciate that this cell line is not representative of human cells of any organ system in vivo. So, we agree it is important to determine if DHODH inhibition induces antigen presentation in human tissues and professional antigen presenting cells, and this is an excellent focus for future studies.

      However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline (i.e. even in the absence of DHODH inhibitor treatment), since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.

      Finally, although the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level, only MHC-I is validated by flow cytometry. Given the importance of MHC-II expression on epithelial cancers, including melanoma, MHC-II should be validated as well.

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.

      Reviewer #3 (Public Review):

      Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group build on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increase in expression of genes involved in the antigen presentation pathway and MHC I after BQ treatment and they narrow the mechanism to be strictly pyrimidine and CDK9/P-TEFb dependent. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. This combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.

      [No comment from authors]

      Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the use of multiple controls across a panel of cell lines to exclude off-target effects and to confirm that effects are exclusively dependent on pyrimidine depletion. Overall, the authors do a thorough characterization of the mechanism that mediates MHC I upregulation using multiple strategies. Furthermore, the in vivo studies provide solid evidence for combining DHODH inhibitors with immune checkpoint blockade.

      No comment from authors

      However, despite the use of multiple cell lines, most experiments are only performed in one cell line, and it is hard to understand why particular gene sets, cell lines or time points are selected for each experiment. It would be beneficial to standardize experimental conditions and confirm the most relevant findings in multiple cell lines.

      We appreciate this comment, and we understand how the use of various cell lines may seem puzzling. We would like to explain how our cell line panel evolved over the course of the study. Our first indication that BQ caused APP upregulation came from transcriptomics experiments (Figs 1A-D, S1A) performed as part of a previous study investigating BQ resistance (Mullen et al, 2023 Cancer Letters). In that study, we used CFPAC-1 as a model for BQ sensitivity and S2-013 as a model for BQ resistance. We did RNA sequencing +/- BQ in these cell lines to look for gene expression patterns that might underlie resistance/sensitivity to BQ. When analyzing this data, we serendipitously discovered the APP/MHC phenomenon, which gave rise to the present study.

      Our next step was to extend these findings to cancer cell lines of other histologies, and we prioritized cell lines derived from common cancer types for which immunotherapy (specifically ICB) is clinically approved. This is why A549 (lung adenocarcinoma), HCT116 (colorectal adenocarcinoma), A375 (cutaneous melanoma), and MDA-MB-231 (triple-negative breast cancer) cell lines were introduced.

      Because PDAC is considered to have an especially “immune-cold” tumor microenvironment, we reasoned that even dramatically increasing cancer cell antigen presentation may be insufficient to elicit an effective anti-tumor immune response in vivo. So we shifted our focus towards melanoma, because a subset of melanoma patients is very responsive to ICB and loss of antigen presentation (by direct silencing or homozygous loss-of-function mutations in MHC-I components such as B2M, or by functional loss of IFN-JAK1/2-STAT signaling) has been shown to mediate ICB resistance in human melanoma patients. This is why we extended our findings to B16F10 murine melanoma cells, intending to use them for in vivo studies with syngeneic immunocompetent recipient mice.

      The PDAC cell line MiaPaCa2 was introduced because a collaborator at our institution (Amar Natarajan) happened to have IKK2 knockout MiaPaCa2 cells, which allowed us to genetically validate our inhibitor results showing that IKK1 and IKK2 (crucial effectors for NF-kB signaling) are dispensable for our effect of interest.

      Ultimately, realizing that our results spanned various human and murine cell lines, we chose to use HEK-293T cells to validate the general applicability of our findings to proliferating cells in 2D culture, since HEK-293T cells (compared to our cancer cell lines) have relatively few genetic idiosyncrasies and express MHC-I at baseline.

      The differential in vivo survival depending on dosing schedule is interesting. However, this section could be strengthened with a more thorough evaluation of the tumors at endpoint.

      Overall, this is an interesting manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic strategy combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these inhibitors have not shown optimal efficacy yet.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The main issue is that the study did not directly examine whether the increased antigen presentation by DHODH inhibition contributed to the potentiation of the efficacy of immune checkpoint blockade (ICB). The additional effect of BQ in the xenograft tumor study was not examined to determine whether it was due to increased antigen presentation in the cancer cells or merely to a cell cycle arrest effect of pyrimidine depletion in the tumor cells. The different administration timing of ICB with BQ treatment (Fig 5E) would not be sufficient to resolve this issue.

      We agree with this assessment, and we believe the experiment proposed by Reviewer #2 below (comparing the efficacy of BQ in Rag-null versus immunocompetent recipients) would address this question directly. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Additionally, in the in vivo study, the increase in surface MHC-I at the protein level following BQ treatment was not examined in the tumor samples, and it was not confirmed whether increased antigen presentation by BQ treatment actually promoted an anti-cancer immune response in immune cells. To support the story presented in the study, these data would be necessary.

      We attempted to show this by immunohistochemistry, but unfortunately the anti-H2-Db antibody that we obtained for this purpose did not have satisfactory performance to assess this in our tissue samples harvested at necropsy.

      (3) The mechanism by which pyrimidine depletion increases the antigen presentation pathway via CDK9/PTEFb was not validated by genetic KD or KO targeting the CDK9/PTEFb pathway. In general, results obtained only from inhibitor assays are limited by potential off-target effects.

      Please see our above reply to Reviewer #1 comment making this same point, where we spell out our rationale for not pursuing these experiments.

      (4) High concentrations of BQ (> 50 µM) have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, an iron-mediated lipid peroxidation-dependent cell death, independent of DHODH inhibition (https://www.researchsquare.com/article/rs-2190326/v1). The authors should discuss whether the dose used in the in vivo study reached the ferroptosis-sensitizing dose or not.

      Please see our above reply to Reviewer #1 comment making this same point, where we explain why we are very confident that the BQ dose administered in our animal experiments was far below the minimum reported BQ dose required to sensitize cancer cells to ferroptosis in vitro.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.

      We agree with this assessment and the proposed experiment comparing the response in Rag-null versus immunocompetent recipients. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.

      We agree it is important to examine if our findings regarding nucleotide depletion and antigen presentation are true of APCs and other non-transformed cells, but we are not so concerned about the possibility of raising an immune response against non-malignant host tissues, as explained above. We have reproduced the relevant section below:

      “However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline, since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.”

      (3) In the title, the authors claim that DHODH inhibition enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan-Meier curves reflect more of an additive response than a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. This result seems to contradict the title.

      We do not agree with this assessment. Given that the effect of dual ICB alone was very marginal, while the effect of BQ monotherapy was quite marked, we cannot conclude from Fig 5 that BQ treatment inhibited ICB efficacy due to immune suppression.

      (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and then a subsequent course of ICB also reduces tumor burden but not that the two therapies are functioning in synergy. To address this, the authors should measure the duration of BQ mediated induction of antigen presentation after stopping treatment.

      We agree that the alternative explanation proposed by Reviewer #2 is possible and we appreciate the suggestion to test the stability of APP induction after stopping BQ treatment.

      (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Minor Points

      (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.

      We appreciate this suggestion. We have changed Fig 4 by replacing the HLA-B track (old Fig 4E) with a representation of fold change (Ter/DMSO) in Pol II occupancy versus fold change (Ter/DMSO) in mRNA abundance for 23 relevant genes (new Fig 4G); both of these datasets were obtained from the Tan et al manuscript. This new figure panel (Fig 4G) also shows linear regression analysis demonstrating that Pol II occupancy and mRNA expression are significantly correlated for APP genes. While we recognize that this data in itself is not formal proof of our hypothesis, it does strongly support the notion that increased transcription is responsible for the increased mRNA abundance of APP genes that we have observed.

      (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.

      We fully agree with this comment. While the abundance of cancer cell surface MHC-I is an important factor for anticancer immunity, another crucial factor is the identity of peptides that are presented. Treatments that cause presentation of more immunogenic peptides can enhance T-cell recognition even in the absence of a relative change in cell surface MHC-I abundance.

      While we did not perform the immunopeptidomics experiments described, we can offer some speculation regarding this comment. As shown in Fig 1D-E, transcriptomics experiments suggest that immunoproteasome subunits (PSMB8, PSMB9, PSMB10) are upregulated upon DHODH inhibition. If this change in mRNA levels translates into greater immunoproteasome activity (which was not tested in our study), this would be expected to alter the repertoire of peptides available for presentation and could thereby change the immunopeptidome.

      However, this hypothesis requires direct testing, and we hope future studies will delineate the effects of DHODH inhibition and other cancer therapies on the immunopeptidome, as this area of research will have important clinical implications.

      (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.

      We agree with the assessment. A mechanistic link between depleted pyrimidines and CDK9 activity will be a subject of future studies.

      (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.

      Reviewer #1 raised this very same point, and we agree. Please see our reply to Reviewer #1, which details why we did not pursue this approach and argues that the evidence we present is compelling even in absence of genetic manipulation.

      Additionally, please see the new Fig 4E and 4F, which repeat the experiment in Fig 4B using HCT116 cells. Figure 4E shows that, in this cell line, CDK9 inhibitors (flavopiridol, dinaciclib, and AT7519) block BQ-mediated APP induction, while PROTAC2 does not. Figure 4F shows that (for reasons we cannot fully explain) PROTAC2 does not lead to CDK9 degradation in HCT116 cells. These data strongly implicate CDK9, because they exclude a CDK9-degradation-independent effect of PROTAC2.

      (5) Figure 2B needs a legend.

      Thank you for pointing this out. We have added a legend to Fig 2B.

      (6) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.

      Thank you for this suggestion! We have amended the Discussion section to mention this important point. Please see paragraph 2 of the revised Discussion section where we have added the following text:

      “Because BQ-mediated APP induction does not require interferon signaling, this strategy may have particular relevance for clinical scenarios in which tumor antigen presentation is dampened by the loss or silencing of cancer cell interferon signaling, which has been demonstrated to confer both intrinsic and acquired ICB resistance in human melanoma patients.”

      Reviewer #3 (Recommendations For The Authors):

      The authors present convincing evidence of the mechanism by which pyrimidine nucleotides regulate MHC I levels and about the potential of combining DHODH inhibitors with dual immune checkpoint blockade (ICB). This is an interesting paper given the clinical relevance of DHODH inhibitors. The studies raise some questions, and some points might need clarifying as below:

      • In Figure 2C, why do the authors focus on these two genes in the uridine rescue? These are important genes mediating antigen presentation, but it might be more interesting to see how H2-Db and H2-Kb expression correlate with the protein data shown in Fig 2D. Fig. 2C-2D is a relevant control, so it would be important to validate in a different cancer cell line (e.g. one of the PDAC cell lines used for the RNAseq).

      We appreciate this comment. Although Fig 3C shows that BQ-induced expression of H2-Db, H2-Kb, and B2m is reversed by uridine (in B16F10 cells), we recognize that this was not the best placement for this data, as it can easily be overlooked here since uridine reversal is not the main point of Fig 3C. We have left Fig 3C as is, because we think that the uridine reversal demonstrated in that panel serves as a good internal positive control for reversal of BQ-mediated APP induction in that experiment.

      We have repeated the experiments shown in the original Fig 2C and substituted the original Fig 2C with a new Fig 2C and Fig S2B, which show both Tap1 and Nlrc5 as well as H2-Db, H2-Kb, and B2m after treatment with either BQ (new Fig 2C) or teriflunomide (new Fig S2B). The original Fig S2B is now Fig S2C, and it shows that uridine has no effect on the expression of any of the genes assayed in the new Fig 2C or S2B.

      The reversibility of cell surface MHC-I induction was also validated in HCT116 cells (Fig 3F). We included the uridine reversal in Fig 3F to avoid duplicating the control and BQ FACS data in multiple panels.

      We have also added the qPCR data for HCT116 cells showing this same phenotype (at the mRNA level), which is the new Fig S2D.

      We decided to prioritize HCT116 cells for our mechanistic studies (Figures S2D, S4A, and 4E-F) because previous reports indicate that this line is diploid and therefore less genetically deranged than our other cancer cell lines.

      • Figure 2F shows an elegant experiment to discard off-target effects related to cell death and to confirm that the increased MHC I expression is uniquely dependent on pyrimidines. DHODH has recently been implicated in ferroptosis, a highly immunogenic type of cell death. What are the authors' thoughts on BQ-induced ferroptosis as a possible contributor to the effects of ICB? Does BQ + ferroptosis inhibitor (ferrostatin) affect cell surface MHC I and/or expression of antigen processing genes?

      The potential role of DHODH in ferroptosis protection (Mao et al 2021) has important implications, so we are glad that multiple reviewers raised questions concerning ferroptosis. We did not directly test the effect of ferroptosis inducing agents (with or without BQ) on MHC-I/APP expression, but that is certainly a worthwhile line of investigation.

      The DHODH/ferroptosis issue is complicated by a study pointed out by Reviewer #1 that challenges the role of DHODH inhibition in BQ-mediated ferroptosis sensitization (Mishima et al, 2022). This study argues that high-dose BQ treatment causes FSP1 inhibition, and this underlies the effect of BQ on the cellular response to ferroptosis-inducing agents.

      Regardless of whether BQ-induced ferroptosis-sensitization is dependent on DHODH, FSP1, or some other factor, the Mao and Mishima studies agree that a relatively high dose of BQ is required to observe these effects (100-200µM for most cell lines and >50µM even in the most ferroptosis-sensitive cell lines). As we explained above, we consider it very unlikely that the in vivo BQ exposure in our experiments (Fig 5) was high enough to cause significant ferroptosis, especially in the absence of any dedicated ferroptosis-inducing agent (which is typically required to cause ferroptosis even in the presence of high-dose BQ).

      • The authors nail down the mechanism to CDK9 (Fig 4). However, all these experiments are performed in 293T cells. I would like to see a repeat of Fig. 4B in a cancer cell line (either PDAC or B16). Also, does BQ have any effect on CDK9 expression/protein levels?

      We have added two figure panels that address this comment (new Fig 4E and 4F). Figure 4E (which is a repeat of Fig 4B with HCT116 cells) shows that CDK9 inhibitors (flavopiridol, AT7519, and dinaciclib) reverse BQ-mediated APP induction in HCT116 cells (this agrees with Fig S4A showing that flavopiridol reverses MHC induction by various nucleotide synthesis inhibitors in this cell line), but PROTAC2 does not. Figure 4F shows that PROTAC2 (for reasons we cannot explain) does not cause CDK9 degradation in HCT116 cells. This adds further support to our thesis that CDK9 is a critical mediator of BQ-mediated APP induction (because how else can this pattern of results be explained?). The text of the Results section has been amended to reflect this.

      We chose to use HCT116 cells for this repeat experiment 1) to align with Fig S4A and 2) because, as previously mentioned, we consider HCT116 to be a good cell line for mechanistic studies because of its relative lack of idiosyncratic genetic features (compared to CFPAC-1, for example, which was derived from a patient with cystic fibrosis).

      • What are the differences in tumor size for the experiment shown in Figure 5E? What about tumor cell death in the ICB vs. BQ+ICB groups?

      Because this was a survival assay, direct comparisons of tumor volumes between groups were not possible at later time points, since mice that die or have to be euthanized are removed from their experimental group, which lowers the average group tumor burden at subsequent time points. Although tumor volume was the most common euthanasia criterion reached, a subset of mice were either found dead or had to be euthanized for other reasons attributed to their tumor burden (moribund state, inability to ambulate or stand, persistent bleeding from tumor ulceration, severe loss of body mass, etc.). This confounds any comparison of endpoint measurements (such as immunohistochemical quantification of tumor cell death markers, T-cell markers, etc.).
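
      The censoring effect described above can be illustrated with a minimal sketch. All numbers here (tumor volumes, the 50% growth step, and the 1500 mm^3 euthanasia limit) are hypothetical and are not taken from the study; the point is only that averaging over surviving mice understates the true group burden.

```python
from statistics import mean

def mean_of_survivors(volumes, limit):
    """Mean tumor volume among mice still below the euthanasia limit."""
    alive = [v for v in volumes if v < limit]
    return mean(alive) if alive else None

# Hypothetical tumor volumes (mm^3) for one treatment group.
day10 = [400, 600, 800, 1000, 1200]
day14 = [v * 1.5 for v in day10]   # assume every tumor grows by 50%
LIMIT = 1500                       # hypothetical euthanasia threshold

true_day14 = mean(day14)                          # all mice, if none were removed
observed_day14 = mean_of_survivors(day14, LIMIT)  # only mice still on study

print(true_day14, observed_day14)
```

      In this toy example the observed mean among survivors is lower than the mean over all mice, because the mice with the largest tumors have left the group.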

      • The different response in the concurrent vs delayed treatment is very interesting. The authors suggest two possible mechanisms to explain this: "1) Concurrent BQ dampens the initial anticancer immune response generated by dual ICB, or b) cancer cell MHC-I and related genes are not maximally upregulated at the time of ICB administration with concurrent treatment". However, and despite the caveat of comparing the in vitro to the in vivo setting, Fig 2D shows upregulation of MHC I already at 24h of treatment in B16 cells. Have the authors checked T cell infiltration in the concurrent and delayed treatment setting?

      For the same reasons described in response to the preceding comment, tumors harvested upon mouse death/euthanasia from our survival experiment were not suitable for cross-cohort comparison of tumor endpoint measurements. An additional experiment in which mice are necropsied at a prespecified time point (before any mice have died or reached euthanasia criteria, as in the experiment for Fig 5A-D) would be required to answer this question.

      • Page 5, line 181 -do the authors mean "nucleotide salvage inhibitors" instead of "synthesis"?

      We believe the reviewer is referring to the following sentence:

      “The other drugs screened included nucleotide synthesis inhibitors (5-fluorouracil, methotrexate, gemcitabine, and hydroxyurea), DNA damage inducers (oxaliplatin, irinotecan, and cytarabine), a microtubule targeting drug (paclitaxel), a DNA methylation inhibitor (azacytidine), and other small molecule inhibitors (Fig 2F).”

      In this context, we believe our use of “synthesis” instead of “salvage” is correct, because methotrexate and 5-FU inhibit thymidylate synthase (which mediates de novo dTTP synthesis), while gemcitabine and hydroxyurea inhibit ribonucleotide reductase (which mediates de novo synthesis of all dNTPs).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline to early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and bold signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (not clear if subjective reports about perturbation awareness were measured.). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation above is not only true of our regional manifold eccentricity data, but also of the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data: the horizontal red line in the rightmost plots denotes the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity of the Late learning phase reverses back towards this Baseline level].

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques noted by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early Learning can be explained simply by (i) the presence of reward feedback, (ii) the presence of the perturbation or (iii) the instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert back to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points when interpreting the data, but we failed to mention them in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”
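
      The logic of this argument (a reversal across epochs versus a consistent directional change) can be written as a small check. This is only an illustrative sketch: the function name, the tolerance, and the example eccentricity values are hypothetical and are not part of the authors' analysis pipeline.

```python
def change_pattern(baseline, early, late, tol=1e-9):
    """Classify a three-epoch profile as 'reversal', 'monotonic', or 'flat'.

    'reversal'  : the Baseline->Early change and the Early->Late change have
                  opposite signs (e.g. expansion followed by contraction).
    'monotonic' : both changes share a sign, as expected from a steady drift
                  such as fatigue or the simple passage of time.
    """
    d1 = early - baseline
    d2 = late - early
    if abs(d1) <= tol or abs(d2) <= tol:
        return "flat"
    return "reversal" if d1 * d2 < 0 else "monotonic"

# Hypothetical mean eccentricity values for two regions.
print(change_pattern(1.0, 1.4, 1.05))  # expansion then contraction
print(change_pattern(1.0, 1.2, 1.4))   # steady increase
```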

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., a divided-attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have alternate views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, it seems clear, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early Learning phase, more on this below), that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness of the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the movement after participants performed at the ceiling (which seems to be the case ~ after trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until near the end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (the computation of these measures is now defined in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) shows average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.
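
      For readers unfamiliar with the banding convention in this caption, the sketch below shows one common way to compute a per-trial mean and standard error of the mean (SEM) across participants. The array layout (participants in rows, trials in columns) and the example values are assumptions for illustration only, not the authors' data or code.

```python
import numpy as np

def mean_and_sem(data):
    """Per-trial mean and SEM across participants.

    data: array of shape (n_participants, n_trials).
    SEM is the sample standard deviation (ddof=1) divided by sqrt(n).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    mean = data.mean(axis=0)
    sem = data.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, sem

# Hypothetical scores for 3 participants over 2 trials.
scores = [[1.0, 2.0],
          [3.0, 4.0],
          [5.0, 6.0]]
trial_mean, trial_sem = mean_and_sem(scores)
lower, upper = trial_mean - trial_sem, trial_mean + trial_sem  # band edges
```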

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.
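
      The derivation of a single ‘Learning score’ per subject can be approximated with ordinary PCA on discretized learning curves. The sketch below is a simplification under stated assumptions: it omits the smoothing or basis expansion that a full fPCA would normally include, the learning curves are synthetic exponentials with hypothetical rates, and the function name and sign convention are illustrative rather than the authors' implementation.

```python
import numpy as np

def learning_scores(curves):
    """Discretized PCA of learning curves (a stand-in for fPCA).

    curves: array of shape (n_subjects, n_trials).
    Returns the mean curve, the top component, each subject's score on that
    component, and the fraction of variance explained by every component.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of Vt are the principal components over trials.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    component = Vt[0]
    scores = centered @ component
    # The sign of a component is arbitrary; orient it so that a positive
    # score means better-than-average performance.
    if component.sum() < 0:
        component, scores = -component, -scores
    return mean_curve, component, scores, explained

# Synthetic curves: scores rise from 40 toward 75 at different rates.
trials = np.arange(200)
rates = np.array([0.005, 0.01, 0.02, 0.05, 0.2])   # hypothetical learning rates
curves = 40 + 35 * (1 - np.exp(-np.outer(rates, trials)))

mean_curve, component, scores, explained = learning_scores(curves)
print(np.round(scores, 1), np.round(explained[0], 2))
```

      With these synthetic curves, subjects with faster learning rates receive higher scores on the top component, mirroring the interpretation given in the caption.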

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that even the highest performing subject (denoted in Fig. 6) still required approximately 20 trials to reach asymptotic performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022 PNAS, Areshenkoff et al., 2022 elife, Standage et al., 2023 Cerebral Cortex). However, there has been comparably little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018, van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components to learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see supplementary figure 2 above), that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning), the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed (1) the mean signal intensity, (2) the standard deviation of the signal (Std) and (3) the temporal signal-to-noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.

      Author response image 7.

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in the striatum (middle plot), in this case by approximately 100% (i.e., (~5 - 2.5)/2.5). The result of this is that the tSNR (mean/Std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.
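
The arithmetic behind these comparisons can be sketched as follows. This is an illustration only, not the code used for the analysis above: the signals are synthetic, generated from the approximate values quoted in the text (means of ~625 and ~450 a.u., Std of ~5 and ~2.5), and the number of timepoints is an arbitrary assumption.

```python
import numpy as np

def tsnr(timeseries):
    """Temporal SNR of one region: mean over time divided by Std over time."""
    timeseries = np.asarray(timeseries, dtype=float)
    return timeseries.mean() / timeseries.std()

rng = np.random.default_rng(0)
n_timepoints = 2000  # hypothetical scan length, chosen only for a stable estimate

# Synthetic raw signals (arbitrary units) built from the approximate values in the text.
cortex = rng.normal(loc=625.0, scale=5.0, size=n_timepoints)
striatum = rng.normal(loc=450.0, scale=2.5, size=n_timepoints)

# Relative differences, expressed as (cortex - striatum) / striatum.
mean_gain = (cortex.mean() - striatum.mean()) / striatum.mean()
std_gain = (cortex.std() - striatum.std()) / striatum.std()

tsnr_cortex = tsnr(cortex)
tsnr_striatum = tsnr(striatum)

print(f"mean intensity, cortex vs striatum: {mean_gain:+.0%}")
print(f"signal Std, cortex vs striatum:     {std_gain:+.0%}")
print(f"tSNR cortex = {tsnr_cortex:.0f}, tSNR striatum = {tsnr_striatum:.0f}")
```

Because the Std roughly doubles while the mean rises by well under half, the ratio mean/Std comes out larger for the striatal signal, which is the pattern described for the right plot.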

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equating the number of ‘regions’ in each of striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While clearly the evidence indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) necessitates that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.
      Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level) (Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”
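
To make the terms "contraction" and "expansion" concrete, the sketch below assumes that a region's eccentricity is its Euclidean distance from the centroid of a low-dimensional embedding. The three-component embedding, the number of regions, and the choice of which regions form the "network" are all hypothetical and the data are synthetic; this is not the analysis pipeline itself.

```python
import numpy as np

def eccentricity(embedding):
    """Euclidean distance of each region (row) from the centroid of the embedding."""
    embedding = np.asarray(embedding, dtype=float)
    centroid = embedding.mean(axis=0)
    return np.linalg.norm(embedding - centroid, axis=1)

rng = np.random.default_rng(1)
n_regions, n_components = 100, 3          # hypothetical sizes
network = np.arange(10)                   # hypothetical indices of one network's regions

# Baseline embedding: regions scattered around the manifold centroid.
baseline = rng.standard_normal((n_regions, n_components))

# "Early learning" embedding: the network's regions move halfway toward the centroid,
# i.e., the network contracts along the manifold; all other regions stay put.
early = baseline.copy()
centre = baseline.mean(axis=0)
early[network] = centre + 0.5 * (baseline[network] - centre)

delta = eccentricity(early) - eccentricity(baseline)   # Early > Baseline contrast
print("mean change inside network: ", delta[network].mean())
print("mean change outside network:", np.delete(delta, network).mean())
```

A negative change corresponds to contraction (the network moves closer to the rest of the manifold) and a positive change to expansion. Correlating each subject's change with their Learning score is the individual-differences step described in the quoted text.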

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. So little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum and why it was left out of this study? Especially given that the authors state in the abstract they examine cortical and subcortical structures. It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, by denying subjects this sensory error feedback they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum; see Averbeck & O’Doherty, 2022 & Klein-Flugge et al., 2022 for recent reviews), and (2) the cerebellum to be a more targeted focus of future work (specifically, we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that our statements about ‘whole-brain’ connectivity, and our vagueness about what we mean by ‘subcortex,’ may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain”, e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations, adapted directly to the geometry of the space of covariance matrices, outperform simpler methods, which treat covariance matrices as arrays of real numbers (e.g., naive subtraction; see Dodero et al. & Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
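
The idea of mean-centering in a non-Euclidean space can be illustrated with a deliberately simplified stand-in. The sketch below uses log-Euclidean centering (map each covariance matrix to its matrix logarithm, subtract the subject's mean log-matrix, map back with the matrix exponential). This is an assumption made for brevity and is not the procedure described in the Methods, which is adapted to the Riemannian geometry of covariance matrices; the matrices here are random and the function names are hypothetical.

```python
import numpy as np

def apply_to_eigenvalues(mat, fn):
    """Apply fn to the eigenvalues of a symmetric matrix (e.g., matrix log or exp)."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def center_covariances(covs):
    """Log-Euclidean mean-centering of one subject's covariance matrices.

    After centering, the subject's log-Euclidean mean is the identity matrix,
    so only deviations from that subject's own average connectivity remain.
    """
    logs = np.stack([apply_to_eigenvalues(c, np.log) for c in covs])
    mean_log = logs.mean(axis=0)
    return np.stack([apply_to_eigenvalues(l - mean_log, np.exp) for l in logs])

def random_spd(rng, n):
    """Random symmetric positive-definite matrix (synthetic stand-in for a scan)."""
    a = rng.standard_normal((n, 4 * n))
    return a @ a.T / (4 * n)

rng = np.random.default_rng(0)
n_regions = 5  # hypothetical; kept tiny for readability

# Three epochs (e.g., Baseline, Early, Late) for a single synthetic subject.
subject_scans = np.stack([random_spd(rng, n_regions) for _ in range(3)])
centered = center_covariances(subject_scans)

# The mean of the centered log-matrices is (numerically) zero, i.e., identity on the manifold.
residual = np.stack([apply_to_eigenvalues(c, np.log) for c in centered]).mean(axis=0)
print("largest entry of the centered mean log-matrix:", np.abs(residual).max())
```

Each centered matrix remains symmetric positive-definite, which is what a naive element-wise subtraction of covariance matrices does not guarantee.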

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, elife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curves and taking measures like learning rate etc., but in our experience we have found that these types of models don’t always fit well, or derive robust/reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a pure data-driven approach and performed functional principal component analysis (fPCA; Shang, 2014) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”
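
The logic of deriving one score per subject from whole learning curves can be sketched as below. As a simplification, this uses ordinary PCA on the discretized curves rather than functional PCA proper (which typically also smooths the curves or expands them in a basis). The curves are synthetic, and the curve shape, noise level and parameter names are assumptions made purely for illustration; only the subject and trial counts follow the text.

```python
import numpy as np

def learning_scores(curves):
    """Score each subject (row) on the first principal component of the learning curves."""
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)          # remove the group-average curve
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores, component = u[:, 0] * s[0], vt[0]
    explained = s[0] ** 2 / np.sum(s ** 2)           # share of variance on component 1
    # PCA signs are arbitrary: orient the axis so higher score = better overall performance.
    if np.corrcoef(scores, curves.mean(axis=1))[0, 1] < 0:
        scores, component = -scores, -component
    return scores, component, explained

rng = np.random.default_rng(0)
n_subjects, n_trials = 36, 200
trials = np.arange(n_trials)

# Synthetic subjects differ in how far above a starting score of 40 they climb.
ability = rng.uniform(0.0, 1.0, size=n_subjects)
curves = 40 + 50 * ability[:, None] * (1 - np.exp(-trials / 40.0))
curves += rng.normal(scale=5.0, size=curves.shape)   # trial-to-trial noise

scores, component, explained = learning_scores(curves)
recovery = np.corrcoef(scores, ability)[0, 1]
print(f"variance explained by component 1: {explained:.2f}")
print(f"correlation of score with simulated ability: {recovery:.2f}")
```

The appeal of this kind of summary is that no trial bins need to be chosen in advance: the component itself describes how the curve shape varies across subjects, and the score places each subject along that axis.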

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).
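
      The effect of z-scoring on mean activity can be seen in a minimal sketch, assuming each region's time series is standardized over the window being analyzed (the exact windowing used by the authors is not stated here, and the signal values are hypothetical):

```python
from statistics import mean, pstdev

def zscore(series):
    """Standardize a time series to zero mean and unit (population) variance."""
    m, sd = mean(series), pstdev(series)
    return [(x - m) / sd for x in series]

# Hypothetical regional signal, and the same signal with a higher baseline
# and larger amplitude (as if overall activity had increased).
signal = [1.0, 2.0, 4.0, 3.0, 5.0]
shifted = [10.0 + 3.0 * x for x in signal]

z_signal = zscore(signal)
z_shifted = zscore(shifted)
```

      Both standardized series coincide, so a uniform shift or rescaling of a region's activity leaves the z-scored data, and any distance computed from it, unchanged.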

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning, see panel A in figure below). The point of showing this data (where each z-score map looks near identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in figure below). Here we find that Baseline and Early learning have a near identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during late learning, though it should be noted that our y-axis, which is scaled in thousandths, really magnifies this effect.

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late>Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
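
      The caption does not say which FDR procedure was used. As a sketch, assuming the common Benjamini-Hochberg step-up procedure applied to one p-value per region, the thresholding could look like this (the function name and the p-values are hypothetical):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where a test is significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant

# Hypothetical region-wise p-values from paired t-tests
region_p = [0.001, 0.008, 0.039, 0.041, 0.6]
flags = benjamini_hochberg(region_p, q=0.05)
```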

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      • Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      • Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      RE: We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      • Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      RE: The novelty of this work is primarily manifested in four key aspects. Firstly, although we have employed several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques have hardly been explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming, which makes it challenging to process long sequences. Besides, these methods are sensitive to details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a “Tools and Resources” article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      We have now revised the descriptions in the “The geometry-aware protein binding site predictor (GPSite)” section to highlight the novelty of our work in a clearer manner:

      “In conclusion, GPSite is distinguished from the previous approaches in four key aspects. First, profiting from the effectiveness and low computational cost of ProtTrans and ESMFold, GPSite is liberated from the reliance on MSA and native structures, thus enabling genome-wide binding site prediction. Second, unlike methods that only explore the Cα models of proteins 25,40, GPSite exploits a comprehensive geometric featurizer to fully refine knowledge in the backbone and sidechain atoms. Third, the employed message propagation on residue graphs is global structure-aware and time-efficient compared to the methods based on surface point clouds 21,22, and memory-efficient unlike methods based on full atom graphs 23,24. Residue-based message passing is also less sensitive towards errors in the predicted structures. Last but not least, instead of predicting binding sites for a single molecule type or learning binding patterns separately for different molecules, GPSite applies multi-task learning to better model the latent relationships among different binding partners.”

      • Benchmark Discrepancies. The benchmark results vary, especially between the initial comparisons and those with PeSTo: GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. For consistency, PeSTo should be included in the benchmark against all other methods. This variation suggests potential issues with the benchmark set or the stability of the method. This inconsistency needs to be addressed to validate the reliability of the results.

      RE: We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competitive methods whose training sets are disparate, it was difficult to compare or rank all these methods fairly using a single test set. Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation, where GPSite achieves a higher AUPR than PeSTo (0.610 against 0.433). Using separate test sets for different comparisons is quite common in this field. For instance, in the study of PeSTo (Nat Commun 2023), the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set.

      Based on the reviewer’s suggestion, we have now replaced this experiment with a direct comparison with PeSTo using the datasets from PeSTo, in order to enhance the completeness and convincingness of our results. The corresponding descriptions are now added in Appendix 1-note 2, and the results are added in Appendix 2-table 4. For convenience, we also attach the note and table here:

      “Since 340 out of 375 proteins in our protein-protein binding site test set share > 30% identity with the training sequences of PeSTo, we performed a separate comparison between GPSite and PeSTo using the training and test datasets from PeSTo. By re-training with simply the same hyperparameters, GPSite achieves better performance than PeSTo (AUPR of 0.824 against 0.797) as shown in Appendix 2-table 4. Furthermore, when using ESMFold-predicted structures as input, the performance of PeSTo decreases substantially (AUPR of 0.691), and the superiority of our method will be further reflected. As in 24, the performance of ScanNet is also included (AUPR of 0.720), which is also largely outperformed by GPSite.”

      Author response table 1.

      Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo 24

      Note: The performance of ScanNet and PeSTo are directly obtained from 24. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.

      • Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      RE: We thank the reviewer for the comments. The precise definition of ligand-binding sites is elucidated in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a residue is defined as a binding residue if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods regarding these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.
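
      The BioLiP-style criterion described above can be sketched as follows. This is an illustration under stated assumptions, not BioLiP's code: the radii are the commonly cited Bondi values for a few elements (BioLiP's own radius table may differ), and the function name and input layout are hypothetical.

```python
import math

# Bondi van der Waals radii in Å for a few elements (illustrative assumption)
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def is_binding_residue(residue_atoms, ligand_atoms, margin=0.5):
    """residue_atoms / ligand_atoms: lists of (element, (x, y, z)) tuples.

    Returns True if the closest residue-ligand atom pair is nearer than
    the sum of their van der Waals radii plus `margin` (in Å).
    """
    best = None  # (distance, residue_element, ligand_element)
    for r_elem, r_xyz in residue_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(r_xyz, l_xyz)
            if best is None or d < best[0]:
                best = (d, r_elem, l_elem)
    if best is None:
        return False  # no atoms supplied on one side
    d, r_elem, l_elem = best
    return d < VDW_RADIUS[r_elem] + VDW_RADIUS[l_elem] + margin

# Hypothetical coordinates: an N atom 3.0 Å from a ligand O atom
near = is_binding_residue([("N", (0.0, 0.0, 0.0))], [("O", (3.0, 0.0, 0.0))])
far = is_binding_residue([("N", (0.0, 0.0, 0.0))], [("O", (4.0, 0.0, 0.0))])
```

      With these radii the cutoff for an N-O pair is 1.55 + 1.52 + 0.5 = 3.57 Å, so the first pair counts as a contact and the second does not.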

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed the protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023) directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors.
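
      To make the difference between the two families of label definitions concrete, here is a small sketch using the thresholds quoted above. The function names and inputs are hypothetical, and whether each cutoff is strict or inclusive in the original tools is an assumption.

```python
def is_interface_residue(rsa_unbound, asa_unbound, asa_bound,
                         rsa_cutoff=0.05, asa_loss_cutoff=1.0):
    """Surface-based label: a surface residue (RSA > 5%) that loses
    more than 1 Å^2 of absolute solvent accessibility on complex formation."""
    is_surface = rsa_unbound > rsa_cutoff
    asa_loss = asa_unbound - asa_bound
    return is_surface and asa_loss > asa_loss_cutoff

def is_contact_residue(min_atom_distance, threshold=4.0):
    """Distance-based label: closest inter-chain atom distance below a
    threshold in Å (the text cites 4 Å for ScanNet and 5 Å for PeSTo)."""
    return min_atom_distance < threshold

# Hypothetical residue: exposed, loses 12 Å^2 of ASA, nearest partner atom at 4.5 Å
surface_label = is_interface_residue(rsa_unbound=0.30, asa_unbound=45.0, asa_bound=33.0)
contact_4 = is_contact_residue(4.5, threshold=4.0)
contact_5 = is_contact_residue(4.5, threshold=5.0)
```

      The same residue can be labeled differently depending on the definition, which is the kind of subtle distinction the response refers to.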

      In the revised “Benchmark datasets” section, we have now provided more details for the binding site definitions in different datasets to avoid any potential ambiguity:

      “The benchmark datasets for evaluating binding site predictions of DNA, RNA, peptide, ATP, and HEM are constructed from BioLiP”; “A residue is defined as a binding residue if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms”; “Besides, the benchmark dataset of protein-protein binding sites is directly from 26, which contains non-redundant transient heterodimeric protein complexes dated up to May 2021. Surface regions that become solvent inaccessible on complex formation are defined as the ground truth protein-binding sites. The benchmark datasets of metal ion (Zn²⁺, Ca²⁺, Mg²⁺ and Mn²⁺) binding sites are directly from 18, which contain non-redundant proteins dated up to December 2021 from BioLiP.”

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      RE: We thank the reviewer for acknowledging the advancement and value of our work, as well as pointing out areas where improvements can be made. As discussed above, we have now carried out the corresponding revisions in the revised manuscript to enhance the completeness and clearness of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPsite" to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ions binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a Geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is the fact that they feed this neural network with predicted structure from ESMFold for training and prediction (instead of native structure in similar works) and a high-quality protein Language Model representation. The other major contribution is that it provides the public with a new light framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the interest of their framework with mostly two techniques: ablation and benchmark.

      Strengths:

      • The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      • The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      RE: We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      • Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      RE: We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023), directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we have now improved our manuscript by incorporating additional ablation studies regarding the effects of training procedure and language model representations, as well as case studies regarding the predicted structure’s quality and GPSite-based function annotations. We have also refined the Discussion section to focus more on the achievements of this work. A comprehensive point-by-point response to the reviewer’s recommendations is provided below.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Overall I think the work is slightly disserved by its presentation. Some improvements could be made to the paper to better highlight the significance of your contribution.

      RE: We thank the reviewer for recognizing the significance of our work!

      • Line 188: "As expected, the performance of these methods mostly decreases substantially utilizing predicted structures for testing because they were trained with high-quality native structures."

      This is a major ablation that was not performed in this case. You used the predicted structure to train, while the other did not. One better way to assess the interest of this approach would be to compare the performance of a network trained with only native structure to compare the leap in performance with and without this predicted structure as you did after to assess the interest of some other aspect of your method such as single to multitask.

      RE: We thank the reviewer for the valuable recommendation. We have now assessed the benefit of training with predicted instead of native structures, which brings an average AUPR increase of 4.2% as detailed in Appendix 1-note 5 and Appendix 2-table 9. For convenience, we also attach the note and table here:

      “We examined the performance under different training and evaluation settings as shown in Appendix 2-table 9. As expected, the model yields exceptional performance (average AUPR of 0.656) when trained and evaluated using native structures. However, if this model is fed with predicted structures of the test proteins, the performance substantially declines to an average AUPR of 0.573. This trend aligns with the observations for other structure-based methods as illustrated in Figure 2. More importantly, in the practical scenario where only predicted structures are available for the target proteins, training the model with predicted structures (i.e., GPSite) results in better performance than training the model with native structures (average AUPR of 0.594 against 0.573), probably owing to the consistency between the training and testing data. For completeness, the results in Appendix 3-figure 2 are also included where GPSite is tested with native structures (average AUPR of 0.637).”

      Author response table 2.

      Performance comparison on the ten binding site test sets under different training and evaluation settings

      Note: The numbers in this table are AUPR values. “Pep” and “Pro” denote peptide and protein, respectively. “Avg” means the average AUPR values among the ten test sets. “native” and “predicted” denote applying native and predicted structures as input, respectively.

      • Line 263: "ProtTrans consistently obtains competitive or superior performance compared to the MSA profiles, particularly for the target proteins with few homologous sequences (Neff < 2)."

      This seems a bit far-fetched. We see clearly in the figure that the performances are far superior for Neff < 2, but they seem rather similar for higher Neff. Could the authors evaluate numerically the significance of the improvement? MSA profiles outperform GPSite on 4 intervals and I don't know the distribution of the data.

      RE: We thank the reviewer for the valuable suggestion. We have now revised this sentence to avoid any potential ambiguity:

      “As evidenced in Figure 4B and Appendix 2-table 8, ProtTrans consistently obtains competitive or superior performance compared to the MSA profile. Notably, for the target proteins with few homologous sequences (Neff < 2), ProtTrans surpasses MSA profile significantly with an improvement of 3.9% on AUC (P-value = 4.3×10⁻⁸).”

      The detailed significance tests and data distribution are now added in Appendix 2-table 8 and attached below as Author response-table 3 for convenience:

      Author response table 3.

      Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the ten ligands

      Note: Significance tests are performed following the procedure in 12,25. If P-value < 0.05, the difference between the performance is considered statistically significant.

      • Line 285: "We first visualized the distributions of residues in this dataset using t-SNE, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite. "

      Wouldn't embedding from single-task be more relevant to show the interest of multi-task training here? Is the difference that big when comparing embeddings from single-task training to embeddings from multi-task training? Otherwise, I think the evidence from Figure 4e is sufficient, the interest of multitasking could be well-shown by single-task vs. multi-task AUPR and a few examples or predictions that are improved.

      RE: We thank the reviewer for the comment. In the second paragraph of the “The effects of protein features and model designs” section, we have compared the performance of multi-task and single-task learning. However, the visualization results in Figure 4D are related to the third paragraph, where we conducted a downstream exploration of the possibility to extend GPSite to other unseen ligands. This is based on the hypothesis that the shared network in GPSite may have captured certain common ligand-binding mechanisms during the preceding multi-task training process. We visualized the distributions of residues in an unseen carbohydrate-binding site dataset using t-SNE, where the residues are encoded by raw feature vectors (ProtTrans and DSSP), or latent embedding vectors from the shared network trained before. Although the shared network has not been specifically trained on the carbohydrate dataset, the latent representations from GPSite effectively improve the discriminability between the binding and non-binding residues as shown in Figure 4D. This finding indicates that the shared network trained on the initial set of ten molecule types has captured common binding mechanisms and may be applied to other unseen ligands.

      We have now added more descriptions in this paragraph to avoid potential ambiguity:

      “Residues that are conserved during evolution, exposed to solvent, or inside a pocket-shaped domain are inclined to participate in ligand binding. During the preceding multi-task training process, the shared network in GPSite should have learned to capture such common binding mechanisms. Here we show how GPSite can be easily extended to the binding site prediction for other unseen ligands by adopting the pre-trained shared network as a feature extractor. We considered a carbohydrate-binding site dataset from 54 which contains 100 proteins for training and 49 for testing. We first visualized the distributions of residues in this dataset using t-SNE 55, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite trained on the ten molecule types previously.”

      • Line 291: "Employing these informative hidden embeddings as input features to train a simple MLP exhibits remarkable performance with an AUC of 0.881 (Figure 4E), higher than that of training a single-task version of GPSite from scratch (AUC of 0.853) or other state-of-the-art methods such as MTDsite and SPRINT-CBH."

      Is it necessary to introduce other methods here? The single-task vs multi-task seems enough for what you want to show?

      RE: We thank the reviewer for the comment. As discussed above, here we aim to show the potential of GPSite for the binding site prediction of an unseen ligand (i.e., carbohydrate) by adopting the pre-trained shared network as a feature extractor. Thus, we think it’s reasonable to also include the performance of other state-of-the-art methods on this carbohydrate benchmark dataset as baselines.

      • Line 321: "Specifically, a protein-level binding score can be generated for each ligand by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering that the binding interfaces of metal ions are usually smaller."

      Since binding sites are usually not localized on one single amino-acid, we can expect that most of the top k residues are localized around the same area of the protein both spatially and along the sequence. Is it something you observe and could consider in your method?

      RE: We thank the reviewer for the comment. We employed a straightforward method (top-k average) to convert GPSite’s residue-level annotations into protein-level annotations, where k was set empirically based on the distributions of the numbers of binding residues per sequence observed in the training set. We have not put much effort in optimizing this strategy since it mainly serves as a proof-of-concept experiment (Figure 5 A-C) to show the potential of GPSite in discriminating ligand-binding proteins. We have now revised this sentence to better explain how we selected k:

      “Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering the distributions of the numbers of binding residues per sequence observed in the training set.”
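
      The top-k averaging described above can be sketched in a few lines. The function name, the `TOP_K` dictionary and the example scores are hypothetical; only the values k = 5 for metal ions and k = 10 for other ligands come from the text, and the handling of proteins with fewer than k residues is an assumption.

```python
def protein_level_score(residue_scores, k):
    """Average of the k highest per-residue scores.

    If the protein has fewer than k residues, all scores are averaged
    (an assumption; the text does not cover this case).
    """
    if not residue_scores:
        raise ValueError("residue_scores must be non-empty")
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)

# k values quoted in the text
TOP_K = {"metal": 5, "other": 10}

# Hypothetical per-residue predictions for a metal-ion ligand
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.05]
metal_score = protein_level_score(scores, TOP_K["metal"])
```

      Because only the highest-scoring residues contribute, a protein with a small but confident predicted site still receives a high protein-level score.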

      As for the question raised by the reviewer, we can indeed expect that most of the top k predicted binding residues tend to cluster into several areas, but not necessarily into a single one. For instance, certain macromolecules like DNA may interact with several protein surface patches due to their elongated structures (e.g., Author response-figure 1A). Another case may be a protein binding to multiple molecules of the same ligand type (e.g., Author response-figure 1B).

      Author response image 1.

      The structures of 4XQK (A) and 4KYW (B) in PDB.

      • Line 327: The accuracy of the GPSite protein-level binding scores is further validated by the ROC curves in Figure 5B, where GPSite achieves satisfactory AUC values for all ligands except protein (AUC of 0.608).

      Here may be a good place to compare yourself with others, do other frameworks experience the same problem? If so, AUC and AUPR are not relevant here, can you expose some recall scores for example?

      RE: We thank the reviewer for the valuable recommendation. We have conducted comprehensive method comparisons in the preceding “GPSite outperforms state-of-the-art methods” section, where GPSite surpasses all existing frameworks across various ligands. Here, the genome-wide analyses of Swiss-Prot in Figure 5 serve as a downstream demonstration of GPSite’s capacity for large-scale annotations. We didn’t compare with other methods since most of them are time-consuming or memory-consuming, and thus unable to process large numbers of sequences or very long sequences. For example, it takes about 8 min for the MSA-based method GraphBind to annotate a protein with 500 residues, while it takes only about 20 s for GPSite (see Appendix 3-figure 1 for detailed runtime comparison). It is also challenging for the atom-graph-based method PeSTo to process structures larger than 100 kDa (~1000 residues) on a 32 GB GPU as the authors suggested, while GPSite can easily process structures containing up to 2500 residues on a 16 GB GPU.

      Regarding the recall score mentioned by the reviewer, GPSite achieves a recall of 0.95 (threshold = 0.5) for identifying protein-binding proteins. This indicates that GPSite can accurately identify positive samples, but it also tends to misclassify negative samples as positive. In our original manuscript, we claimed that “This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete”. To better support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note here:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). 
      However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”
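      The trade-off described in this response (high recall together with many annotated negatives scored as positive) can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not GPSite outputs; the function name `recall_and_fpr` and the `>=` thresholding convention are assumptions for illustration only.

```python
def recall_and_fpr(labels, scores, threshold=0.5):
    """Return (recall, false positive rate) at a fixed score threshold.

    labels: 1 = annotated as protein-binding, 0 = not annotated.
    scores: predicted protein-level scores in [0, 1].
    """
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold  # assumed convention
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, fpr


# Toy data: every annotated positive is recovered, but three of the four
# unannotated proteins also score above the threshold.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.9, 0.6, 0.55, 0.2]
toy_recall, toy_fpr = recall_and_fpr(toy_labels, toy_scores)
```

      If some of the "negatives" are in fact unannotated binders, as argued in this response, the apparent false positive rate overstates the true error.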

      • Line 381: 'Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. Given that the ESM Metagenomic Atlas 34 provides 772 million predicted protein structures along with pre-computed language model embeddings, self-supervised learning can be employed to train a GPSite model for predicting masked sequence and structure attributes, or maximizing the similarity between the learned representations of substructures from identical proteins while minimizing the similarity between those from different proteins using a contrastive loss function training from scratch. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization (EM) framework 58 can be adopted to handle the hierarchical graph structure inherent in proteins, which contains the top view of the residue graph and the bottom view of the atom graph inside a residue. Such an EM procedure enables training two separate graph neural networks for the two views while simultaneously allowing interaction and mutual enhancement between the two modules. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.'

      I think this does not belong here. It feels like half of your discussion is not talking about the achievements of this paper but future very specific directions. Focus on the take-home arguments (performances of the model, ability to predict a large range of tasks, interest in key components of your model, easy use) of the paper and possible future direction but without being so specific.

      RE: We thank the reviewer for the valuable suggestion. We have now simplified the discussions on the future directions notably:

      “Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. GPSite may be improved by pre-training on the abundant predicted structures in ESM Metagenomic Atlas, and then fine-tuning on binding site datasets. Besides, the hidden embeddings from ESMFold may also serve as informative protein representations. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization framework can be adopted to handle the hierarchical atom-to-residue graph structure inherent in proteins. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.”

      • Overall there is also a lack of displayed structure. You should try to select a few examples of binding sites that were identified correctly by your method and not by others, if possible get some insights on why. Also, some negative examples could be interesting so as to have a better idea of the interest.

      RE: We thank the reviewer for the valuable recommendation. We have performed a case study for the structure of the glucocorticoid receptor in Figure 3 D-H to illustrate a potential reason for the robustness of GPSite. Moreover, we have now added a case study in Appendix 1-note 3 and Appendix 3-figure 5 to explain why GPSite sometimes is not as accurate as the state-of-the-art structure-based method. For convenience, we also attach the note and figure here:

      “Here we present an example of an RNA-binding protein, i.e., the ribosome biogenesis protein ERB1 (PDB: 7R6Q, chain m), to illustrate the impact of predicted structure’s quality. As shown in Appendix 3-figure 5, ERB1 is an integral component of a large multimer structure comprising protein and RNA chains (i.e., the state E2 nucleolar 60S ribosome biogenesis intermediate). Likely due to the neglect of interactions from other protein chains, ESMFold fails to predict the correct conformation of the ERB1 chain (TM-score = 0.24). Using this incorrect predicted structure, GPSite achieves an AUPR of 0.580, lower than GraphBind input with the native structure (AUPR = 0.636). However, the performance of GraphBind substantially declines to an AUPR of 0.468 when employing the predicted structure as input. Moreover, if GPSite adopts the native structure for prediction, a notable performance boost can be obtained (AUPR = 0.681).”

      Author response image 2.

      The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1. (A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score = 0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D-G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

      Minor comments:

      • Line 169: "Note that since our test sets may partly overlap with the training sets of these methods, the results reported here should be the upper limits for the existing methods."

      Yes, but they were potentially not trained on the most recent structures in that case. These methods could also see improved performance with an updated training set.

      RE: We thank the reviewer for the comment. We have now deleted this sentence.

      • Line176: "Since 358 of the 375 proteins in our protein-binding site test set share > 30% identity with the training sequences of PeSTo, we re-split our protein-binding dataset to generate a test set of 65 proteins sharing < 30% identity with the training set of PeSTo for a fair evaluation."

      Too specific to be here in my opinion.

      RE: We thank the reviewer for the comment. We have now moved these details to Appendix 1-note 2. The description in the main text here is now more concise:

      “Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we conducted separate training and comparison using the datasets of PeSTo, where GPSite still demonstrates a remarkable improvement over PeSTo (Appendix 1-note 2).”

      • Figure 2. The authors should try to either increase Fig A's size or increase the font size. This could probably be done by compressing the size of Figure C into a single figure.

      RE: We thank the reviewer for the suggestion. We have now increased the font size in Figure 2A. In addition, the figures in the final version of the manuscript should be clearer, since we will be able to upload SVG files.

      • Have you tried using embeddings from more structure-aware pLM such as ESM Fold embeddings (fine-tuned) or ProstTrans (that may be more recent than this study)?

      RE: We thank the reviewer for the insightful comment. We have not yet explored the embeddings from structure-aware pLM, but we acknowledge its potential as a promising avenue for future investigation. We have now added this point in our Discussion section:

      “Besides, the hidden embeddings from ESMFold may also serve as informative protein representations.”

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which limit performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      • The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      • Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      • GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      • The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      RE: We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      • One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      RE: We thank the reviewer for the valuable suggestion. Empirically, it takes about 5-20 min for existing MSA-based methods to make predictions for a protein with 500 residues, while it only takes about 1 min for GPSite (including structure prediction). However, it is worth noting that some predictors in our benchmark study are solely available as webservers, and it is challenging to compare the runtime between a standalone program and a webserver due to the disparity in hardware configurations. Therefore, we have now included comprehensive runtime comparisons between the GPSite webserver and other top-performing servers in Appendix 3-figure 1 to illustrate the practicality and efficiency of our method. For convenience, we also attach the figure here as Author response image 3. The corresponding description is now added in the “GPSite outperforms state-of-the-art methods” section:

      “Moreover, GPSite is computationally efficient, achieving comparable or faster prediction speed compared to other top-performing methods (Appendix 3-figure 1).”

      Author response image 3.

      Runtime comparison of the GPSite webserver with other top-performing servers. Five protein chains (i.e., 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

      • Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      RE: We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 3 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets (e.g., 42 for DNA and 17 for ATP). Consequently, there is insufficient data available for analysis with lower thresholds, except for the RNA test set. Notably, Figure 3C presents a detailed inspection of the 104 proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind with predicted structures as input, regardless of the prediction quality of ESMFold. Only in cases where structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind input with native structures. This result further demonstrates the robustness of GPSite. We have now added clearer explanations in the “GPSite is robust for low-quality predicted structures” section:

      “Figure 3B and Appendix 3-figure 3 show the distributions of TM-scores between native and predicted structures calculated by US-align in the ten benchmark datasets, where most proteins are accurately predicted with TM-score > 0.7 (see also Appendix 2-table 5)”; “Given the infrequency of low-quality predicted structures except for the RNA test set, we took a closer inspection of the 104 proteins with predicted structures of TM-score < 0.5 in the RNA test set.”
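      The stratification described above can be sketched as follows. This is a minimal illustration only: the `(tm_score, aupr)` records are invented toy values, and summarizing with a per-protein AUPR mean is an assumption made here for simplicity (the manuscript may compute metrics at the dataset level).

```python
from statistics import mean


def stratify_by_tm_score(records, threshold=0.7):
    """Split (tm_score, metric) records into high- and low-quality groups.

    Proteins with TM-score > threshold count as high quality, the rest
    (TM-score <= threshold) as low quality, mirroring the split in the text.
    """
    high = [metric for tm, metric in records if tm > threshold]
    low = [metric for tm, metric in records if tm <= threshold]

    def summarize(values):
        return {"n": len(values), "mean_metric": mean(values) if values else None}

    return {"high": summarize(high), "low": summarize(low)}


# Hypothetical per-protein values (TM-score, AUPR); not taken from the paper.
toy_records = [(0.91, 0.80), (0.85, 0.70), (0.72, 0.60), (0.45, 0.50), (0.24, 0.40)]
toy_split = stratify_by_tm_score(toy_records)
```

      Reporting the group size `n` next to each mean makes it visible when a bin holds too few proteins to support a separate threshold, which is the reason the authors give for not using 0.3 or 0.5 on most datasets.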

      • To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      RE: We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the local structures of the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions. We have now revised this paragraph to explain these more clearly:

      “Figure 3D shows the structure of the human glucocorticoid receptor (GR), a transcription factor that binds DNA and assembles a coactivator peptide to regulate gene transcription (PDB: 7PRW, chain A). The DNA-binding domain of GR also consists of two C4-type zinc fingers to bind Zn2+ ions. Although the structure of this protein is not perfectly predicted (TM-score = 0.72), the local structures of the binding domains of peptide and DNA are actually predicted accurately as viewed by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can correctly predict all Zn2+ binding sites and precisely identify the binding sites of DNA and peptide with AUPR values of 0.949 and 0.924, respectively (Figure 3F, G and H).”

      • To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      RE: We thank the reviewer for the valuable recommendation. To support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note below:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). 
      However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”

      • The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from further investigation of this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      RE: We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predicted binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge. We have now revised the corresponding sentence to explain these more clearly:

      “Exploiting the residue-level binding site annotations, we could readily extend GPSite to discriminate between binding and non-binding proteins of various ligands. Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues.”
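      The protein-level score described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name `protein_level_score` and the default `k=10` are assumptions (the response does not state the value of k), and the handling of proteins shorter than k residues is a choice made here.

```python
def protein_level_score(residue_scores, k=10):
    """Average the top-k residue-level binding scores of one protein.

    residue_scores: per-residue predicted binding scores for one ligand.
    k: number of highest-scoring residues to average (assumed value).
    If the protein has fewer than k residues, all residues are averaged.
    """
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)


# Toy example: only the two highest-scoring residues contribute when k=2.
toy_score = protein_level_score([0.9, 0.1, 0.8, 0.2], k=2)
```

      Averaging only the top k residues keeps the score from being diluted by the many non-binding residues in a long protein, which a plain mean over all residues would not.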

      As for the cross-interaction-type performance raised by the reviewer, we have now conducted cross-type evaluations to investigate the specificity of the ligand-specific MLPs and the inherent similarities among different ligands in Appendix 1-note 6 and Appendix 2-table 10. For convenience, we also attach the note and table here:

      “We conducted cross-type evaluations by applying different ligand-specific MLPs in GPSite for the test sets of different ligands. As shown in Appendix 2-table 10, for each ligand-binding site test set, the corresponding ligand-specific network consistently achieves the best performance. This indicates that the ligand-specific MLPs have specifically learned the binding patterns of particular molecules. We also noticed that the cross-type performance is reasonable for the ligands sharing similar properties. For instance, the DNA-specific MLP exhibits a reasonable AUPR when predicting RNA-binding sites, and vice versa. Similar trends are also observed between peptide and protein, as well as among metal ions as expected. Interestingly, the cross-type performance between ATP and HEM is also acceptable, potentially attributed to their comparable molecular weights (507.2 and 616.5, respectively).”

      Author response table 4.

      Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands

      Note: “Pep” and “Pro” denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.
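      The cross-type evaluation described in the note can be sketched as a matrix of scores, with one row per test set and one column per ligand-specific head. In this minimal sketch, average precision is used as a common estimator of AUPR, the "heads" are stand-in functions rather than trained MLPs, and all features and labels are toy values. None of the names below come from the GPSite code base.

```python
def average_precision(labels, scores):
    """Average precision, a standard estimator of the area under the PR curve."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos


def cross_type_matrix(heads, test_sets):
    """Apply every ligand-specific head to every ligand's test set."""
    return {
        test_name: {
            head_name: average_precision(labels, [head(x) for x in features])
            for head_name, head in heads.items()
        }
        for test_name, (features, labels) in test_sets.items()
    }


# Hypothetical stand-ins: each "head" reads one entry of a toy feature pair.
toy_heads = {"DNA": lambda x: x[0], "ATP": lambda x: x[1]}
toy_features = [(0.9, 0.1), (0.8, 0.7), (0.2, 0.9), (0.1, 0.3)]
toy_test_sets = {
    "DNA": (toy_features, [1, 1, 0, 0]),
    "ATP": (toy_features, [0, 1, 1, 0]),
}
toy_matrix = cross_type_matrix(toy_heads, toy_test_sets)
```

      Reading along a row shows which head performs best on a given test set; in the note above, the matching ligand-specific head is reported to be best in every row.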

    1. Author Response

      The following is the authors’ response to the current reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Hats off to the authors for taking the time to decipher the seemingly subtle but important differences between the Gnai2/3 double mutant and Ptx mutant phenotypes. These results further illustrate the dynamic requirement of GNAI/O in hair bundle establishment. I have some minor suggestions for the authors to consider and it is up to the authors to decide whether to incorporate them:

      We decided to make the current (revised) version the version of record, and we explain why below. Please include these comments in the review+rebuttal material.

      (1) The abstract could be modified to reflect the revised interpretations of the results.

      Response: the abstract is high-level and the changes in interpretation in the revised manuscript do not modify the message there. Briefly, the abstract only states that Gnai2; Gnai3 double mutants recapitulate two defects previously only observed with pertussis toxin. There is no claim about the timing or dose of GNAI proteins involved.

      (2) The three rows of OHCs are like a different beast from each other. Mireille Montcouquiol's lab has demonstrated that there is a differential requirement for Gnai3 in hair bundle orientation among the three rows of OHCs. The results described in this manuscript support this notion as well.

      To clarify, Gnai3 inactivation does not affect OHC orientation. Only pertussis toxin, and in this work Gnai2; Gnai3 double mutants, do. The Montcouquiol lab showed different degrees of OHC1, OHC2 and OHC3 misorientation upon use of pertussis toxin in vitro using cochlear explants (Ezan et al 2013). We showed the same thing in vivo using transgenic models (Tarchini et al 2013; Tarchini et al 2016). The different OHC responses by row and corresponding citations are mentioned in several locations in the manuscript, including first on line 112 in the Introduction and in Fig. 1C in a graphical summary.

      (3) I wonder if "compensate" or "redundancy" may be a better term to use than "rescue" in the Discussion and figure.

      “Rescue” is used in the Discussion on lines 603 and 604. We think that “rescue” is appropriate to refer to the ability of GNAI2 to compensate for the loss of GNAI1 and GNAI3 in a mutant context. We would argue that these different wordings are largely interchangeable and do not change the message.


      Author Response

      The following is the authors’ response to the original reviews.

      We really appreciate the time the reviewers spent reading and commenting on the original manuscript. Although they were positive already, we decided to spend some time to address the main comments with new experiments as thoroughly as possible in a new manuscript version. We also heavily edited some sections accordingly: 1) We delayed pertussis toxin activation in hair cells with Atoh1-Cre to show that the resulting misorientation phenotype is delayed compared to FoxG1-Cre results, as also seen in Gnai2; Gnai3 double mutants. It follows that Gnai2; Gnai3 and pertussis mutants do share a similar misorientation profile, and that GNAI proteins are required to normally reverse OHC1-2 (from medial to lateral), but also to maintain the lateral orientation, at least transiently. 2) We experimentally verified that one of our GNAI antibodies can indeed detect GNAI1, and consequently that absence of signal in Gnai2; Gnai3 double mutants is evidence that GNAI1 is not involved in apical hair cell polarization. We believe these changes strengthen the manuscript and its conclusions.

      Reviewer #1 (Public Review):

      A subclass of inhibitory heterotrimeric guanine nucleotide-binding protein subunits, GNAI, has been implicated in sensory hair cell formation, namely the establishment of hair bundle (stereocilia) orientation and staircase formation. However, the former role of hair bundle orientation has only been demonstrated in mutants expressing pertussis toxin, which blocks all GNAI subunits, but not in mutants with a single knockout of any of the Gnai genes, suggesting that there is a redundancy among various GNAI proteins in this role. Using various conditional mutants, the authors concluded that GNAI3 is the primary GNAI protein required for hair bundle morphogenesis, whereas hair bundle orientation requires both GNAI2 and GNAI3.

      Strength

      Various compound mutants were generated to decipher the contribution of individual GNAI1, GNAI2, GNAI3 and GNAIO in the establishment of hair bundle orientation and morphogenesis. The study is thorough with detailed quantification of hair bundle orientation and morphogenesis, as well as auditory functions.

      Weakness

      While the hair bundle orientation phenotype in the Foxg1-cre; Gnai2-/-; Gnai3 lox/lox (double mutants) appears more severe than those observed in Ptx cKO mutants, it may be an oversimplification to attribute the differences to more GNAI function in the Ptx cKO mutants. The phenotypes between the double mutants and Ptx cKO mutants appear qualitatively different. For example, assuming the milder phenotypes in the Ptx cKO is due to incomplete loss of GNAI function, one would expect the Ptx phenotype would be reproducible by some combination of compound mutants among various Gnai genes. Such information was not provided. Furthermore, of all the double mutant specimens analyzed for hair bundle orientation (Fig. 8), the hair bundle/kinocilium position started out normally in the lateral quadrant at E17.5 but failed to be maintained by P0. This does not appear to be the case for Ptx cKO, in which all affected hair cells showed inverted orientation by E17.5. It is not clear whether this is the end-stage of bundle orientation in Ptx cKO, and the kinocilium position started out normal, similar to the double mutants before the age of analysis at E17.5. Understanding these differences may reveal specific requirements of individual GNAI subunits or whether other factors are being affected in the Ptx mutants.

      This criticism was very useful and prompted new experiments as well as a change in data presentation and a fundamental rewrite regarding hair cell orientation. These changes are detailed below. Of note, however, please let us clarify that the original manuscript did show that the ptxA orientation phenotype is reproduced to some extent in Gnai2; Gnai3 double mutants (previously Fig. 8 and corresponding text line 505). We showed that OHC1-2 are also inverted in the double mutant, although at a later differentiation stage. We recognize that similarities in hair cell misorientation between ptxA and Gnai2; Gnai3 DKO were not explained and discussed well enough. This part of the manuscript has been re-worked extensively, and we hope that along with new results, comparisons between mutant models are easier to follow and understand. We notably fully adopted the idea that there are qualitative differences between ptxA and Gnai2; Gnai3 mutants, and not only a difference in the remaining “dose” of GNAI activity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comments related to clarification of the weakness:

      (1) In general, hair bundle orientation in the double mutants is established in the lateral quadrant of the cochlea before being inverted (Fig. 8). These results are intriguing because the lateral orientation is the correct position for these hair bundles normally and Gnai proteins are thought to be required to get the kinocilium to the lateral position. This process appears to proceed normally in the double mutants but the kinocilium reverted to the medial default position over time, which suggests that Gnai2 and Gnai3 are only required for the maintenance and not the establishment of the kinocilium in the lateral position. Is this phenotype qualitatively similar in the Ptx cKO?

      We addressed these issues with two types of modifications to the data:

      (1) We modified the eccentricity threshold used at E17.5 in Fig. 8 (orientation) to be more stringent, using 0.4 (instead of 0.25 previously) in both controls and mutants. This means that we now only graph the orientation of cells where eccentricity is more marked. The rationale is that at early stages, it is challenging to distinguish immature vs defective near-symmetrical cells. We kept a threshold of 0.25 at P0 when the hair cell apical surface is larger and better differentiated (Fig. 8C-D). Importantly, the dataset remains rigorously identical. This change usefully highlights that a large proportion of OHC1 is in fact inverted (oriented medially) at E17.5 in Gnai2; Gnai3 double mutants at the cochlear mid, as also seen in the ptxA model at the same stage and position (see new Fig. 8A). At the E17.5 base (Fig. 8B), a slightly more mature position, the outcome is unchanged (the majority of OHC1 are inverted using either a 0.25 or 0.4 threshold in double mutants and in ptxA).

      Interestingly, however, the orientation trend is unchanged for OHC2: OHC2 remain oriented largely laterally (i.e. normally) at the E17.5 mid and base in Gnai2; Gnai3 double mutants even with a raised eccentricity threshold, whereas by contrast OHC2 in ptxA are inverted at these stages and positions. In the double mutant, OHC2 only become inverted at the P0 base (Fig. 8D). This suggests that there are similarities (OHC1) but also differences (OHC2) between the two mouse models, and that double mutants show a delay in adopting an inverted orientation compared to ptxA. Of note, OHC2 have been shown to differentiate later than OHC1 (for example, Anniko 1983 PMID:6869851).

      (2) To directly test the idea that the misorientation phenotype (inverted OHC1-2) is comparable between the two models but delayed in Gnai2; Gnai3 mutants, we performed a new experiment and added new results in the manuscript. We delayed ptxA action by using Atoh1-Cre (postmitotic hair cells) instead of FoxG1-Cre (otic progenitors). Remarkably, this produced a pattern of OHC1-2 misorientation more similar to Gnai2; Gnai3 mutants: at the E17.5 base and P0 apex, OHC2 were still largely oriented laterally (normally) in Atoh1-Cre; ptxA as in Gnai2; Gnai3 mutants whereas at the P0 base a large proportion of OHC2 were inverted (Fig. 8 Supp 1B). OHC1 were inverted at all stages and positions in the Atoh1-Cre as in the FoxG1-Cre; ptxA model. For Atoh1-Cre; ptxA, we only illustrated OHC1 and OHC2 and did not add E17.5 mid or P0 mid results because other cell types and stage/positions did not provide additional insight. In addition, we are well aware that the full FoxG1-Cre; ptxA and Gnai2; Gnai3 results for 4 cell types (IHC, OHC1-3) and 5 stages/positions are already a lot of data for cell orientation.

      These results suggest that:

      (a) The normal reversal of OHC1-2 to adopt a lateral orientation needs to be maintained, at least transiently, and that maintenance also relies on GNAI/O (Results starting line 529, Discussion line 621).

      (b) ptxA is more severe than Gnai2; Gnai3 when it comes to OHC1-2 orientation (Figure 9, role b). Conversely, Gnai2; Gnai3 is obviously more severe when it comes to symmetry-breaking (Fig. 9, role a) and hair bundle morphogenesis (Fig. 9, c). It follows that the two early GNAI/O activities are qualitatively different and not just based on dose. This is essentially what this Reviewer correctly pointed out, and we have fully edited both Results and Discussion accordingly. We now speculate that the difference may lie in the identity of the necessary GNAI/O protein for each role. Any GNAI/O proteins acting as a switch downstream of the GPR156 receptor may relay orientation information (Fig. 9, role b), making ptxA a particularly effective disruption strategy since it downregulates all GNAI/O proteins. In contrast, symmetry-breaking may rely more specifically on GNAI2 and GNAI3, and ptxA is not expected to achieve a loss-of-function of GNAI2 and GNAI3 as extensive as a double targeted genetic inactivation of the corresponding genes. Please see new Results starting line 526 and Discussion starting line 603. We consequently abandoned the notion that increased doses of GNAI/O are required for each role, and we also clarify that symmetry-breaking (a) and orientation (b) occur at the same time (Fig. 9).

      (2) P0 may not be late enough a stage to assess phenotype maturity in the double mutants. For example, it is not clear from the basal P0 results whether the IHC will acquire an inverted phenotype or just misorientation in the lateral side.

      For context, the OHC1-2 misorientation pattern in the ptxA model at P0 does represent the end stage, as the same pattern is observed in adults (illustrated in Fig. 2A). In addition, OHC1-2 that express ptxA are inverted as soon as they break planar symmetry, and this was established at E16.5 in a previous publication where ptxA and Gpr156 misorientation patterns were compared and shown to be identical (Kindt et al., 2021 Supp. fig. 5C-D). However, we clearly failed to mention these important results in the original manuscript. We now cite Figure 2 for adult defects (line 522), and provide a citation for OHC1-2 inversion being observed from the earliest stage of hair cell differentiation (Kindt et al., 2021) (line 519).

      The vast majority of Gnai2; Gnai3 double mutants die before weaning but the single specimen we managed to collect at P21 also showed inverted OHC1-2 (representative example in Fig. 2A). Again, we previously failed to point out this important result. We now do so at lines 214 and 555. This is further evidence that OHC1-2 misorientation is in fact similar in the ptxA and Gnai2; Gnai3 models (but milder and delayed in the latter).

      When it comes to IHCs and OHC3s however, the situation is less clear. These cell types are mildly misoriented in ptxA and Gpr156 mutants, but IHCs in particular appear severely misoriented in Gnai2; Gnai3 mutants based on the position of the basal body (Fig. 8). However, very dysmorphic hair bundles can pull on the basal body via the kinocilium and affect its position, which obscures hair cell orientation inferred from the basal body and subsequent interpretations. We do not dwell on IHC and OHC3 and their orientation in Gnai2; Gnai3 mutants in the revision since we do not observe similar orientation defects in a different mouse model and lack sufficient adult data.

      Suggestions to improve upon the manuscript for readers:

      (1) Line 294, indicate on the figure the staining in bare zone and tips of stereocilia on row 1.

      Pertains to Figure 4. In A, we now point out the bare zone and stereocilia tips with arrow and arrowheads, respectively (as in other figures).

      (2) Fig. 8 schematic diagram, the labels of the line and 90° side by side is misleading.

      We added black ticks for 0, 90, 180, 270 degree references. In contrast, the hair cell angle represented was switched to magenta.

      (3) Fig. 7 legend, redundancy towards the end of the paragraph.

      Thank you for catching this issue. A large portion of the legend was indeed accidentally repeated and is now deleted.

      (4) Line 490-493, Another plausible explanation is that other factors besides Gnai2 and Gnai3 are involved in breaking symmetry during bundle establishment.

      We now acknowledge that other proteins besides GNAI/O may be involved (Discussion line 614). That said, the notion that we do not achieve sufficient and/or early enough GNAI loss is supported for example by the Beer-Hammer 2018 study where no defects in symmetry-breaking or orientation were reported in their Gnai2 flox/flox; Gnai3 flox/flox model (Discussion new Line 637).

      (5) Line 518, the base were largely inverted (Figure 8B). Should Fig 8A be cited instead of 8B?

      Fig. 8B has graphs for the E17.5 cochlear base where OHC1-2 are inverted in both ptxA and Gnai2;3 DKO models. Fig. 8A has graphs of the E17.5 cochlear mid (less differentiated hair cells) where an inversion was not obvious previously, but is now clear although only partial in Gnai2; Gnai3 DKO (see above; raised eccentricity threshold). In the context of the previous text, this citation was thus correct. However, this section has been heavily modified to better compare Gnai2; Gnai3 DKO and ptxA and is hopefully less confusing in the revised version.

      Reviewer #2 (Public Review):

      Jarysta and colleagues set out to define how similar GNAI/O family members contribute to the shape and orientation of stereocilia bundles on auditory hair cells. Previous work demonstrated that loss of particular GNAI proteins, or inhibition of GNAIs by pertussis toxin, caused several defects in hair bundle morphogenesis, but open questions remained which the authors sought to address. Some of these questions include whether all phenotypes resulting from expression of pertussis toxin stemmed from GNAI inhibition; which GNAI family members are most critical for directing bundle development; whether GNAI proteins are needed for basal body movements that contribute to bundle patterning. These questions are important for understanding how tissue is patterned in response to planar cell polarity cues.

      To address questions related to the GNAI family in auditory hair cell development, the authors assembled an impressive and nearly comprehensive collection of mouse models. This approach allowed for each Gnai and Gnao gene to be knocked out individually or in combination with each other. Notably, a new floxed allele was generated for Gnai3 because loss of this gene in combination with Gnai2 deletion was known to be embryonic lethal. Besides these lines, a new knockin mouse was made to conditionally express untagged pertussis toxin following cre induction from a strong promoter. The breadth and complexity involved in generating and collecting these strains makes this study unique, and likely the authoritative last word on which GNAI proteins are needed for which aspect of auditory hair bundle development.

      Appropriate methods were employed by the authors to characterize auditory hair bundle morphology in each mouse line. Conclusions were carefully drawn from the data and largely based on excellent quantitative analysis. The main conclusions are that GNAI3 has the largest effect on hair bundle development. GNAI2 can compensate for GNAI3 loss in early development but incompletely in late development. The Gnai2 Gnai3 double mutant recapitulates nearly all the phenotypic effects associated with pertussis toxin expression and also reveals a role for GNAIs in early movement of the basal body. Although these results are not entirely unexpected based on earlier reports, the current results both uncover new functions and put putative functions on more solid ground.

      Based on this study, loss of GNAI1 and GNAO show a slight shortening of the tallest row of stereocilia but no other significant changes to bundle shape. Antibody staining shows no change in GNAI localization in the Gnai1 knockout, suggesting that little to no protein is found in hair cells. One caveat to this interpretation is that the antibody, while proposed to cross-react with GNAI1, is not clearly shown to immunolabel GNAI1. More than anything, this reservation mostly serves to illustrate how challenging it is to nail down every last detail. In turn, the comprehensive nature of the current study seems all the more impressive.

      (1) The original manuscript quantified stereocilia properties in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants using non-parametric t-tests (Mann-Whitney) for comparisons. This approach indeed suggested subtle reduction in row 1 height in IHCs in all 3 mutants. We did not quantify stereocilia features in Gnao1 mutants but could not observe defects (new Fig. 2 Supp. 1E-F). In fact, we could not observe defects in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants either. For this reason we have been ambivalent about reporting defects for Gnai1 and Gnai2 single and Gnai1; Gnai2 double mutants.

      In the revision, we applied a nested (hierarchical) t-test to avoid pseudo-replication (Eisner 2021; PMID: 33464305; https://pubmed.ncbi.nlm.nih.gov/33464305/). In our data, the nested t-tests structure measurements by animal instead of having all stereocilia or other cell measurements treated as independent values. This more stringent approach no longer finds row 1 height reduction significant in single Gnai1 or Gnai2 mutants, or in Gnai1; Gnai2 double mutants. We modified the text accordingly in Results and Discussion. Nested t-tests were applied uniformly across the manuscript and, besides IHC measurements in Fig. 2, now also apply to bare zone surface area in Fig. 6 and eccentricity in Fig. 7. For these experiments in contrast, previous conclusions are not changed. We think that this more careful statistical treatment is a closer representation of the data in terms of the conclusions we can safely make.
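
      To make the pseudo-replication issue concrete, the minimal Python sketch below contrasts pooling every measurement with summarizing by animal first. It is a simplified stand-in and not the nested t-test itself (which fits a hierarchical model): it only shows how treating each animal as the unit of analysis shrinks the effective sample size. All values, names, and the choice of Welch's t statistic are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical row 1 stereocilia heights (micrometers), grouped by animal.
control = {"c1": [4.0, 4.2, 4.1], "c2": [3.8, 3.9, 4.0], "c3": [4.3, 4.1, 4.2]}
mutant = {"m1": [3.9, 4.0, 3.8], "m2": [4.1, 4.0, 4.2], "m3": [3.7, 3.9, 3.8]}


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pooled(groups):
    """Flatten all measurements, ignoring which animal they came from."""
    return [x for values in groups.values() for x in values]


def animal_means(groups):
    """One summary value per animal, so that n equals the number of animals."""
    return [mean(values) for values in groups.values()]


# Pseudo-replicated comparison: every stereocilium counted as independent.
t_pooled, df_pooled = welch_t(pooled(control), pooled(mutant))
# Animal-level comparison: each animal contributes a single value.
t_animal, df_animal = welch_t(animal_means(control), animal_means(mutant))

print(f"pooled: t={t_pooled:.2f}, df={df_pooled:.1f}")
print(f"per animal: t={t_animal:.2f}, df={df_animal:.1f}")
```

      With far fewer degrees of freedom at the animal level, the same difference in means has to clear a higher bar to be called significant, which is in keeping with the more conservative outcome described above.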

      (2) The reviewer's criticism about antibody specificity is accurate and fair, and is fully addressed in the revised manuscript. We provide a phylogeny cartoon as Figure 1A to compare the GNAI/O proteins and highlight how closely related they are in sequence. To validate the assumption that our approach would detect GNAI1 if it were present in hair cells, we took a new dual experimental approach in the revision. First, we electroporated Gnai1, Gnai2 and Gnai3 expression constructs in the E13.5 inner ear and tested whether the two GNAI antibodies used in the study can detect ectopic GNAI1 in Kölliker's organ. This revealed that “ptGNAI2” detects GNAI1 very well (in addition to GNAI2), but that “scbtGNAI3” does not detect GNAI1 efficiently (although it does detect GNAI3 very well). Second, to verify in vivo that “ptGNAI2” can detect endogenous GNAI1, we immunolabeled the gallbladder epithelium in Gnai1 mutants and littermate controls using the “ptGNAI2” antibody. Based on IMPC consortium data* about the Gnai1 LacZ mouse strain, Gnai1 is specifically expressed in the adult gallbladder. We could verify that signals detected in the Gnai1 mutants were visually reduced in comparison to littermate controls. We have now added this validation step in Results (line 309) and the data in Fig. 4 Supp. 1A-B.

      *https://www.mousephenotype.org/data/genes/MGI:95771

      Reviewer #2 (Recommendations For The Authors):

      Minor comments that may marginally improve clarity.

      Abstract line 24: delete "nor polarized" because polarization cannot be assessed since the protein is undetectable.

      This is a fair point, now deleted.

      Consider revising: Lines 80-82; 188-202 (the order in which the mutants were presented was hard to follow for me); 239-240.

      Lines 80-82: Used to read as "Ptx recapitulates severe stereocilia stunting and immature-looking hair bundles observed when GPSM2 or both GNAI2 and GNAI3 are inactivated."

      Line 88: Was now changed to "Ptx provokes immature-looking hair bundles with severely stunted stereocilia, mimicking defects in Gpsm2 mutants and Gnai2; Gnai3 double mutants".

      Lines 188-202: This was the first paragraph describing adult stereocilia defects in the different Gnai/o mouse strains. We completely rewrote the entire section to reflect the order in which the strains appear in Figure 2, hopefully making the text easier to follow because it better matches panels in Fig. 2. We also made several other modifications to streamline comparisons and better introduce the orientation defects that are later detailed at neonate stages.

      Lines 239-240: Used to read "GNAI2 makes a clear contribution since stereocilia defects increase in severity when GNAI loss extends from GNAI3 to both GNAI2 and GNAI3".

      Line 247: Was now changed to "GNAI2 makes a clear contribution since Gnai3neo stereocilia defects dramatically increase in severity when GNAI2 is absent as well in Gnai2; Gnai3 double mutants."

      Line 164: hardwired is unclear. Conserved?

      We modified this sentence as follows: Line 171: "We reasoned that apical HC development is probably highly constrained and less likely to be influenced by genetic heterogeneity compared to susceptibility to disease, for example."

      Line 299: It is not clear why GNAI1 is a better target than GNAI3. This phrase is repeated in line 303, I suspect inadvertently. Is there evidence that this antibody detects GNAI1, perhaps in another tissue? Line 308: GNAI1 may also not be detected by this antibody.

      Please see point 2 above. We removed these hypothetical statements entirely and we instead now experimentally show that one of the two commercial antibodies used can readily detect GNAI1 (yet does not detect signal in hair cells when GNAI2 and GNAI3 are absent in Fig. 4F).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      (1) Substantial revision of the claims and interpretation of the results is needed, especially in the setting of additional data showing enhanced erythrophagocytosis with decreased RBC lifespan.

      Thank you for your valuable feedback and suggestion for a substantial revision of the claims and interpretation of our results. We acknowledge the importance of considering additional data that shows enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have revised our manuscript and incorporated additional experimental data to support and clarify our findings.

      (1) In our original manuscript, we reported a decrease in the number of splenic red pulp macrophages (RPMs) and phagocytic erythrocytes after hypobaric hypoxia (HH) exposure. This conclusion was primarily based on our observations of reduced phagocytosis in the spleen.

      (2) Additional experimental data on RBC labeling and erythrophagocytosis:

      • Experiment 1 (RBC labeling and HH exposure)

      We conducted an experiment where RBCs from mice were labeled with PKH67 and injected back into the mice. These mice were then exposed to normobaric normoxia (NN) or HH for 7 or 14 days. The subsequent assessment of RPMs in the spleen using flow cytometry and immunofluorescence detection revealed a significant decrease in both the population of splenic RPMs (F4/80hiCD11blo, new Figure 5A and C) and PKH67-positive macrophages after HH exposure (as depicted in new Figure 5A and C-E). This finding supports our original claim of reduced phagocytosis under HH conditions.

      Author response image 1.

      • Experiment 2 (erythrophagocytosis enhancement)

      To examine the effects of enhanced erythrophagocytosis, we injected Tuftsin after administering PKH67-labelled RBCs. Our observations showed a significant decrease in PKH67 fluorescence in the spleen, particularly after Tuftsin injection compared to the NN group. This result suggests a reduction in RBC lifespan when erythrophagocytosis is enhanced (illustrated in new Figure 7, A-B).

      Author response image 2.

      (3) Revised conclusions:

      • The additional data from these experiments support our original findings by providing a more comprehensive view of the impact of HH exposure on splenic erythrophagocytosis.

      • The decrease in phagocytic RPMs and phagocytic erythrocytes after HH exposure, along with the observed decrease in RBC lifespan following enhanced erythrophagocytosis, collectively suggest a more complex interplay between hypoxia, erythrophagocytosis, and RBC lifespan than initially interpreted.

      We think that these revisions and additional experimental data provide a more robust and detailed understanding of the effects of HH on splenic erythrophagocytosis and RBC lifespan. We hope that these changes adequately address the concerns raised and strengthen the conclusions drawn in our manuscript.

      (2) F4/80 high; CD11b low are true RPMs which the cells which the authors are presenting, i.e. splenic monocytes / pre-RPMs. To discuss RPM function requires the presentation of these cells specifically rather than general cells in the proper area of the spleen.

      Thank you for your feedback requesting a substantial revision of our claims and interpretation, particularly considering additional data showing enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have thoroughly revised our manuscript and included new experimental data that further elucidate the effects of HH on RPMs and erythrophagocytosis.

      (1) Re-evaluation of the RPM population after HH exposure:

      • Flow cytometry analysis (new Figure 3G, Figure 5A and B): We revisited the analysis of RPMs (F4/80hiCD11blo) in the spleen after 7 and 14 days of HH exposure. Our revised flow cytometry data consistently showed a significant decrease in the RPM population post-HH exposure, reinforcing our initial findings.

      Author response image 3.

      Author response image 4.

      • In situ expression of RPMs (Figure S1, A-D):

      We further confirmed the decreased population of RPMs through in situ co-staining with F4/80 and CD11b, and F4/80 and CD68, in spleen tissues. These results clearly demonstrated a significant reduction in F4/80hiCD11blo (Figure S1, A and B) and F4/80hiCD68hi (Figure S1, C and D) cells following HH exposure.

      Author response image 5.

      (2) Single-cell sequencing analysis of splenic RPMs:

      • We conducted a single-cell sequencing analysis of spleen samples post 7 days of HH exposure (Figure S2, A-C). This analysis revealed a notable shift in the distribution of RPMs, predominantly associated with Cluster 0 under NN conditions, to a reduced presence in this cluster after HH exposure.

      • Pseudo-time series analysis indicated a transition pattern change in spleen RPMs, with a shift from Cluster 2 and Cluster 1 towards Cluster 0 under NN conditions, and a reverse transition following HH exposure (Figure S2, B and D). This finding implies a decrease in resident RPMs in the spleen under HH conditions.

      (3) Consolidated findings and revised interpretation:

      • The comprehensive analysis of flow cytometry, in situ staining, and single-cell sequencing data consistently indicates a significant reduction in the number of RPMs following HH exposure.

      • These findings, taken together, strongly support the revised conclusion that HH exposure leads to a decrease in RPMs in the spleen, which in turn may affect erythrophagocytosis and RBC lifespan.

      Author response image 6.

      In conclusion, our revised manuscript now includes additional experimental data and analyses, strengthening our claims and providing a more nuanced interpretation of the impact of HH on spleen RPMs and related erythrophagocytosis processes. We believe these revisions and additional data address your concerns and enhance the scientific validity of our study.

      (3) RBC retention in the spleen should be measured anyway quantitatively, eg, with proper flow cytometry, to determine whether it is increased or decreased.

      Thank you for your query regarding the quantitative measurement of RBC retention in the spleen, particularly in relation to HH exposure. We have utilized a combination of techniques, including flow cytometry and histological staining, to investigate this aspect comprehensively. Below is a summary of our findings and methodology.

      (1) Flow cytometry analysis of labeled RBCs:

      • Our study employed both NHS-biotin (new Figure 4, A-D) and PKH67 labeling (new Figure 4, E-H) to track RBCs in mice exposed to HH. Flow cytometry results from these experiments (new Figure 4, A-H) showed a decrease in the proportion of labeled RBCs over time, both in the blood and spleen. Notably, the reduction in the proportion of fluorescently labeled RBCs was significantly greater after NN exposure than after HH exposure, in both blood and spleen. The observed decrease in labeled RBCs was initially counterintuitive, as we expected an increase in RBC retention due to reduced erythrophagocytosis. However, this decrease can be attributed to the significantly increased production of RBCs following HH exposure, which dilutes the proportion of labeled cells.

      • Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure. These findings suggest that HH exposure leads to a decrease in the clearance rate of RBCs.

      Author response image 7.

      (2) Detection of erythrophagocytosis in spleen:

      To assess erythrophagocytosis directly, we labeled RBCs with PKH67 and analyzed their uptake by splenic macrophages (F4/80hi) after HH exposure. Our findings (new Figure 5, D-E) indicated a decrease in PKH67-positive macrophages in the spleen, suggesting reduced erythrophagocytosis.

      Author response image 8.

      (3) Flow cytometry analysis of RBC retention:

      Our flow cytometry analysis revealed a decrease in PKH67-positive RBCs in both blood and spleen (Figure S4). We postulated that this was due to increased RBC production after HH exposure. However, this method might not accurately reflect RBC retention, as it measures the proportion of PKH67-labeled RBCs relative to the total number of RBCs, which increased after HH exposure.

      Author response image 9.
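
      The caveat raised in point (3) is a matter of simple arithmetic, which the following minimal Python sketch illustrates. The cell counts are invented for the example and the function name is hypothetical; the sketch only shows that the labeled fraction can fall through dilution by newly produced cells even when no labeled cell is cleared.

```python
def labeled_fraction(labeled, unlabeled):
    """Proportion of labeled RBCs among all RBCs in a sample."""
    return labeled / (labeled + unlabeled)


# Hypothetical starting sample: 100 labeled cells among 1000 total.
start = labeled_fraction(100, 900)

# Dilution only: no labeled cell is removed, but production adds
# 200 unlabeled cells, so the labeled fraction still drops.
diluted = labeled_fraction(100, 1100)

# Clearance only: 20 labeled cells are removed, no new production.
cleared = labeled_fraction(80, 900)

print(f"start={start:.4f} diluted={diluted:.4f} cleared={cleared:.4f}")
```

      Because both scenarios lower the measured fraction, the proportion of labeled cells alone cannot separate clearance from increased production, which is why a complementary readout of retention is needed.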

      (4) Histological and immunostaining analysis:

      Histological examination using HE staining and band3 immunostaining in situ (new Figure 6, A-D, and G-H) revealed a significant increase in RBC numbers in the spleen after HH exposure. This was further confirmed by detecting retained RBCs in splenic single cells using Wright-Giemsa composite stain (new Figure 6, E and F) and retained PKH67-labelled RBCs in spleen (new Figure 6, I and J).

      Author response image 10.

      (5) Interpreting the data:

      The comprehensive analysis suggests a complex interplay between increased RBC production and decreased erythrophagocytosis in the spleen following HH exposure. While flow cytometry indicated a decrease in the proportion of labeled RBCs, histological and immunostaining analyses demonstrated an actual increase in RBC retention in the spleen. These findings collectively suggest that while overall RBC production is upregulated following HH exposure, the spleen's capacity for erythrophagocytosis is concurrently diminished, leading to increased RBC retention.

      (6) Conclusion:

      Taken together, our results indicate a significant increase in RBC retention in the spleen post-HH exposure, likely due to reduced residual RPMs and erythrophagocytosis. This conclusion is supported by a combination of flow cytometry, histological staining, and immunostaining techniques, providing a comprehensive view of RBC dynamics under HH conditions. We think these findings offer a clear quantitative measure of RBC retention in the spleen, addressing the concerns raised in your question.

      (4) Numerous other methodological problems as listed below.

      We appreciate your question, which highlights the importance of using multiple analytical approaches to understand complex physiological processes. Please find below our point-by-point response to the methodological comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) Decreased BM and spleen monocytes due to increased liver monocyte migration is unclear. There is no evidence that this happens or why it would be a reasonable hypothesis, even in splenectomized mice.

      Thank you for highlighting the need for further clarification and justification of our hypothesized decrease in BM and spleen monocytes due to increased monocyte migration to the liver, particularly in the context of splenectomized mice. Indeed, our study has not explicitly verified an augmentation in mononuclear cell migration to the liver in splenectomized mice.

      Nonetheless, our investigations have revealed a notable increase in monocyte migration to the liver after HH exposure. Noteworthy is our discovery of a significant upregulation in colony stimulating factor-1 (CSF-1) expression in the liver, observed after both 7 and 14 days of HH exposure (data not included). This observation was substantiated through flow cytometry analysis (as depicted in Figure S4), which affirmed an enhanced migration of monocytes to the liver. Specifically, we noted a considerable increase in the population of transient macrophages, monocytes, and Kupffer cells in the liver following HH exposure.

      Author response image 11.

      Considering these findings, we hypothesize that hypoxic conditions may activate a compensatory mechanism that directs monocytes towards the liver, potentially linked to the liver’s integral role in the systemic immune response. In accordance with these insights, we intend to revise our manuscript to reflect the speculative nature of this hypothesis more accurately, and to delineate the strategies we propose for its further empirical investigation. This amendment ensures that our hypothesis is presented with full consideration of its speculative basis, supported by a coherent framework for future validation.

      (2) While F4/80+CD11b+ population is decreased, this is mainly driven by CD11b and F4/80+ alone population is significantly increased. This is counter to the hypothesis.

      Thank you for addressing the apparent discrepancy in our findings concerning the F4/80+CD11b+ population and the increase in the F4/80+ alone population, which seems to contradict our initial hypothesis. Your observation is indeed crucial for the integrity of our study, and we appreciate the opportunity to clarify this matter.

      (1) Clarification of flow cytometry results:

      • In response to the concerns raised, we revisited our flow cytometry experiments with a focus on more clearly distinguishing the cell populations. Our initial graph had some ambiguities in cell grouping, which might have led to misinterpretations.

      • The revised flow cytometry analysis, specifically aimed at identifying red pulp macrophages (RPMs) characterized as F4/80hiCD11blo in the spleen, demonstrated a significant decrease in the F4/80 population. This finding is now in alignment with our immunofluorescence results.

      Author response image 12.

      Author response image 13.

      (2) Revised data and interpretation:

      • The results presented in new Figure 3G and Figure 5 (A and B) consistently indicate a notable reduction in the RPM population following HH exposure. This supports our revised understanding that HH exposure leads to a decrease in the specific macrophage subset (F4/80hiCD11blo) in the spleen.

      We’ve updated our manuscript to reflect these new findings and interpretations. The revised manuscript details the revised flow cytometry analysis and discusses the potential mechanisms behind the observed changes in macrophage populations.

      (3) HO-1 expression cannot be used as a surrogate to quantify number of macrophages as the expression per cell can decrease and give the same results. In addition, the localization of effect to the red pulp is not equivalent to an assertion that the conclusion applies to macrophages given the heterogeneity of this part of the organ and the spleen in general.

      Thank you for your insightful comments regarding the use of HO-1 expression as a surrogate marker for quantifying macrophage numbers, and for pointing out the complexity of attributing changes in HO-1 expression specifically to macrophages in the splenic red pulp. Your observations are indeed valid and warrant a detailed response.

      (1) Role of HO-1 in macrophage activity:

      • In our study, HO-1 expression was not utilized as a direct marker for quantifying macrophages. Instead, it was considered an indicator of macrophage activity, particularly in relation to erythrophagocytosis. HO-1, being upregulated in response to erythrophagocytosis, serves as an indirect marker of this process within splenic macrophages.

      • The rationale behind this approach was that increased HO-1 expression, induced by erythrophagocytosis in the spleen’s red pulp, could suggest an augmentation in the activity of splenic macrophages involved in this process.

      (2) Limitations of using HO-1 as an indicator:

      • We acknowledge your point that HO-1 expression per cell might decrease, potentially leading to misleading interpretations if used as a direct quantifier of macrophage numbers. The variability in HO-1 expression per cell indeed presents a limitation in using it as a sole indicator of macrophage quantity.

      • Furthermore, your observation about the heterogeneity of the spleen, particularly the red pulp, is crucial. The red pulp is a complex environment with various cell types, and asserting that changes in HO-1 expression are exclusive to macrophages could oversimplify this complexity.

      (3) Addressing the concerns:

      • To address these concerns, we propose to supplement our HO-1 expression data with additional specific markers for macrophages. This would help in correlating HO-1 expression more accurately with macrophage numbers and activity.

      • We also plan to conduct further studies to delineate the specific cell types in the red pulp contributing to HO-1 expression. This could involve techniques such as immunofluorescence or immunohistochemistry, which would allow us to localize HO-1 expression to specific cell populations within the splenic red pulp.

      We’ve revised our manuscript to clarify the role of HO-1 expression as an indirect marker of erythrophagocytosis and to acknowledge its limitations as a surrogate for quantifying macrophage numbers.

      (4) line 63-65 is inaccurate as red cell homeostasis reaches a new steady state in chronic hypoxia.

      Thank you for pointing out the inaccuracy in lines 63-65 of our manuscript regarding red cell homeostasis in chronic hypoxia. Your feedback is invaluable in ensuring the accuracy and scientific integrity of our work. We’ve revised lines 63-65 to reflect that red cell homeostasis reaches a new steady state in chronic hypoxia.

      (5) Eryptosis is not defined in the manuscript.

      Thank you for highlighting the omission of a definition for eryptosis in our manuscript. We acknowledge the significance of precisely defining such key terminology, particularly when it plays a crucial role in the context of our research findings. Eryptosis, a term referenced in our study, is a specialized form of programmed cell death unique to erythrocytes. Similar to apoptosis in other cell types, eryptosis is characterized by distinct physiological changes including cell shrinkage, membrane blebbing, and the externalization of phosphatidylserine on the erythrocyte surface. These features are indicative of the RBC lifecycle and its regulated destruction process.

      However, it is pertinent to note that our current study does not extensively delve into the mechanisms or implications of eryptosis. Our primary focus has been to elucidate the effects of HH exposure on the processes of splenic erythrophagocytosis and the resultant impact on the lifespan of RBCs. Given this focus, and to maintain the coherence and relevance of our manuscript, we have decided to exclude specific discussions of eryptosis from our revised manuscript. This decision aligns with our aim to provide a clear and concentrated exploration of the influence of HH exposure on RBCs dynamics and splenic function.

      We appreciate your input, which has significantly contributed to enhancing the clarity and accuracy of our manuscript. The revision ensures that our research is presented with a focused scope, aligning closely with our experimental investigations and findings.

      (6) Physiologically, there is no evidence that there is any "free iron" in cells, making line 89 point inaccurate.

      Thank you for highlighting the concern regarding the reference to "free iron" in cells in line 89 of our manuscript. The term "free iron" in our manuscript was intended to refer to divalent iron (Fe2+), rather than unbound iron ions freely circulating within cells. We acknowledge that the term "free iron" might lead to misconceptions, as it implies the presence of unchelated iron, which is not physiologically common due to the potential for oxidative damage. To rectify this and provide clarity, we’ve revised line 89 of our manuscript to reflect our meaning more accurately. Instead of "free iron," we use "divalent iron (Fe2+)" to avoid any misunderstanding regarding the state of iron in cells. We also ensure that any implications drawn from the presence of Fe2+ in cells are consistent with current scientific literature and understanding.

      (7) Fig 1f no stats

      We appreciate your critical review and suggestions, which help in improving the accuracy and clarity of our research. We’ve revised the statistical chart for the new Figure 1F.

      (8) Splenectomy experiments demonstrate that erythrophagocytosis is almost completely replaced by functional macrophages in other tissues (likely Kupffer cells in the liver). there is only a minor defect and no data on whether it is in fact the liver or other organs that provide this replacement function and makes the assertions in lines 345-349 significantly overstated.

      Thank you for your critical assessment of our interpretation of the splenectomy experiments, especially concerning the role of erythrophagocytosis by macrophages in other tissues, such as Kupffer cells in the liver. We appreciate your observation that our assertions may be overstated and acknowledge the need for more specific data to identify which organs compensate for the loss of splenic erythrophagocytosis.

      (1) Splenectomy experiment findings:

      • Our findings in Figure 2D do indicate that in the splenectomized group under NN conditions, erythrophagocytosis is substantially compensated for by functional macrophages in other tissues. This is an important observation that highlights the body's ability to adapt to the loss of splenic function.

      • However, under HH conditions, our data suggest that the spleen plays an important role in managing erythrocyte turnover, as indicated by the significant impact of splenectomy on erythrophagocytosis and subsequent erythrocyte dynamics.

      (2) Addressing the lack of specific organ identification:

      • We acknowledge that our study does not definitively identify which organs, such as the liver or others, take over the erythrophagocytosis function post-splenectomy. This is an important aspect that needs further investigation.

      • To address this, we plan to perform additional experiments that could pinpoint the specific tissues compensating for the loss of splenic erythrophagocytosis. This could involve tracking labeled erythrocytes or using specific markers to identify macrophages actively engaged in erythrophagocytosis in various organs.

      (3) Revising manuscript statements:

      Considering your feedback, we’ve revised the statements in lines 345-349 (lines 378-383 in revised manuscript) to enhance the scientific rigor and clarity of our research presentation.

      (9) M1 vs M2 macrophage experiments are irrelevant to the main thrust of the manuscript, there are no references to support the use of only CD16 and CD86 for these purposes, and no stats are provided. It is also unclear why bone marrow monocyte data is presented and how it is relevant to the rest of the manuscript.

      Thank you for your critical evaluation of the relevance and presentation of the M1 vs. M2 macrophage experiments in our manuscript. We appreciate your insights, especially regarding the use of specific markers and the lack of statistical analysis, as well as the relevance of bone marrow monocyte data to our study's main focus.

      (1) Removal of M1 and M2 macrophage data:

      Based on your feedback and our reassessment, we agree that the results pertaining to M1 and M2 macrophages did not align well with the main objectives of our manuscript. Consequently, we have decided to remove the related content on M1 and M2 macrophages from the revised manuscript. This decision was made to ensure that our manuscript remains focused and coherent, highlighting our primary findings without the distraction of unrelated or insufficiently supported data.

      The use of only CD16 and CD86 markers for M1 and M2 macrophage characterization, without appropriate statistical analysis, was indeed a methodological limitation. We recognize that a more comprehensive set of markers and rigorous statistical analysis would be necessary for a meaningful interpretation of M1/M2 macrophage polarization. Furthermore, the relevance of these experiments to the central theme of our manuscript was not adequately established. Our study primarily focuses on erythrophagocytosis and red pulp macrophage dynamics under hypobaric hypoxia, and the M1/M2 polarization aspect did not contribute significantly to this narrative.

      (2) Clarification on bone marrow monocyte data:

      Regarding the inclusion of bone marrow monocyte data, we acknowledge that its relevance to the main thrust of the manuscript was not clearly articulated. In the revised manuscript, we provide a clearer rationale for its inclusion and how it relates to our primary objectives.

      (3) Commitment to clarity and relevance:

      We are committed to ensuring that every component of our manuscript contributes meaningfully to our overall objectives and research questions. Your feedback has been instrumental in guiding us to streamline our focus and present our findings more effectively.

      We appreciate your valuable feedback, which has led to a more focused and relevant presentation of our research. These changes enhance the clarity and impact of our manuscript, ensuring that it accurately reflects our key research findings.

      (10) Biotinylated RBC clearance is enhanced, demonstrating that RBC erythrophagocytosis is in fact ENHANCED, not diminished, calling into question the founding hypothesis that the manuscript proposes.

      Thank you for your critical evaluation of our data on biotinylated RBC clearance, which suggests enhanced erythrophagocytosis under HH conditions. This observation indeed challenges our founding hypothesis that erythrophagocytosis is diminished in this setting. Below is a summary of our findings and methodology.

      (1) Interpretation of RBC labeling results:

      Both the earlier results with NHS-biotin-labeled RBCs (new Figure 4, A-D) and the current results with PKH67-labeled RBCs (new Figure 4, E-H) showed that the number of labeled RBCs decreased with increasing time after injection. RBC production, in both bone marrow and spleen, was significantly increased following HH exposure, so flow cytometry detected a consistently lower proportion of labeled RBCs in the blood and spleen of HH mice than in the NN group. However, the decline in fluorescently labeled RBCs was significantly smaller under HH exposure than under NN exposure, in both blood and spleen. Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure.
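
      The comparison above can be laid out as a short calculation. The sketch below only restates the eight reported values and subtracts the HH decrease from the NN decrease for each tissue and label. The response does not state whether the values are percentage points or relative changes, so they are treated here simply as the reported numbers; the dictionary layout and the helper name `nn_minus_hh` are illustrative choices, not part of the study.

```python
# Reported decreases in labeled RBCs, copied from the response text.
# Units are as reported; the source does not say whether these are
# percentage points or relative changes.
reported = {
    ("blood", "biotin"): {"NN": 12.06, "HH": 7.82},
    ("blood", "PKH67"): {"NN": 9.70, "HH": 4.09},
    ("spleen", "biotin"): {"NN": 3.13, "HH": 0.46},
    ("spleen", "PKH67"): {"NN": 1.16, "HH": 0.92},
}


def nn_minus_hh(values):
    """Return how much larger the NN decrease is than the HH decrease."""
    return round(values["NN"] - values["HH"], 2)


# A positive difference means the labeled population declined less under HH.
differences = {key: nn_minus_hh(values) for key, values in reported.items()}

for (tissue, label), diff in differences.items():
    print(f"{tissue:6s} {label:6s} NN - HH = {diff:.2f}")
```

      In all four comparisons the difference is positive, which is the pattern the response describes as a weaker decline under HH. This arithmetic says nothing about statistical significance.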

      Author response image 14.

      (2) Increased RBCs production under HH conditions:

      It's important to note that RBCs production, including from bone marrow and spleen, was significantly increased following HH exposure. This increase in RBCs production could contribute to the decreased proportion of labeled RBCs observed in flow cytometry analyses, as there are more unlabeled RBCs diluting the proportion of labeled cells in the blood and spleen.

      (3) Analysis of erythrophagocytosis in RPMs:

      Our analysis of PKH67-labeled RBCs content within RPMs following HH exposure showed a significant reduction in the number of PKH67-positive RPMs in the spleen (new Figure 5). This finding suggests a decrease in erythrophagocytosis by RPMs under HH conditions.

      Author response image 15.

      (4) Reconciling the findings:

      The apparent contradiction between enhanced RBC clearance (suggested by the reduced proportion of labeled RBCs) and reduced erythrophagocytosis in RPMs (indicated by fewer PKH67-positive RPMs) may be explained by the increased overall production of RBCs under HH. This increased production could mask the actual erythrophagocytosis activity in terms of the proportion of labeled cells. Therefore, while the proportion of labeled RBCs decreases more significantly under HH conditions, this does not necessarily indicate an enhanced erythrophagocytosis rate, but rather an increased dilution effect due to higher RBCs turnover.
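
      The dilution argument can be illustrated with a toy calculation. The sketch below is not the authors' analysis: it assumes a simple daily model in which a fixed fraction of all RBCs is cleared and a fixed number of new, unlabeled RBCs is produced. Every number in it (starting counts, clearance rates, production rates, the 14-day span) is hypothetical and chosen only to show the direction of the effect.

```python
def labeled_fraction(days, clearance_rate, production_per_day,
                     labeled0=10.0, unlabeled0=90.0):
    """Fraction of RBCs that carry the label after `days` days.

    Toy model with hypothetical parameters: each day a fixed fraction of
    both labeled and unlabeled cells is cleared, and a fixed number of
    new unlabeled cells is added.
    """
    labeled, unlabeled = labeled0, unlabeled0
    for _ in range(days):
        labeled *= 1.0 - clearance_rate
        unlabeled = unlabeled * (1.0 - clearance_rate) + production_per_day
    return labeled / (labeled + unlabeled)


# Hypothetical baseline, standing in for NN conditions.
baseline = labeled_fraction(14, clearance_rate=0.02, production_per_day=2.0)

# Same clearance, doubled production: dilution alone lowers the fraction.
high_production = labeled_fraction(14, clearance_rate=0.02,
                                   production_per_day=4.0)

# Halved clearance with doubled production: the labeled fraction is still
# lower than baseline even though fewer cells are being cleared.
low_clearance_high_production = labeled_fraction(14, clearance_rate=0.01,
                                                 production_per_day=4.0)

print(f"baseline:                        {baseline:.4f}")
print(f"high production:                 {high_production:.4f}")
print(f"low clearance + high production: {low_clearance_high_production:.4f}")
```

      Under these assumed parameters, both altered scenarios end with a smaller labeled fraction than the baseline. This shows why the proportion of labeled cells cannot by itself distinguish faster clearance from greater production.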

      (5) Revised interpretation and manuscript changes:

      Given these factors, we have updated our manuscript to reflect this detailed interpretation and to clarify how increased RBC production under HH conditions affects our observations of labeled RBC clearance and erythrophagocytosis. We appreciate your insightful feedback, which has prompted a careful re-examination of our data and interpretations. We hope that these revisions provide a more accurate and comprehensive understanding of the effects of HH on erythrophagocytosis and RBC dynamics.

      (11) Legend in Fig 4c-4d looks incorrect and Fig 4e-4f is very non-specific since Wright stain does not provide evidence of what type of cells these are and making for a significant overstatement in the contribution of this data to "confirming" increased erythrophagocytosis in the spleen under HH exposure (line 395-396).

      Thank you for your insightful observations regarding the data presentation and figure legends in our manuscript, particularly in relation to Figure 4 (renamed as Figure 6 in the revised manuscript) and the use of Wright-Giemsa composite staining. We appreciate your constructive feedback and acknowledge the importance of presenting our data with utmost clarity and precision.

      (1) Amendments to Figure legends:

      We recognize the necessity of rectifying inaccuracies in the legends of the previously labeled Figure 4C and D. We have corrected the legends so that they accurately describe the data presented. Additionally, we acknowledge the error concerning the description of Wright staining. The method employed in our study is Wright-Giemsa composite staining, which, unlike Wright staining that solely stains cytoplasm (RBC), is capable of staining both nuclei and cytoplasm.

      (2) Addressing the specificity of Wright-Giemsa Composite staining:

      Our approach involved quantifying RBC retention using Wright-Giemsa composite staining on single splenic cells post-perfusion at 7 and 14 days post HH exposure. We understand and appreciate your concerns regarding the nonspecific nature of Wright staining. Although Wright stain is a general hematologic stain and not explicitly specific for certain cell types, its application in our study aimed to provide preliminary insights. The spleen cells, devoid of nuclei and thus likely to be RBCs, were stained and observed post-perfusion, indicating RBC retention within the spleen.

      (3) Incorporating additional methods for RBC identification:

      To enhance the specificity of our findings, we integrated supplementary methods for RBC identification in the revised manuscript. We employed band3 immunostaining (in the new Figure 6, C-D and G-H) and PKH67 labeling (Figure 6, I-J) for a more targeted identification of RBCs. Band3, serving as a reliable marker for RBCs, augments the specificity of our immunostaining approach. Likewise, PKH67 labeling affords a direct and definitive means to assess RBC retention in the spleen following HH exposure.

      Author response image 16. same as 10

      (4) Revised interpretation and manuscript modifications:

      Based on these enhanced methodologies, we have refined our interpretation of the data and accordingly updated the manuscript. The revised narrative underscores that our conclusions regarding reduced erythrophagocytosis and RBC retention under HH conditions are corroborated by not only Wright-Giemsa composite staining but also by band3 immunostaining and PKH67 labeling, each contributing distinctively to our comprehensive understanding.

      We are committed to ensuring that our manuscript precisely reflects the contribution of each method to our findings and conclusions. Your thorough review has been invaluable in identifying and rectifying areas for improvement in our research report and interpretation.

      (12) Ferroptosis data in Fig 5 is not specific to macrophages and Fer-1 data confirms the expected effect of Fer-1 but there is no data that supports that Fer-1 reverses the destruction of these cells or restores their function in hypoxia. Finally, these experiments were performed in peritoneal macrophages which are functionally distinct from splenic RPM.

      Thank you for your critique of our presentation and interpretation of the ferroptosis data in Figure 5 (renamed as Figure 9 in the revised manuscript), as well as your observations regarding the specificity of the experiments to macrophages and the effects of Fer-1. We value your input and acknowledge the need to clarify these aspects in our manuscript.

      (1) Clarification on cell type used in experiments:

      • We appreciate your attention to the details of our experimental setup. The experiments presented in Figure 9 were indeed conducted on splenic macrophages, not peritoneal macrophages, as incorrectly mentioned in the original figure legend. This was an error in our manuscript, and we have revised the figure legend accordingly to accurately reflect the cell type used.

      (2) Specificity of ferroptosis data:

      • We recognize that the data presented in Figure 9 need to be more explicitly linked to the specific macrophage population being studied. In the revised manuscript, we ensure that the discussion around ferroptosis data is clearly situated within the framework of splenic macrophages.

      • We also provide additional methodological details in the 'Methods' section to reinforce the specificity of our experiments to splenic macrophages.

      (3) Effects of Fer-1 on macrophage function and survival:

      • Regarding the effect of Fer-1, we agree that while our data confirms the expected effect of Fer-1 in inhibiting ferroptosis, we have not provided direct evidence that Fer-1 reverses the destruction of macrophages or restores their function in hypoxia.

      • To address this, we propose additional experiments to specifically investigate the impact of Fer-1 on the survival and functional restoration of splenic macrophages under hypoxic conditions. This would involve assessing not only the inhibition of ferroptosis but also the recovery of macrophage functionality post-treatment.

      (4) Revised interpretation and manuscript changes:

      • We’ve revised the relevant sections of our manuscript to reflect these clarifications and proposed additional studies. This includes modifying the discussion of the ferroptosis data to more accurately represent the cell types involved and the limitations of our current findings regarding the effects of Fer-1.

      • The revised manuscript presents a more detailed interpretation of the ferroptosis data, clearly describing what our current experiments demonstrate and what remains to be investigated.

      We are grateful for your insightful feedback, which has highlighted important areas for improvement in our research presentation. We think that these revisions will enhance the clarity and scientific accuracy of our manuscript, ensuring that our findings and conclusions are well-supported and precisely communicated.

      Reviewer #2 (Recommendations For The Authors):

      The following questions and remarks should be considered by the authors:

      (1) The methods should clearly state whether the HH was discontinued during the 7 or 14 day exposure for cleaning, fresh water etc. Moreover, how was CO2 controlled? The procedure for splenectomy needs to be described in the methods.

      Thank you for your inquiry regarding the specifics of our experimental methods, particularly the management of HH exposure and the procedure for splenectomy. We appreciate your attention to detail and the importance of these aspects for the reproducibility and clarity of our research.

      (1) HH exposure conditions:

      In our experiments, mice were continuously exposed to HH for the entire duration of 7 or 14 days, without interruption for activities such as cleaning or providing fresh water. This uninterrupted exposure was crucial for maintaining consistent hypobaric conditions throughout the experiment. The hypobaric chamber was configured to ensure a ventilation rate of 25 air exchanges per minute. This high ventilation rate was effective in regulating the concentration of CO2 inside the chamber, thereby maintaining a stable environment for the mice.

      (2) The splenectomy was performed as follows:

      After anesthesia, the mice were placed in a supine position, and their limbs were fixed. The abdominal surgical area was shaved, disinfected, and covered with a sterile towel. A median incision was made in the upper abdomen, followed by laparotomy to locate the spleen. The spleen was then carefully pulled out through the incision. The course of the arteries and veins in the splenic pedicle was examined, and two vascular forceps were used to clamp all the tissue in the main trunk of blood vessels below the splenic portal. The splenic pedicle was cut between the forceps to remove the spleen. The end of the proximal hepatic artery was clamped with a vascular clamp, and double ligation or transfixion ligation was performed to secure the site. The abdominal cavity was then cleaned to ensure there was no bleeding at the ligation site, and the incision was closed. Post-operatively, the animals were housed individually. Generally, they were able to feed themselves after recovering from anesthesia and did not require special care.

      We hope this detailed description addresses your queries and provides a clear understanding of the experimental conditions and procedures used in our study. These methodological details are crucial for ensuring the accuracy and reproducibility of our research findings.

      (2) The lack of changes in MCH needs explanation? During stress erythropoiesis some limit in iron availability should cause MCH decrease particularly if the authors claim that macrophages for rapid iron recycling are decreased. Fig 1A is dispensable. Fig 1G NN control 14 days does not make sense since it is higher than 7 days of HH.

      Thank you for your inquiry regarding the lack of changes in Mean Corpuscular Hemoglobin (MCH) in our study, particularly in the context of stress erythropoiesis and decreased macrophage-mediated iron recycling. We appreciate the opportunity to provide further clarification on this aspect.

      (1) Explanation for stable MCH levels:

      • Our research identified a decrease in erythrophagocytosis and iron recycling in the spleen following HH exposure. Despite this, the MCH levels remained stable. This observation can be explained by considering the compensatory roles of other organs, particularly the liver and duodenum, in maintaining iron homeostasis.

      • Specifically, our investigations revealed an enhanced capacity of the liver to engulf RBCs and process iron under HH conditions. This increased hepatic erythrophagocytosis likely compensates for the reduced splenic activity, thereby stabilizing MCH levels.

      (2) Role of hepcidin and DMT1 expression:

      Additionally, hypoxia is known to influence iron metabolism through the downregulation of Hepcidin and upregulation of Divalent Metal Transporter 1 (DMT1) expression. These alterations lead to enhanced intestinal iron absorption and increased blood iron levels, further contributing to the maintenance of MCH levels despite reduced splenic iron recycling.

      (3) Revised Figure 1 and data presentation

      To address the confusion regarding the data presented in Figure 1G, we have made revisions in our manuscript. The original Figure 1G, which did not align with the expected trends, has been removed. In its place, we have included a statistical chart of Figure 1F in the new version of Figure 1G. This revision will provide a clearer and more accurate representation of our findings.

      (4) Manuscript updates and future research:

      • We have updated our manuscript to incorporate these explanations, ensuring that the rationale behind the stable MCH levels is clearly articulated. This includes a discussion on the role of the liver and duodenum in iron metabolism under hypoxic conditions.

      • Future research could explore in greater detail the mechanisms by which different organs contribute to iron homeostasis under stress conditions like HH, particularly focusing on the dynamic interplay between hepatic and splenic functions.

      We thank you for your insightful question, which has prompted a thorough re-examination of our findings and interpretations. We believe that these clarifications will enhance the overall understanding of our study and its implications in the context of iron metabolism and erythropoiesis under hypoxic conditions.

      (3) Fig 2 the difference between sham and splenectomy is really marginal and not convincing. Is there also a difference at 7 days? Why does the spleen size decrease between 7 and 14 days?

      Thank you for your observations regarding the marginal differences observed between sham and splenectomy groups in Figure 2, as well as your inquiries about spleen size dynamics over time. We appreciate this opportunity to clarify these aspects of our study.

      (1) Splenectomy vs. Sham group differences:

      • In our experiments, the difference between the sham and splenectomy groups under HH conditions, though subtle, was consistent with our hypothesis regarding the spleen's role in erythrophagocytosis and stress erythropoiesis. Under NN conditions, no significant difference was observed between these groups, which aligns with the expectation that the spleen's contribution is more pronounced under hypoxic stress.

      (2) Spleen size dynamics and peak stress erythropoiesis:

      • The observed splenic enlargement prior to 7 days can be attributed to a combination of factors, including the retention of RBCs and extramedullary hematopoiesis, which is known to be a response to hypoxic stress.

      • Prior research has shown that splenic stress erythropoiesis triggered by hypoxia typically peaks within 3 to 7 days. This agrees with our Toluidine Blue (TO) staining results, which indicated that the response peaks at 7 days (Figure 1, F-G). After this peak, extramedullary hematopoiesis declines, which could explain the decrease in spleen size between 7 and 14 days.

      • This pattern of splenic response under prolonged hypoxic stress is supported by studies such as Wang et al. (2021), Harada et al. (2015), and Cenariu et al. (2021). Together, these references indicate that the spleen changes markedly in response to sustained hypoxia: it first enlarges because of increased erythropoiesis and erythrophagocytosis, and then regresses in size as these processes return toward normal.

      We’ve revised our manuscript to include a more detailed explanation of these splenic dynamics under HH conditions, referencing the relevant literature to provide a comprehensive context for our findings. We will also consider performing additional analysis or providing further data on spleen size changes at 7 days to support our observations and ensure a thorough understanding of the splenic response to hypoxic stress over time.

      (4) Fig 3 B the clusters should be explained in detail. If the decrease in macrophages in Fig 3K/L is responsible for the effect, why does splenectomy not have a much stronger effect? How do the authors know which cells died in the calcein stained population in Fig 3D?

      Thank you for your insightful questions regarding the details of our data presentation in Figure 3, particularly about the identification of cell clusters and the implications of macrophage reduction. We appreciate the opportunity to address these aspects and clarify our findings.

      (1) Explanation of cell clusters in Figure 3B:

      • In the revised manuscript, we have included detailed notes for each cell population represented in Figure 3B (Figure 3D in revised manuscript). These notes provide a clearer understanding of the cell types present in each cluster, enhancing the interpretability of our single-cell sequencing data.

      • This detailed annotation will help readers to better understand the composition of the splenic cell populations under study and how they are affected by hypoxic conditions.

      (2) Impact of splenectomy vs. macrophage reduction:

      • The interplay between the reduction in macrophage populations, as evidenced by our single-cell sequencing data, and the ramifications of splenectomy presents a multifaceted scenario. Notably, the observed decline in macrophage numbers following HH exposure does not straightforwardly equate to a comparable alteration in overall splenic function, as might be anticipated with splenectomy.

      • In the context of splenectomy under HH conditions, a significant escalation in the RBCs count was observed, surpassing that in non-splenectomized mice exposed to HH. This finding underscores the spleen's critical role in modulating RBCs dynamics under HH. It also indirectly suggests that the diminished phagocytic capacity of the spleen following HH exposure contributes to an augmented RBCs count, albeit to a lesser extent than in the splenectomy group. This difference is attributed to the fact that, while the number of RPMs in the spleen post-HH is reduced, they are still present, unlike in the case of splenectomy, where they are entirely absent.

      • Splenectomy entails the complete removal of the spleen, thus eliminating a broad spectrum of functions beyond erythrophagocytosis and iron recycling mediated by macrophages. The nuanced changes observed in our study may be reflective of the spleen's diverse functionalities and the organism's adaptive compensatory mechanisms in response to the loss of this organ.

      (3) Calcein stained population in Figure 3D:

      • Regarding the identification of cell death in the calcein-stained population in Figure 3D (Figure 3A in revised manuscript), we acknowledge that the specific cell types undergoing death could not be distinctly determined from this analysis alone.

      • The calcein staining method allows for the visualization of live (calcein-positive) and dead (calcein-negative) cells, but it does not provide specific information about the cell types. The decrease in macrophage population was inferred from the single-cell sequencing data, which offered a more precise identification of cell types.

      (4) Revised manuscript and data presentation:

      • Considering your feedback, we have revised our manuscript to provide a more comprehensive explanation of the data presented in Figure 3, including the nature of the cell clusters and the interpretation of the calcein staining results.

      • We have also updated the manuscript to reflect the removal of Figure 3K/L results and to provide a more focused discussion on the relevant findings.

      We are grateful for your detailed review, which has helped us to refine our data presentation and interpretation. These clarifications and revisions will enhance the clarity and scientific rigor of our manuscript, ensuring that our conclusions are well-supported and accurately conveyed.

      (5) Is the reduced phagocytic capacity in Fig 4B significant? Erythrophagocytosis is compromised due to the considerable spontaneous loss of labelled erythrocytes; could other assays help? (potentially by a modified Chromium release assay?). Is it necessary to stimulate phagocytosis to see a significant effect?

      Thank you for your inquiry regarding the significance of the reduced phagocytic capacity observed in Figure 4B, and the potential for employing alternative assays to elucidate erythrophagocytosis dynamics under HH conditions.

      (1) Significance of reduced phagocytic capacity:

      The smaller decline in fluorescently labeled RBCs in both the blood and spleen under HH conditions, relative to NN conditions, suggests a decrease in erythrophagocytosis and therefore a diminished phagocytic capacity.

      (2) Investigation of erythrophagocytosis dynamics:

      To delve deeper into erythrophagocytosis under HH, we employed Tuftsin to enhance this process. Following the injection of PKH67-labeled RBCs and subsequent HH exposure, we noted a significant decrease in PKH67 fluorescence in the spleen, particularly marked after the administration of Tuftsin. This finding implies that stimulated erythrophagocytosis can influence RBCs lifespan.

      (3) Erythrophagocytosis under normal and hypoxic conditions:

      Under normal conditions, the reduction in phagocytic activity is less apparent without stimulation. However, under HH conditions, our findings demonstrate a clear weakening of the phagocytic effect. While we established that promoting phagocytosis under NN conditions affects RBC lifespan, the impact of enhanced phagocytosis under HH on RBCs numbers was not explicitly investigated.

      (4) Potential for alternative assays:

      Considering the considerable spontaneous loss of labeled erythrocytes, alternative assays such as a modified Chromium release assay could provide further insights. Such assays might offer a more nuanced understanding of erythrophagocytosis efficiency and the stability of labeled RBCs under different conditions.

      (5) Future research directions:

      The implications of these results suggest that future studies should focus on comparing the effects of stimulated phagocytosis under both NN and HH conditions. This would offer a clearer picture of the impact of hypoxia on the phagocytic capacity of macrophages and the subsequent effects on RBC turnover.

      In summary, our findings indicate a diminished erythrophagocytic capacity, with enhanced phagocytosis affecting RBC lifespan. Further investigation, potentially using alternative assays, would be beneficial to comprehensively understand the dynamics of erythrophagocytosis in different physiological states.

      (6) Can the observed ferroptosis be influenced by bi- and not trivalent iron chelators?

      Thank you for your question regarding the potential influence of bi- and trivalent iron chelators on ferroptosis under hypoxic conditions. We appreciate the opportunity to discuss the implications of our findings in this context.

      (1) Analysis of iron chelators on ferroptosis:

      In our study, we did not specifically analyze the effects of bi- and trivalent iron chelators on ferroptosis under hypoxia. However, our observations with Deferoxamine (DFO), a well-known iron chelator, provide some insights into how iron chelation may influence ferroptosis in splenic macrophages under hypoxic conditions.

      (2) Effect of DFO on oxidative stress markers:

      Our findings showed that under 1% O2, there was an increase in Malondialdehyde (MDA) content, a marker of lipid peroxidation, and a decrease in Glutathione (GSH) content, indicative of oxidative stress. These changes are consistent with the induction of ferroptosis, which is characterized by increased lipid peroxidation and depletion of antioxidants. Treatment with Ferrostatin-1 (Fer-1) and DFO effectively reversed these alterations. This suggests that DFO, like Fer-1, can mitigate ferroptosis in splenic macrophages under hypoxia, primarily by impacting MDA and GSH levels.

      Author response image 17.

      (3) Potential role of iron chelators in ferroptosis:

      The effectiveness of DFO in reducing markers of ferroptosis indicates that iron availability plays a crucial role in the ferroptotic process under hypoxic conditions. It is plausible that both bi- and trivalent iron chelators could influence ferroptosis, given their ability to modulate iron availability within cells. Since ferroptosis is an iron-dependent form of cell death, chelating iron, irrespective of its valence state, could potentially disrupt the process by limiting the iron necessary for the generation of reactive oxygen species and lipid peroxidation.

      (4) Additional research and manuscript updates:

      Our study highlights the need for further research to explore the differential effects of various iron chelators on ferroptosis, particularly under hypoxic conditions. Such studies could provide a more comprehensive understanding of the role of iron in ferroptosis and the potential therapeutic applications of iron chelators. We update our manuscript to include these findings and discuss the potential implications of iron chelation in the context of ferroptosis under hypoxic conditions. This will provide a broader perspective on our research and its significance in understanding the mechanisms of ferroptosis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study provides insights into the IDA peptide with dual functions in development and immunity. The approach used is solid and helps to define the role of IDA in a two-step process, cell separation followed by activation of innate defenses. The main limitation of the study is the lack of direct evidence linking signaling by IDA and its HAE receptors to immunity. As such the work remains descriptive but it will nevertheless be of interest to a wide range of plant cell biologists.

      We thank the reviewers for thoroughly reading our manuscript. We have used their comments and suggestions to improve the manuscript. Below is a response to the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper titled 'A dual function of the IDA peptide in regulating cell separation and modulating plant immunity at the molecular level' by Olsson Lalun et al., 2023 aims to understand how IDA-HAE/HSL2 signalling modulates immunity, a pathway that has previously been implicated in development. This is a timely question to address as conflicting reports exist within the field. IDL6/7 have previously been shown to negatively regulate immune signalling, disease resistance and stress responses in leaf tissue; however, IDA has been shown to positively regulate immunity through the shedding of infected tissues. Moreover, recently the related receptor NUT/HSL3 has been shown to positively regulate immune signalling and disease resistance. This work has the potential to bring clarity to this field; however, the manuscript requires some additional work to address these questions. This is especially the case as it contradicts some previous work with IDL peptides, which are perceived by the same receptor complexes.

      Can IDA induce pathogen resistance? Does the infiltration of IDA into leaf tissue enhance or reduce pathogen growth? Previously it has been shown that IDL6 makes plants more susceptible. Is this also true for IDA? Currently cytoplasmic calcium influx and apoplastic ROS are overinterpreted as immune responses - these can also be induced by many developmental cues, e.g. CLE40-induced calcium transients. Whilst gene expression is more specific, it is also true that treatment with synthetic peptides, which are recognised by LRR-RKs, can induce immune gene expression, especially in the short term, even when that is not their in vivo function, e.g. doi.org/10.15252/embj.2019103894.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity, and we plan for such experiments in future work. We have, however, modified the discussion to include the possible role of IDA-induced Ca2+ and ROS during development. We have recently published a preprint (accepted for publication in JXB) (Galindo-Trigo et al., 2023, https://doi.org/10.1101/2023.09.12.557497) strengthening the link between IDA and defense by identifying WRKY transcription factors that regulate IDA expression through a Y1H assay.

      This paper shows that receptors other than hae/hsl2 are genetically required to induce defense gene expression, it would have been interesting to see what phenotype would be associated with higher order mutants of closely related haesa/haesa-like receptors. Indeed recently HSL1 has been shown to function as a receptor for IDA/IDL peptides. Could the triple mutant suppress all response? Could the different receptors have distinct outputs? For example for FRK1 gene expression the hae hsl2 mutant has an enhanced response. Could defence gene expression be primarily mediated by HSL1 with subfunctionalisation within this clade?

      We agree that it would be interesting to also include HSL1 in our studies. However, the focus of this study has been on HAE and HSL2 and we wanted to explore their role in IDA-induced defense responses. Including HSL1 in these studies will require generation of multiple transgenic lines and repeating most of the experiments; we will consider these experiments in a follow-up study together with pathogen assays (that would also address the main concern raised in the comment above). We have, however, modified the text to include the known function of HSL1 and discuss the possibility of subfunctionalisation of this receptor clade.

      One striking finding of the study is the strong additive interaction between IDA and flg22 treatment on gene expression. Do the authors also see this for co-treatment of different peptides with flg22, or is this a unique function of IDA? Is this receptor dependent (HAE/HSL1/HSL2)?

      This is a good question. Since our study focuses on the IDA signaling pathway we preferentially tested if the additive effect observed between flg22 and mIDA was also observed when mIDA was combined with another peptide involved in defense. The endogenous peptide PIP1 has previously been shown to amplify flg22 signaling (Hou et al., 2014, doi:10.1371/journal.ppat.1004331). In this study it is shown that co-treatment with flg22 and PIP1 gives increased resistance to Pseudomonas PstDC3000 compared to when plants are treated with each peptide separately. In the same study, the authors also show reduced flg22-induced transcriptional activity of two defense-related genes, WRKY33 and PR, in the receptor like kinase7 (rlk7) mutant (the receptor perceiving PIP1). To investigate whether PIP1 would give the same additive effect with mIDA as that observed between flg22 and mIDA, we co-treated seedlings with PIP1 and mIDA. We observed no enhanced transcriptional activity of FRK1, MYB51 and PEP3 in tissue from plants treated with both PIP1 and mIDA peptides compared to single exposure. These results are presented in supplementary figure 11. In conclusion, we do not think mIDA acts as a general amplifier of all immune elicitors in plants.

      It is interesting how tissue specific calcium responses are in response to IDA and flg22, suggesting the cellular distribution of their cognate receptors. However, one striking observation, made by the authors as well, is that promoter expression seems to be broader than the calcium response, indicating that additional factors are required for the observed calcium response. Could diffusion of the peptide be a contributing factor, or are only some cells competent to induce a calcium response?

      It is interesting that the authors look for floral abscission phenotypes in cngc and rbohd/f mutants to conclude for genetic requirement of these in floral abscission. Do the authors have a hypothesis for why they failed to see a phenotype for the rbohd/f mutant as was published previously? Do you think there might be additional players redundantly mediating these processes?

      It is a possibility that diffusion of the peptide plays a role in the observed response. In a biological context we would assume that the local production of the peptides plays an important role in the cellular responses. In our experimental setup, we add the peptide externally and we can therefore assume that the overlying cells come into contact with the peptide before cells in the inner tissues, and this could be affecting the response recorded. However, our results show that there are differences between flg22 and mIDA induced responses even when the application of the peptides is performed in the same manner, indicating that the difference in the response is not primarily due to the diffusion rate of the peptides but is likely due to different factors being present in different cells. To acquire a better picture of the distribution of receptor expression in the root tissue and to investigate in which cells the receptors have an overlapping expression pattern, we have included results in figure 6 showing plant lines co-expressing transcriptional reporters of FLS2 and HAE or HSL2.

      Can you observe callose deposition in the cotyledons of the 35S::HAE line? Are the receptors expressed in native cotyledons? This is the only phenotype tested in the cotyledons.

      We thank the reviewer for this valuable comment. We have now conducted a callose deposition assay on the 35S:HAE line, and indeed, we observe callose depositions when cotyledons from a 35S:HAE line are treated with mIDA. We have included these results in figure 4 and have adjusted the text regarding the callose assay accordingly. In addition, we have analyzed the promoter activity of pHAE in cotyledons and we observe weak promoter activity. These results are included as supplementary figure 1d.

      Are flg22-induced calcium responses affected in hae hsl2?

      The experiment suggested by the reviewer is an important control to ensure that the hae hsl2-Aeq line can respond to a Ca2+ inducing peptide signaling through a different receptor than HAE or HSL2. One would expect to see a Ca2+ response in this line to the flg22 peptide. We performed this experiment and surprisingly we could not detect a flg22 induced Ca2+ signal in the hae hsl2 mutant. As it is unlikely that the Ca2+ response triggered by flg22 is dependent on HAE and HSL2 we have to assume that the lack of response is due to a malfunction of the Aeq sensor in this line. As a control to measure the amount of Aeq present in the cells we treat the Aeq seedlings with 2 M CaCl2 and measure the luminescence constantly for 180 seconds (Ranf et al., 2012, DOI: 10.1093/mp/ssr064). The CaCl2 treatment disrupts the cells and releases the Aeq sensor into the solution where it will react with Ca2+ and release the total possible response in the sample (Lmax) in the form of a luminescent peak. When treating the hae hsl2-Aeq line with CaCl2 we observe a luminescent peak, indicating the presence of the sensor; however, the response is reduced compared to WT seedlings expressing Aeq. Given the sensitivity of FLS2 to flg22 one would still expect to see a Ca2+ peak in the hae hsl2-Aeq line even if the amount of sensor is reduced. Given that this is not the case, we have to assume that localization or conformation of the sensor is somehow affected in this line or that there is another biological explanation that we cannot explain at the moment.

      We have therefore opted to omit the results using the hae hsl2 Aeq lines from the manuscript and are in the process of mutating HAE and HSL2 by CRISPR-Cas9 in the Aeq background to verify that the mIDA triggered Ca2+ response is dependent on HAE and HSL2.
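
      To illustrate why the CaCl2 discharge control matters, the following is a minimal sketch of how an aequorin luminescence trace is often expressed relative to Lmax. It is not taken from the manuscript: the function name, the sample counts, and the convention of defining Lmax at each time point as the luminescence still remaining in the sample (response plus discharge) are assumptions made for illustration only.

```python
from typing import List


def normalize_to_lmax(response: List[float], discharge: List[float]) -> List[float]:
    """Express each luminescence reading as a fraction of the remaining signal.

    response  : counts recorded after peptide treatment (hypothetical values)
    discharge : counts recorded during the CaCl2 discharge that follows
    Assumed convention: Lmax at time t is every count not yet emitted at t,
    i.e. the rest of the response trace plus the full discharge.
    """
    total = sum(response) + sum(discharge)
    if total <= 0:
        raise ValueError("no luminescence recorded; the sensor may be absent")

    normalized = []
    consumed = 0.0  # counts already emitted before the current time point
    for counts in response:
        remaining = total - consumed
        normalized.append(counts / remaining)
        consumed += counts
    return normalized


# Made-up counts, purely to show the arithmetic.
example = normalize_to_lmax([10.0, 20.0, 10.0], [60.0, 100.0])
```

      Under this convention, a line with less sensor gives a smaller discharge and smaller raw counts, but a genuine Ca2+ response should still appear as a peak in the normalized trace.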

      Reviewer #2 (Public Review):

      Lalun and co-authors investigate the signalling outputs triggered by the perception of IDA, a plant peptide regulating organ abscission. The authors observed that IDA perception leads to a transient influx of Ca2+, to the production of reactive oxygen species in the apoplast, and to an increased accumulation of transcripts which are also responsive to an immunogenic epitope of bacterial flagellin, flg22. The authors show that IDA is transcriptionally upregulated in response to several biotic and abiotic stimuli. Finally, based on the similarities in the molecular responses triggered by IDA and elicitors (such as flg22) the authors proposed that IDA has a dual function in modulating abscission and immunity. The manuscript is rather descriptive and provides little information regarding IDA signalling per se. A potential functional link between IDA signalling and immune signalling remains speculative.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      Reviewer #3 (Public Review):

      Previous work has shown the essential role of the IDA peptide and HAESA receptor families in driving various cell separation processes such as abscission of flowers as a natural developmental process, of leaves as a defense mechanism when plants are under pathogenic attack, or at the lateral root emergence and root tip cell sloughing. In this work, Olsson et al. show for the first time the possible role of IDA peptide in triggering plant innate immunity after the cell separation process occurred. Such an event has been previously proposed to take place in order to seal open remaining tissue after cell separation to avoid creating an entry point for opportunistic pathogens.

      The elegant experiments in this work demonstrate that IDA peptide is triggering the defense-associated marker genes together with immune specific responses including release of ROS and intracellular Ca2+. Thus, the work highlights an intriguing direct link between endogenous cell wall remodeling and plant immunity. Moreover, the upregulation of IDA in response to abiotic and especially biotic stimuli are providing a valuable indication for potential involvement of HAE/IDA signalling in other processes than plant development.

      We are pleased that the reviewer finds our findings linking IDA to defense interesting and would like to thank the reviewer for this positive feedback.

      Strengths:

      The various methods and different approaches chosen by the authors consolidate the additional new role for a hormone-peptide such as IDA. The involvement of IDA in triggering of the immunity complex process represents a further step in understanding what happens after cell separation occurs. The Ca2+ and ROS imaging and measurements together with using the hae hsl2 and hae hsl2 p35S::HAE-YFP genotypes provide a robust quantification of defense responses activation. While Ca2+ and ROS can be detected after applying the IDA treatment after the occurrence of cell separation, it is adequately shown that the enzymes responsible for ROS production, RBOHD and RBOHF, are not implicated in the floral abscission.

      Furthermore, IDA production is triggered by biotic and abiotic factors such as flg22, a bacterial elicitor, fungi, mannitol or salt, while the mature IDA is activating the production of FRK1, MYB51 and PEP3, genes known for being part of plant defense process.

      Thank you.

      Weaknesses:

      Even though there is shown a clear involvement of IDA in activating the after-cell separation immune system, the use of p35S:HAE-YFP line represent a weak point in the scientific demonstration. The mentioned line is driving the HAE receptor by a constitutive promoter, capable of loading the plant with HAE protein without discriminating on a specific tissue. Since it is known that IDA family consist of more members distributed in various tissues, it is very difficult to fully differentiate the effects of HAE present ubiquitously.

      We agree with this statement. Nevertheless, it is important to note that the responses we have observed are not detectable in WT plants that do not (over)express the HAE receptors, suggesting that the ROS and callose deposition are induced by the addition of mIDA peptide and not the potential presence of the endogenous IDL peptides.

      The co-localization of HAE/HSL2 and FLS2 receptors is a valuable point to address since in the present work, the marker lines presented do not get activated in the same cell types of the root tissues which renders the idea of nanodomains co-localization (as hypothetically written in the discussion) rather unlikely.

      Thank you for raising an important aspect of our study. It is true that not all cells in the root which have promoter activity for FLS2 also exhibit promoter activity for either HAE or HSL2. However, we have observed that certain cells in the roots show promoter activity for both receptors. In the revised version of the manuscript, we have included plants expressing transcriptional reporters for both FLS2 and HAE or HSL2 using different fluorescent proteins. We have investigated overlapping promoter activity at sites of lateral roots, in the tip of the primary root and in the abscission zone. Our results show overlapping expression of the transcriptional reporters in certain cells, indicating that FLS2 and HAE or HSL2 are likely to be found in some of the same cells during plant development. We also observe cells where only one or none of the promoters are active.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Supplementary Figure 3: re-labelling of y axis; 200 than 200,00 for clarity.

      This has been addressed.

      Supplementary Figure 2: It would be good to include the age of the seedlings used to study calcium influx in the legend.

      This has been addressed.

      Supplementary Figure 1: rephrase 'IDA induces ROS production in Arabidopsis'.

      This has been addressed.

      The use of chelating agents to establish the need of calcium from extracellular space is a clear experiment supporting the calcium response phenotype specific to IDA treatment in seedlings. Removing the last asparagine (N) and using it as a peptide that fails to elicit calcium response could simply be because the peptide is smaller in length or has different chemical properties. Therefore, a scrambled sequence would have been a better control.

      We thank the reviewer for the suggestion of using a scrambled peptide as a negative control; however, we find it unlikely that mIDA∆N69 could induce any activity based on previous work. Results from the crystal structure of mIDA bound to the HAE receptor and ligand-receptor interaction studies (10.7554/eLife.15075) show that the last asparagine in the mIDA peptide is essential for detectable binding to the HAE receptor and that a peptide lacking this amino acid does not have any activity. We will, however, in future experiments also include a scrambled version of the peptide as an additional control.

      Reviewer #2 (Recommendations For The Authors):

      Please find below specific comments:

      (1) Most of the molecular outputs triggered by IDA can be considered as common molecular marks of plant peptides signalling, they do not represent strong evidences of a potential function of IDA in modulating immunity. For instance, perception of CIF peptides, which control the establishment of the Casparian strips, regulate the production of reactive oxygen species, and the transcription of genes associated with immune responses (Fujita et al., The EMBO Journal 2020). It should also be considered that FRK1, whose function remains unknown, may be involved in both immunity and abscission and that the upregulation of FRK1 upon IDA treatment is not indicative of active modulation of immune signalling by IDA.

      This is a fair point raised by the reviewer and we now address in the manuscript that ROS and Ca2+ are hallmarks of both plant development and defense. The function of FRK1 is not known; however, it is unlikely that the upregulation of FRK1 in response to mIDA plays a role in the developmental progression of abscission as it is not temporally regulated during the abscission process, thus making it an unlikely candidate in the regulation of cell separation (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908). We do, however, agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      (2) It remains unknown whether IDA modulate immunity. For instance, does IDA perception promote resistance to bacteria (bacterial proliferation, disease symptoms)? Is IDA genetically required for plant disease resistance immunity? Is the IDA signalling pathway genetically required for transcriptional changes induced by flg22, such as increase in FRK1 transcripts? In addition, the authors propose that the proposed function of IDA in modulating immune signalling prevents bacterial infection in tissue exposed to stress(es). Does loss of function of IDA or of its corresponding receptors leads to changes in the ability of bacteria to colonise plant root upon stress(es)?

      Please see the comment above regarding pathogen assays.

      (3) Several aspects of the work appear to correspond to preliminary investigation. For instance, the authors analyse loss of function mutant for genes encoding for Ca2+ permeable channels (CNGCs) which are transcriptionally active during the onset of abscission (Sup. Figure 5). None of the single mutants present an abscission defect. These observations provide no information regarding the identity of the channel(s) involved in IDA-induced calcium influx.

      We agree with the reviewer that we have not been able to identify the channels responsible for the IDA-induced calcium influx. Given the redundancy for many of the members of this multigenic family a future approach to identify proteins responsible for the IDA triggered calcium response could be to create multiple KO mutants by CRISPR Cas9.

      (4) Using H2DCF-DA, the authors observed a decrease in ROS accumulation in the abscission zone of rbohd/rbohf double KO line (Sup Figure 5c) but describe in the text that ROS production in this zone does not depend on RBOHD and RBOHF (L220). Please clarify.

      This has now been clarified in the text.

      (5) The authors describe that rbohd/rbohf double KO present a lower petal break-strength, which they describe as an indication of premature cell wall loosening, and that petals of rbohd/rbohf abscised one position earlier than in WT. Yet, the authors postulate that IDA-induced ROS production does not regulate abscission but may regulate additional responses. Instead the data seems to indicate that ROS production by RBOHD and RBOHF regulate the timing of abscission. In addition, it would have been interesting to test whether IDA signalling pathway regulate ROS production in the abscission zone.

      The rbohd and rbohf double mutants show several phenotypes associated with developmental stress; the mild phenotype observed with regards to premature abscission (by one position) could be caused by the phenotype of the double mutant rather than related to ROS production. Indeed, it has been suggested that the lignified brace in the AZ dependent on ROS production by the aforementioned RBOHs is necessary for the correct concentration of cell modifying enzymes (Lee et al., 2018, https://doi.org/10.1016/j.cell.2018.03.060). The precocious abscission in this double mutant clearly shows this not to be the case. We have tried to do a ROS burst assay on AZ tissue/flowers with the mIDA peptide but have not been successful with this approach. A ROS sensor expressed in AZ tissue would be a valuable tool to address whether IDA signalling regulates ROS production in AZs.

      (6) In Sup. Figure5a, it would be of interest to have a direct comparison of the transcript accumulation of the presented CNGCs and RBOHDs with other of these multigenic families.

      The CNGCs and RBOH gene expression profile shown in the figure are the family members expressed during the developmental progress of floral abscission in stamen AZs. Since there is no difference in the temporal expression of the other family members (and most are either not expressed or very weakly expressed in this tissue) it is not possible to do this comparison (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908).

      (7) L251-253, since IDAdeltaN69 cannot be perceived by its receptors, the absence of induction of pIDA::GUS by IDAdeltaN69 compared to flg22 cannot be seen as a sign of specificity in peptide-induced increase in IDA promoter activity.

      We have rephrased this in the text.

      (8) Please provide quantitative and statistical analysis of the calcium measurement presented in sup figure 3.

      This has been addressed.

      (9) L339-341; This sentence is unclear to me, please rephrase.

      We have rephrased this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In order to assess the role of CNGCs in abscission process, it would be more interesting to see the effect on the Ca2+ pattern and ROS signaling after application of mIDA on cngc and rbohf rbohd mutants.

      We agree with this statement, and the studies on mIDA induced ROS and Ca2+ in these mutants will provide valuable information on the regulation of the response. We are in the process of making the lines needed to be able to perform these experiments. However, since it requires crossing of genetically encoded sensors into each mutant, and generation of higher order mutants, this is a long process.

      (2) With regard to the ROS production (Sup Fig. 1), the application of mIDA can trigger ROS in p35S::HAE:YFP lines, but not in the wild-type plant, which is according to the text "most likely due to the absence of HAE expression" in leaves. The experiment on callose deposition is performed in wild-type cotyledons where no callose deposition could be observed after mIDA treatment (Fig. 4a,b). The conclusion from text is that IDA "is not involved in promoting deposition of callose as a long-term defence response". It appears more likely that neither ROS nor callose can be observed in wild-type plants due to the lack of HAE expression. Therefore, the callose experiment should include the p35S::HAE:YFP lines. The experiment as it is does not allow to draw any conclusion on HAE/IDA involvement in callose formation.

      We fully agree with this comment; thank you for pointing this out. We have now performed the callose experiment with the 35S:HAE lines. Please see our answer to reviewer #1.

      (3) Between Sup Fig. 3 and Sup Fig. 5 two different systems were used to assess the floral stage. An adjustment of the floral stages would be easier to convey the levels of HAE/HSL2 expression and hence potentially with the onset of cell-wall degradation.

      We have now used the same system to assess floral stages throughout the whole manuscript.

      (4) For the Fig. 1 and 2, it will be helpful to mention the genotype used for imaging/quantification of Ca2+.

      This has been addressed.

      (5) Some of the abbreviations are not introduced as full-text at their first time use in the text, such as: mIDA (Line 68), Ef-Tu (line 85), NADPH (line 77).

      The abbreviations have now been introduced.

      (6) In the legend of Fig. 5 (lines 897 and 898)- in the figure description, the box plots are identified as light gray and dark gray, while in the panel a of the figure the box plots are colored in red and blue.

      Thank you for pointing this out, this has now been corrected.

      (7) In figure 1 and 2. the authors write that the number of replicates is 10 (n=10) but data represents a single analysis. Please provide the quantitative ROI analysis, demonstrating that the observed example is representative. This is particularly important since the authors claim very specific changes in pattern of Ca signaling between mIDA and FLG22 treatments (Line 148).

      (8) Figure 4: please use alternative scaling on the Y axis instead of breaks.

      This has now been fixed.

      (9) Figure 5: it is not clear what n=4 refers to when the authors state three independent replicates. In figure 6 they state 4 technical reps and 3 biological reps. Please ensure this is similar across all descriptions.

      We have now ensured the correct information in all descriptions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on Legionella pneumophila effector proteins that target host vesicle trafficking GTPases during infection and more specifically modulate ubiquitination of the host GTPase Rab10. The evidence supporting the claims of the authors is solid, although it remains unclear how modification of the GTPase Rab10 with ubiquitin supports Legionella virulence and the impact of ubiquitination during LCV formation. The work will be of interest to colleagues studying animal pathogens as well as cell biologists in general.

      We greatly appreciate the positive and valuable feedback from the editors and the reviewers. According to their suggestions, we added many new experimental data and implications of our findings in Legionella virulence in terms of the biological process of its replication niche. Please find our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Kubori and colleagues characterized the manipulation of the host cell GTPase Rab10 by several Legionella effector proteins, specifically members of the SidE and SidC family. They show that Rab10 undergoes both conventional ubiquitination and noncanonical phosphoribose-ubiquitination, and that this posttranslational modification contributes to the retention of Rab10 around Legionella vacuoles.

      Strengths

      Legionella is an emerging pathogen of increasing importance, and dissecting its virulence mechanisms allows us to better prevent and treat infections with this organism. How Legionella and related pathogens exploit the function of host cell vesicle transport GTPases of the Rab family is a topic of great interest to the microbial pathogenesis field. This manuscript investigates the molecular processes underlying Rab10 GTPase manipulation by several Legionella effector proteins, most notably members of the SidE and SidC families. The finding that MavC conjugates ubiquitin to SdcB to regulate its function is novel, and sheds further light into the complex network of ubiquitin-related effectors from Lp. The manuscript is well written, and the experiments were performed carefully and examined meticulously.

      Weaknesses

      Unfortunately, in its current form this manuscript offers only little additional insight into the role of effector-mediated ubiquitination during Lp infection beyond what has already been published. The enzymatic activities of the SidC and SidE family members were already known prior to this study, as was the importance of Rab10 for optimal Lp virulence. Likewise, it had previously been shown that SidE and SidC family members ubiquitinate various host Rab GTPases, like Rab33 and Rab1. The main contribution of this study is to show that Rab10 is also a substrate of the SidE and SidC family of effectors. What remains unclear is if Rab10 is indeed the main biological target of SdcB (not just 'a' target), and how exactly Rab10 modification with ubiquitin benefits Lp infection.

      Reviewer #1 (Recommendations for The Authors):

      Major points of concern

      (1) The authors show that SdcB increases Rab10 levels on LCVs at later times of infection and conclude that this is its main biological role. An alternative explanation may be that Rab10 is not 'the main' target of SdcB but merely 'a' target, which may explain why the effect of SdcB on Rab10 accumulation on LCV is only detectable after several hours of infection. An unbiased omics-based approach to identify the actual host target(s) of SdcB may be needed to confirm that Rab10 modification by SdcB is biologically relevant.

      We totally agree with your comment that SdcB should have multiple targets considering the abundance of ubiquitin observed on the LCVs when SdcB was expressed (Figure 3). However, the effect of SdcB on Rab10 accumulation at the later time point (7 h) (current Figure 4e) was well supported by the new data showing that the SdcB-mediated ubiquitin conjugation to Rab10 was readily detected at this time point (new Figure 4c). We have attempted a comprehensive search for interaction partners of the ANK domain of SdcB. This analysis is planned to be included in our ongoing study. We therefore decided not to add the data to this manuscript.

      (2) The authors show that Rab10 within cell lysate is ubiquitinated and conclude that ubiquitination of Rab10 is directly responsible for its retention on the LCV. What is the underlying molecular mechanism for this retention? Are GAP proteins prevented from binding and deactivating Rab10? This may be worth testing.

      It is an attractive hypothesis that a Rab10GAP is involved in the regulation of Rab10 localization on the LCV. However, as far as we know, GAP proteins for Rab10 have not yet been identified. This will be an important issue to address once a Rab10GAP is found.

      (3) Related to this, an alternative explanation would be that Rab10 retention is an indirect effect where inactivators of Rab10, such as host cell GAP proteins, are the main target of SidE/C family members and sent for degradation (see point #1). Can the authors show that Rab10 on the LCV is indeed ubiquitinated?

      The possible involvement of a putative Rab10GAP is currently untestable as it is not known. To address whether Rab10 located on the LCV is ubiquitinated or not, we conducted critical experiments using active Rab10 (QL) and inactive Rab10 (TN) (new Figure 4a, new Figure 4-figure supplement 1). As revealed for Rab1 (Murata et al., Nature Cell Biol. 2006; Ingmundson et al., Nature 2007), Rab10 is expected to be recruited to the LCV as a GDP-bound inactive form and converted to a GTP-bound active form on the LCV. The new results clearly demonstrated that GTP-locked Rab10QL is preferentially ubiquitinated upon infection, strongly supporting the model that Rab10 is ubiquitinated “on the LCV” by the SidE and SidC family ligases.

      (4) Also, on what residue(s) is Rab10 ubiquitinated? Jeng et al. (Cell Host Microbe, 2019, 26(4): 551-563) suggested that K102, K136, and K154 of Rab10 are modified during Lp infection. How does substituting those residues affect the residency of Rab10 on LCVs? Addressing these questions may ultimately help to uncover if the growth defect of a sidE gene cluster deletion strain is due to its inability to ubiquitinate and retain Rab10 on the LCV.

      Thank you for the suggestion. We conducted mutagenesis of the three Lys residues of Rab10 and subjected the derivative to the ubiquitination analysis (new Figure 1-figure supplement 1). Substitution of the Lys residues with Ala did not abrogate the ubiquitination upon Lp infection. This result indicates that ubiquitination sites are present at other residue(s), including the PR-ubiquitination site(s), raising the possibility that disruption of sidE genes would be detrimental for intracellular growth of L. pneumophila because of failure of Rab10 retention.

      (5) The authors proposed that "the SidE family primarily contributes towards ubiquitination of Rab10". In this case, what is the significance of SdcB-mediated ubiquitination of Rab10 during Lp infection?

      We found that the major contribution of SdcB is retention of Rab10 until the late stage of infection. This claim was supported by our new data (new Figure 4c) as mentioned above (response to comment #1).

      (6) The contribution of SdcB to ubiquitination of Rab10 relative to SidC and SdcA is unclear. SidC is shown to be unaffected by MavC. In this case, SidC can ubiquitinate Rab10 regardless of the regulatory mechanism of SdcB by MavC. This is not further being examined or discussed in the manuscript.

      The effect of intrinsic MavC is apparent at the later stage (9 h) of infection (Figure 7c) when SdcB gains its activity (see above). We therefore do not think that any effect of MavC on the SidC/SdcA activities, which are effective in the early stage, would have an impact on Rab10 localization. However, without specific experiments addressing this issue, possible MavC effects on SidC/SdcA are beyond the scope of this manuscript.

      (7) When is Rab10 required during Lp infection? The authors showed that Rab10 levels at LCV are rather stable from 1hr to 7hr post infection. If MavC regulates the activity of SdcB, when does this occur?

      While the Rab10 levels on the LCV (~40%) are stable during 1-7 h post infection (Figure 2b), they decreased to ~20% at 9 h after infection (Figure 7c) (the description was added in lines 304-306). Rab10 seems to be required for optimal LCV biogenesis over the early to late stages, but may not be required at the maturation stage (9 h). We validated the effect of MavC on the Rab10 localization at this time point (Figure 7c). These observations allowed us to build the scheme described in Figure 7d. We revised the illustration in new Figure 7d according to the helpful suggestions from both reviewers.

      (8) Previous analyses by MS showed that ubiquitination of Rab10 in Lp-infected cells decreases over time (from 1 hpi to 8 hpi - Cell Host Microbe, 2019, 26(4): 551-563). How does this align with the findings made here that Rab10 levels on the LCV and likely its ubiquitination levels increase over time?

      We carefully compared the Rab10 ubiquitination at 1 h and 7 h after infection (new Figure 1-figure supplement 1b). This analysis showed that the level of its ubiquitination decreased over time in agreement with the previous report. Nevertheless, Rab10 was still significantly ubiquitinated at 7 h, which we believe to cause the sustained retention of Rab10 on the LCV at this time point. We added the observation in lines 146-148.

      (9) Polyubiquitination of Rab10 was not detected in cells ectopically producing SdcB and SdeA lacking its DUB domain (Figure 7 - figure supplement 2). Does SdcB actually ubiquitinate Rab10 (see also point #5)? Along the same line, it is curious to find that the ubiquitination pattern of Rab10 is not different for LpΔsidC/ΔsdcA compared to LpΔsidC/ΔsdcA/ΔsdcB (Figure 1C). The actual contribution of SdcB to ubiquitinating Rab10 compared to SidC/SdcA thus needs to be clarified.

      Thank you for the important point. We currently hypothesize that SidC/SdcA/SdcB-mediated ubiquitin conjugation can occur only in the presence of PR-ubiquitin on Rab10 (either directly on the PR-ubiquitin or on other residue(s) of Rab10). Failure to detect the polyubiquitination in the transfection condition (Figure 7-figure supplement 2) suggests that this specific ubiquitin conjugation can occur in the restricted condition, i.e. only “on the LCV”. We added this description in the discussion section (lines 334-335). No difference between the ΔsidCΔsdcA and ΔsidCΔsdcAΔsdcB strains (Figure 1C, 1h infection) can be explained by the result that SdcB gains activity at the later stages (see above).

      Minor comments

      In Figures 4b and 7b, the authors show a quantification of "Rab10-positive LCVs/SdcB-positive LCVs". Why this distinction? It raises the question of what the percentage of Rab10-positive/SdcB-negative LCVs might be.

      We chose this method of quantification because we only wanted to see the effect of SdcB on the Rab10 localization. To distinguish between SdcB-positive and SdcB-negative LCVs, we would need to rely on the blue DAPI signal to visualize internal bacteria, which we considered technically difficult in this specific analysis.

      The band of FLAG-tagged SdcB was not detected by immunoblot using anti-FLAG antibody (Figure 5). The authors hypothesized that "disappearance of the SdcB band can be caused by auto-ubiquitination, as SdcB has an ability to catalyze auto-ubiquitination with a diverse repertoire of E2 enzymes". This can be easily confirmed by using MG-132 to inhibit proteasomal degradation of polyubiquitinated substrates.

      We conducted the experiment using MG-132 as suggested and found that proteasomal degradation is not the cause of the disappearance of the band (new Figure 5-figure supplement 2, added description in lines 228-233). SdcB is actually not degraded. Instead, its polyubiquitination causes its apparent loss by distributing the SdcB bands in the gel.

      In Figure 5F, the authors mentioned that "HA-UbAA did not conjugate to SdcB", whereas "shifted band detected by FLAG probing plausibly represents conjugation of cellular intrinsic Ub". The same argument was made in Figure 6B. These claims should be confirmed by immunoblot using anti-Ub antibody.

      Thank you. We added the data using anti-Ub antibody (P4D1) (Figure 6f, new third panel).

      Figure 7A: In cell producing MavC, SdcB is clearly present on LCV. However, in Figure 5A, SdcB was not detected by immunoblot in cells ectopically expressing MavC-C74A. What is the interpretation for these results?

      SdcB was not degraded in the cells; rather, its apparent molecular weight shifted owing to polyubiquitination (see above). The detection of SdcB in the IF images (Figure 7a) supports this claim.

      Reviewer #2 (Public Review):

      This manuscript explores the interplay between Legionella Dot/Icm effectors that modulate ubiquitination of the host GTPase Rab10. Rab10 undergoes phosphoribosyl-ubiquitination (PR-Ub) by the SidE family of effectors which is required for its recruitment to the Legionella containing vacuole (LCV). Through a series of elegant experiments using effector gene knockouts, co-transfection studies and careful biochemistry, Kubori et al further demonstrate that:

      (1) The SidC family member SdcB contributes to the polyubiquitination (poly-Ub) of Rab10 and its retention at the LCV membrane.

      (2) The transglutaminase effector, MavC acts as an inhibitor of SdcB by crosslinking ubiquitin at Gln41 to lysine residues in SdcB.

      Some further comments and questions are provided below.

      (1) From the data in Figure 1, it appears that the PR-Ub of Rab10 precedes and in fact is a prerequisite for poly-Ub of Rab10. The authors imply this, but there is no explicit statement. Is this the case?

      Yes, we think that it is the case. We revised the description in the text accordingly (lines 326-327).

      (2) The complex interplay of Legionella effectors and their meta-effectors targeting a single host protein (as shown previously for Rab1) suggests the timing and duration of Rab10 activity on the LCV is tightly regulated. How does the association of Rab10 with the LCV early during infection and then its loss from the LCV at later time points impact LCV biogenesis or stability? This could be clearer in the manuscript and the summary figure does not illustrate this aspect.

      Thank you for pointing out this important issue. Association of Rab10 with the LCV is thought to be beneficial for L. pneumophila, as Rab10 has been identified as a factor that supports bacterial growth in cells (Jeng et al., 2019). We speculate that its loss from the LCV at the later stage of infection would also be beneficial, since the LCV may need to move on to the maturation stage in which a different membrane-fusion process may proceed. As this is too speculative, we made only a simple modification to the relevant part of the discussion section (lines 356-358). We also modified the summary figure (revised Figure 7d) so that it illustrates the time course.

      (3) How do the activities of the SidE and SidC effectors influence the amount of active Rab10 on the LCV (not just its localisation and ubiquitination)?

      We agree that it is an important point. We tested the active Rab10 (QL) and inactive Rab10 (TN) for their ubiquitination and LCV-localization profiles (new Figure 4ab, new Figure 4-figure supplement 1 and 2). These analyses led us to the unexpected finding that the active form of Rab10 is the preferential target of the effector-mediated manipulation. See also our response to Reviewer 1’s comment #3. Thank you very much for your insightful suggestion.

      (4) What is the fate of PR-Ub and then poly-Ub Rab10? How does poly-Ub of Rab10 result in its persistence at the LCV membrane rather than its degradation by the proteasome?

      We have not revealed the molecular mechanism in this study. We believe that it is an important question to be solved in the future. We added the sentence in the discussion section (lines 376-378).

      (5) Mutation of Lys518, the amino acid in SdcB identified by mass spec as modified by MavC, did not abrogate SdcB Ub-crosslinking, which leaves open the question of how MavC does inhibit SdcB. Is there any evidence of MavC mediated modification to the active site of SdcB?

      The active site of SdcB (C57) is required for the modification (Figure 5b), but it is not likely to be the target residue, as the MavC transglutaminase activity restricts the target residues to Lys. It would be expected that multiple Lys residues on SdcB can be modified by MavC to disturb the catalytic activity.

      (6) I found it difficult to understand the role of the ubiquitin glycine residues and the transglutaminase activity of MavC on the inhibition of SdcB function. Is structural modelling using Alphafold for example helpful to explain this?

      We conducted the Alphafold analysis of SdcB-Ub. Unfortunately, when the Glycine residues of Ub were placed in the catalytic pocket of SdcB, Q41 of Ub did not fit the expected position of SdcB (K518). Probably, the ternary complex (MavC-Ub-SdcB) would cause a change in their overall conformation. A crystal structure analysis or more detailed molecular modeling would be required to resolve the issue.

      (7) Are the Lys mutants of SdcB still active in poly-Ub of Rab10?

      We performed the experiment and found that the K518R K891R mutant of SdcB still has E3 ligase activity at a level similar to that of the wild-type upon infection (new Figure 6-figure supplement 2) (lines 283-284). The level was actually slightly higher than that of the wild-type. This result may suggest that blocking the modification sites can rescue SdcB from MavC-mediated downregulation.

      Reviewer #2 (Recommendations For The Authors):

      see above

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study applies voltage clamp fluorometry to provide new information about the function of serotonin-gated ion channels 5-HT3AR. The authors convincingly investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists, helping to annotate existing cryo-EM structures. This work confirms that the activation of 5-HT3 receptors is similar to other members of this well-studied receptor superfamily. The work will be of interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Public Reviews:

      All reviewers agreed that these results are solid and interesting. However, reviewers also raised several concerns about the interpretation of the data and some other aspects related to data analysis and discussion that should be addressed by the authors. Essential revisions should include:

      (1) Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      (2) Add quantification of VCF data (e.g. sensor current kinetics, as suggested by reviewer #2) or better clarify/discuss the VCF quantitative aspects that are taken into account to reach some conclusions (reviewer #3).

      (3) Review and add relevant foundational work relevant to this study that is not adequately cited.

      (4) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews below.

      We have revised the text to address all four points. See the answers to referees’ recommendations.

      Reviewer #1 (Public Review):

      Summary:

      This study brings new information about the function of serotonin-gated ion channels 5-HT3AR, by describing the conformational changes they undergo during ligand binding. These results can be potentially extrapolated to other members of the Cys-loop ligand-gated ion channels. By combining fluorescence microscopy with electrophysiological recordings, the authors investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists. The results are convincing and correlate well with the observations from cryo-EM structures. The work will be of important significance and broad interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Strengths:

      The authors present an elegant and well-designed study to investigate the conformational changes on 5-HT3AR where they combine electrophysiological and fluorometry recordings. They determined four positions suitable to act as sensors for the conformational changes of the receptor: two inside and two outside the agonist binding site. They make a strong point showing how antagonists produce conformational changes inside the orthosteric site similarly as agonists do but they failed to spread to the lower part of the ECD, in agreement with previous studies and Cryo-EM structures. They also show how some loss-of-function mutant receptors elicit conformational changes (changes in fluorescence) after partial agonist binding but failed to produce measurable ionic currents, pointing to intermediate states that are stabilized in these conditions. The four fluorescence sensors developed in this study may be good tools for further studies on characterizing drugs targeting the 5-HT3R.

      Weaknesses:

      Although the major conclusions of the manuscript seem well justified, some of the comparison with the structural data may be vague. The claim that monitoring these silent conformational changes can offer insights into the allosteric mechanisms contributing to signal transduction is not unique to this study and has been previously demonstrated by using similar techniques with other ion channels.

      The referee emphasizes that “some of the comparison with the structural data may be vague”. To better illustrate the structural reorganizations seen in the cryo-EM structures and used for VCF data interpretation, we added a new supplementary figure 3. It shows a superimposition of Apo, setron-bound and 5-HT-bound structures, with reorganization of loop C and the Cys-loop consistent with the VCF data.

      Reviewer #2 (Public Review):

      Summary:

      This study focuses on the 5-HT3 serotonin receptor, a pentameric ligand-gated ion channel important in chemical neurotransmission. There are many cryo-EM structures of this receptor with diverse ligands bound; however, assignment of functional states to the structures remains incomplete. The team applies voltage-clamp fluorometry to measure, at once, both changes in ion channel activity and changes in fluorescence. Four cysteine mutants were selected for fluorophore labeling: two near the neurotransmitter site, one in the ECD vestibule, and one at the ECD-TMD junction. Agonists, partial agonists, and antagonists were all found to yield similar changes in fluorescence, a proxy for conformational change, near the neurotransmitter site. The strength of the agonist correlated to a degree with propagation of this fluorescence change beyond the local site of neurotransmitter binding. Antagonists failed to elicit a change in fluorescence at the vestibular and the ECD-TMD junction sites. The VCF results further turned up evidence supporting intermediate (likely pre-active) states.

      Strengths:

      The experiments appear rigorous, the problem the team tackles is timely and important, the writing and the figures are for the most part very clear. We sorely need approaches orthogonal to structural biology to annotate conformational states and observe conformational transitions in real membranes- this approach, and this study, get right to the heart of what is missing.

      Weaknesses:

      The weaknesses in the study itself are overall minor, I only suggest improvements geared toward clarity. What we are still missing is application of an approach like this to annotate the conformation of the part of the receptor buried in the membrane; there is important debate about which structure represents which state, and that is not addressed in the current study.

      Reviewer #3 (Public Review):

      Summary:

      The authors have examined the 5-HT3 receptor using voltage clamp fluorometry, which enables them to detect structural changes at the same time as the state of receptor activation. These are ensemble measurements, but they enable a picture of the action of different agonists and antagonists to be built up.

      Strengths:

      The combination of rigorously tested fluorescence reporters with oocyte electrophysiology is a solid development for this receptor class.

      Weaknesses:

      The interpretation of the data is solid but relevant foundational work is ignored. Although the data represent a new way of examining the 5-HT3 receptor, nothing that is found is original in the context of the superfamily. Quantitative information is discussed but not presented.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions that may help to improve the manuscript:

      • Page 6, point 2), typo: "L131W is positioned more profound in each ECD, its side chain (...)"

      “profound” has been corrected to “profoundly”

      • Fig 1C: Why not compare 5-HT responses for the four sensors studied? If the reason is the low currents elicited by 5-HT on I160C/Y207W sensor, could you comment on this effect that is not observed for the other full agonist tested (mCPBG)?

      The point of this figure (Fig 1G) is to show currents that desensitize, in order to follow the evolution of the fluorescence signal during desensitization. For this reason, for the I160C/Y207W sensor, where 5-HT becomes a partial agonist, we judged it more appropriate to use mCPBG, which acts as a more potent agonist, to elicit currents with a clear desensitization component. We have added a sentence in the legend of the figure to explain this choice more clearly.

      • Page 9, paragraph 2: "However, concentration-response curves on V106C/L131W show a small yet visible decorrelation of fluorescence and current (...)" Statistical analysis on EC50c and EC50f will help to see this decorrelation.

      Statistical analysis (unpaired t test) has been added to figure 3 panel A.

      • Page 10, paragraph 1: the authors describe how "different antagonists promote different degrees of local conformational changes". Does it have any relation to the efficacy or potency of these antagonists? Is there any interpretation for this result?

      Since setrons are competitive antagonists, the concept of efficacy of these molecules is unclear. Concerning potency, no correlation between affinity and fluorescence variation is observed. For instance, ondansetron and alosetron bind with similar nanomolar affinity to the 5-HT3R (Thompson & Lummis Curr Pharm Des. 2006;12(28):3615-30) but elicit different fluorescence variations on both S204C and I160C/Y207W sensors.

      • Fig. 1 panel A, graph to far right: axis label is cut ("current (uA)/..."). Colors of graph A - right are not clearly distinguishable e.g. cyan from green.

      The fluorescent green color that describes the mutant has been changed into limon color which is more clearly distinguishable from cyan.

      • Why is R219C/F142W not selected in the study? Are the signals comparable to the chosen R219C/Y140W?

      We have chosen not to select R219C/F142W because the current elicited by this construct was lower than the current elicited by the construct R219C/Y140W. Moreover, the residue F142 belongs to the FPF motif from Cys-loop that is essential for gating (Polovinkin et al, 2018, Nature).

      • Fig. 1 legend typo: "mutated in tryptophan”

      “in” has been changed to “into”

      • Fig. 2: yellow color (graphs in panel B) is very hard to read.

      Yellow color has been darkened to yellow/brown to allow easy reading.

      • Fig. 4 is too descriptive and undermines the information of the study. It could be improved e.g. by representing specific structures or partial structures involved. As an additional minor comment, some colors in the figure are hard to differentiate, e.g. magenta and purple.

      We have added relevant specific structures involved, namely loop C, the Cys-loop and pre-M1 loop to clarify. The intensity of magenta and purple has been increased to help differentiate the two sensor positions.

      • Fig S1C: it is confusing to see the same color pattern for the single mutants without the W. I would recommend to label each trace to make it clearer.

      Labelling of the traces corresponding to the single mutants has been added.

      • Fig S2: Indicating the statistical significance in the graph for the mutants with different desensitization properties compared to the WT receptor will help its interpretation.

      The statistical significance of the difference in the desensitization properties has been added to Figure S2.

      Reviewer #2 (Recommendations For The Authors):

      Overall comments for the authors:

      Selection of cysteine mutants and engineered Trp sites is clear and logical. VCF approach with controls for comparing the functionality of WT vs. mutants, and labeled with unlabeled receptor, is well explained and satisfying. The finding that desensitization involves little change in ECD conformation makes sense. It is somewhat surprising, at least superficially, to find that competitive antagonists promote changes in fluorescence in the same 'direction' and amplitude as strong agonists, however, this is indeed consistent with the structural biology, and with findings from other groups testing different labeling sites. Importantly, the team finds that antagonist-binding changes in deltaF do not spread beyond the region near the neurotransmitter site. The finding that most labeling sites in the ECD, in particular those not in/near the neurotransmitter site, fail to report measurable fluorescence changes, is noteworthy. It contrasts with findings in GlyR, as noted by the authors, and supports a mechanism where most of each subunit's ECD behaves as a rigid body.

      Specific questions/comments:

      I am confused about the sensor current kinetics. Results section 2) states that all sensors share the same current desensitization kinetics, while Results section 5) states that the ECD-TMD site and the vestibule site sensors exhibit faster desensitization. SF1C, right-most panel of R219C suggests the mutation and/or labeling here dramatically changes apparent activation and deactivation rates measured by TEVC. Both activation and deactivation upon washout appear faster in this one example. Data for desensitization are not shown here but are shown in aggregate in earlier panels. It is a bit surprising that activation and deactivation would both change but no effect on desensitization. Indeed, it looks like, in Fig. 1G, that desensitization rate is not consistent across all constructs. Can you please confirm/clarify?

      TEVC and VCF recordings in this study show significant variability in both the apparent desensitization and deactivation kinetics. For desensitization in TEVC experiments, this is illustrated in figure S2, where the remaining currents after 45 seconds of 5-HT perfusion and the rate constants of desensitization are measured on different oocytes from different batches. Therefore, the differences in desensitization kinetics shown in fig 1.G are not significant, the aim of the figure being solely to illustrate that no variation of fluorescence is observed during the desensitization phase. A sentence has been added to the legend of fig 1.G to clarify this point. We also revised the first paragraph of result section 5, clearly stating that the slight tendency toward faster desensitization of the V106C/L131W and R219C/Y140W sensors is not significant.

      An alternative to the conclusion-like title of Results section 2) is that the ECD (and its labels) does not undergo notable conformational changes between activated and desensitized states.

      This is a good point and we have added a sentence at the end of results section 2 to present this idea.

      I find the discussion paragraph on partial agonist mechanisms, starting with "However," to be particularly important but at times hard to follow. Please try to revise for clarity. I am particularly excited to understand how we can understand/improve assignments of cryo-EM structures using the VCF (or other) approaches. As examples of where I struggled, near the top of p. 11, related to the partial agonist discussion, there is an assumption about the pore being either activated, or resting. Is it not also possible that partial agonists could stabilize a desensitized state of the pore? Strictly speaking, the labeling sites and current measurements do not distinguish between pre-active resting and desensitized channel conformations/states. However, the cryo-EM structures can likely help fill in the missing information there- with all the normal caveats. Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      We have revised the section and hope it is clearer now. Notably, we state more explicitly the argument for annotating partial agonist-bound closed structures as pre-active, mainly from kinetic considerations of the VCF experiments. We also mention and cite a paper by the Chakrapani group published on 4 January 2024 (Felt et al, Nature Communications), where they present the structures of the m5HT3AR bound to partial agonists, with a set of conformations fully consistent with our VCF data.

      This statement likely needs references: "...indirect experiments of substituted cysteine accessibility method (SCAM) and VCF experiments suggested that desensitization involves weak reorganizations of the upper part of the channel that holds the activation gate, arguing for the former hypothesis."

      Reference Polovinkin et al, Nature, 2018, has been added.

      I respectfully suggest toning down this language a little bit: "VCF allowed to characterize at an unprecedented resolution the mechanisms of action of allosteric effectors and allosteric mutations, to identify new intermediate conformations and to propose a structure-based functional annotation of known high-resolution structures." This VCF stands strongly without unclear claims about unprecedented resolution. What impresses me most are the findings distinguishing how agonists/partial agonists/antagonists share a conserved action in one area and not in another, the observations consistent with intermediate states, and the efforts to integrate these simultaneous current and conformation measurements with the intimidating array of EM structures.

      We thank the referee for his positive comments. We have removed “unprecedented resolution” and revised the sentences.

      It is beyond the scope of the current study, but I am curious what the authors think the hurdles will be to tracking conformation of the pore domain- an area where non-cryo-EM based conformational measurements are sorely needed to help annotate the EM structures.

      We fully agree with the referee that structures of the TMD are very divergent between the various conditions depending on the membrane surrogate. We are at the moment working on this region by VCF, incorporating the fluorescent unnatural amino acid ANAP.

      Minor:

      (1) P. 5, m5-HT3R: Please clarify that this refers to the mouse receptor, if that is correct.

      OK, “mouse” has been added.

      (2) Fig. 1D, I suggest moving the 180-degree arrow to the right so it is below but between the two exterior and vestibular views.

      Ok, it has been done.

      (3) Please add a standard 2D chemical structure of MTS-TAMRA, and TAMRA attached to a cysteine, to Fig 1.

      A standard chemical structure has been added for the two isomers of MTS-TAMRA.

      (4) Please label subpanels in Fig. 1G with the identity of the label site.

      The subpanels have been labelled.

      Reviewer #3 (Recommendations For The Authors):

      This is solid work but I mainly have suggestions about placing it in context.

      (1) Abstract "Data show that strong agonists promote a concerted motion of all sensors during activation, "

      The concept of sensors here is the fluorescent labels? I did not find this meaningful until I read the significance statement.

      We have specified “fluorescently-labelled” before sensors in the abstract.

      (2) p4 "each subunit in the 5-HT3A pentamer...." this description would be identical for any pentameric LGIC so the authors should beware of a misleading specificity. This goes for other phrases in this paragraph. However, the summary of the 5HT specific results is very good.

      About the description of the structure, we added “The 5-HT3AR displays a typical pLGIC structure, where….”.

      (3) This paper is very nicely put together and generally explains itself well. The work is rigorous and comprehensive. But the meaning of quenching (by local Trp) seems straightforward, but it is not made explicit in the paper. Why doesn't simple labelling (single Cys) at this site work? And can we have a more direct demonstration of the advantage of including the Trp (not in the supplementary figure?) All this information is condensed into the first part of figure 1 (the graph in Figure 1A). Figure 1 could be split and the principle of the introduced quenching could be more clearly shown

      We have detailed in a few more sentences the principle of the TrIQ approach. In addition, to be more explicit, the significant differences in fluorescence between sensors with and without tryptophan have been added to Figure 1 (screening panel), and a sentence has been added to the legend of this figure.

      (4) p10 "VCF measurements are also remarkably coherent with the atomic structures showing an open pore (so called F, State 2 and 5-HT asymmetric states), "

      This statement is intriguing. What do these names or concepts represent? Are they all the same thing? Where do the names come from? What is meant here? Three different concepts, all consistent? Or three names for the same concept?

      We have tried to clarify the statement by making reference to the PDB of the structures.

      (5) "Fluorescence and VCF studies identified similar intermediate conformations for nAChRs, ⍺1-GlyRs and the bacterial homolog GLIC(21,32-35). "

      Whilst this is true, the motivation for such ideas came from earlier work identifying intermediates from electrophysiology alone (such as the flip state (Burzomato et al 2004), the priming state (Mukhtasimova 2009) and the conformational wave in ACh channels (Grosman et al 2000)). It would be appropriate to mention some of this earlier work.

      We have incorporated and described these references in the discussion. Of note, we fully quoted these references in our previous papers on the subject (Menny 2017, Lefebvre 2021, Shi 2023), but the referee is right in asking to quote them again.

      (6) "A key finding of the study is the identification of pre-active intermediates that are favored upon binding of partial agonists and/or in the presence of loss-of-function mutations. "

      Even more fundamental, the idea of a two-state equilibrium for neurotransmitter receptors was discarded in 1957 according to the action of partial agonists.

      DEL CASTILLO J, KATZ B (1957) Interaction at end-plate receptors between different choline derivatives. Proc R Soc Lond B Biol Sci

      So to discover this "intermediate" - that is, bound but minimal activity - in the present context seems a bit much. It is a big positive of this paper that the results are congruent with our expectations, but I cannot see value in posing the results as an extension of the 2-state equilibrium (for which there are anyway other objections).

      As for intermediates being favoured by loss of function mutations, this concept is already well established in glycine receptors (Plested et al 2007, Lape et al 2012) and doubtless in other cases too.

      I do get the point that the authors want to establish a basis in 5-HT3 receptors, but these previous works suggest the results are somewhat expected. This should be commented on.

      We also agree. We replaced “key finding” with “key observation”, cited most of the references proposed, and explicitly conclude that “The present work thus extends this idea to the 5HT3AR, together with providing structural blueprints for cryo-EM structure annotation”.

      (7) "In addition, VCF data allow a quantitative estimate of the complex allosteric action of partial agonists, that do not exclusively stabilize the active state and document the detailed phenotypes of various allosteric mutations."

      Where is this provided? If the authors are not motivated to do this, I have some doubts that others will step in. If it is not worth doing, it's probably not worth mentioning either.

      Language has been toned down to: “In addition, VCF data give insights in the action of partial agonists, that do not exclusively stabilize the active state and document the phenotypes of various allosteric mutations.”

      (8) Figure 1G please mark which construct is which.

      This has been added into Figure 1G

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their insightful comments and recommendations. We have extensively revised the manuscript in response to the valuable feedback. We believe the result is a more rigorous and thoughtful analysis of the data. Furthermore, our interpretation and discussion of the findings are more focused and highlight the importance of the circuit and its role in the response to stress. Thank you for helping to improve the presented science.

      Key changes made in response to the reviewers’ comments include:

      • Revision of statistical analyses for nearly all figures, with the addition of a new table of summary statistics to include F and/or t values alongside p-values.

      • Addition of statistical analyses for all fiber photometry data.

      • Examination of data for possible sex dependent effects.

      • Clarification of breeding strategies and genotype differences, with added details to methods to improve clarity.

      • Addressing concerns about the specificity of virus injections and the spread, with additional details added to methods.

      • Modification of terminology related to goal-directed behavior based on reviewer feedback, including removal of the term from the manuscript.

      • Clarification and additional data on the use of photostimulation and its effects, including efforts to inactivate neurons for further insight, despite technical challenges.

      • Correction of grammatical errors throughout the manuscript.

      Reviewer 1:

      Despite the manuscript being generally well-written and easy to follow, there are several grammatical errors throughout that need to be addressed.

      Thank you for highlighting this issue. Grammatical errors have been fixed in the revised version of the manuscript.

      Only p values are given in the text to support statistical differences. This is not sufficient. F and/or t values should be given as well.

      In response to this critique and similar comments from Reviewer 2, we re-evaluated our approach to statistical analyses and extensively revised the analyses for nearly all figures. We also added a new table of summary statistics (Supplemental Table 1) containing the type of analysis, statistic, comparison, multiple comparisons, and p value(s). For Figures 4C-E, 5C, 6C-E, 7H-I, and 8H we analyzed the data using two-way repeated measures (RM) ANOVA, which examined the main effect of time (either number of sessions or stimulation period) within the same animal, the main effect of genotype (Cre+ vs Cre-), and whether there was an interaction. For Supplemental Figure 7A we also conducted a two-way RM ANOVA in Cre+ mice, with time as one factor and activity state (number of port activations in the active vs inactive nose port) as the other. For Figures 5D-E we conducted a two-way mixed model ANOVA that accounted and corrected for missing data. In figures that only compared two groups of data (Figures 5F-L, 6F, 8C-D, 8I, and Supp 6F-G) we used a two-tailed t-test for the analysis. If our question and/or hypothesis required multiple comparisons between or within treatments, we conducted Bonferroni’s multiple comparisons test for post hoc analysis (we note which groups we compared in Supplemental Table 1). For figures that did or did not show a change in calcium activity (Figure 3G, 3I-K, 7B, 7D-E, 8E-F), we compared waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). Supplemental Table 1 notes the time windows we used for comparison and whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds.

      None of the comparisons that were significant in the prior analyses fell below the threshold for significance. Of those previously found to be not significantly different, only one changed. In Figure 6E there was now a significant baseline difference between Cre+ and Cre- mice, with Cre- mice taking longer to first engage the port compared to Cre+ mice (p=0.045). Although the more rigorous approach to the statistical analyses did not change our interpretations, we feel it enhanced the paper and thank the reviewer for pushing this improvement.
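
      As an illustration of the post hoc correction mentioned above, the Bonferroni adjustment can be sketched in a few lines of Python. This is a minimal sketch and not the analysis code used for the manuscript: the function name `bonferroni` and the example p-values are hypothetical, and the values are not taken from Supplemental Table 1.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for a family of m comparisons.

    Each raw p-value is multiplied by m (capped at 1.0) and then
    compared with alpha. Returns (adjusted_p_values, reject_flags).
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical raw p-values from three post hoc comparisons.
raw = [0.004, 0.020, 0.300]
adjusted, reject = bonferroni(raw)
for p, p_adj, r in zip(raw, adjusted, reject):
    print(f"raw p = {p:.3f}  adjusted p = {p_adj:.3f}  significant: {r}")
```

      In this example the second comparison is significant before correction (p = 0.020) but not after it, which shows why the set of groups compared needs to be reported alongside the p values.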

      Moreover, the fibre photometry data does not appear to have any statistical analyses reported - only confidence intervals represented in the figures without any mention of whether the null hypothesis that the elevations in activity observed are different from the baseline.

      This is particularly important where there is ambiguity, such as in Figure 3K, where the spontaneous activity of the animal appears to correlate with a spike in activity but the text mentions that there is no such difference. Without statistics, this is difficult to judge.

      Thank you for highlighting this critical point and providing an opportunity to strengthen our manuscript. We added statistical analyses of all fiber photometry data using a recently described approach based on waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). In the statistical summary (Supplemental Table 1) we note the time window that we used for comparison in each analysis and whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds. Thank you for highlighting this and helping make the manuscript stronger.
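
      To make the idea of a waveform confidence interval concrete, the sketch below computes a percentile bootstrap interval of the mean trace at each time point and flags time points whose interval excludes zero. It is a simplified illustration under stated assumptions, not the published method or the code used for the manuscript: the function names, the toy traces, and the choice of a plain percentile bootstrap are all assumptions, and any further steps of the cited approach are not reproduced.

```python
import random


def bootstrap_waveform_ci(trials, n_boot=1000, ci=0.95, seed=0):
    """Percentile bootstrap CI of the mean waveform at each time point.

    trials: list of equal-length lists, one baseline-normalised trace per trial.
    Returns (lower, upper), each a list with one value per time point.
    """
    rng = random.Random(seed)
    n_trials, n_time = len(trials), len(trials[0])
    boot_means = []
    for _ in range(n_boot):
        # Resample whole trials with replacement, then average per time point.
        sample = [trials[rng.randrange(n_trials)] for _ in range(n_trials)]
        boot_means.append(
            [sum(tr[t] for tr in sample) / n_trials for t in range(n_time)]
        )
    lo_idx = int((1 - ci) / 2 * n_boot)
    hi_idx = int((1 + ci) / 2 * n_boot) - 1
    lower, upper = [], []
    for t in range(n_time):
        column = sorted(bm[t] for bm in boot_means)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


def significant_points(lower, upper):
    """True where the CI excludes zero, i.e. the signal differs from baseline."""
    return [lo > 0 or hi < 0 for lo, hi in zip(lower, upper)]


# Toy data: five trials, two time points (baseline-like, then elevated).
toy_trials = [[-0.1, 1.0], [0.1, 1.2], [0.0, 0.9], [-0.2, 1.1], [0.2, 1.3]]
lower, upper = bootstrap_waveform_ci(toy_trials)
print(significant_points(lower, upper))
```

      Raising `ci` to 0.99 or 0.999 widens the interval, which corresponds to the stricter thresholds reported in the summary table.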

      With respect to Figure 3K, we are not certain we understood the spike in activity the reviewer referred to. Figure 3J and K include both velocity data (gold) and Ca2+ dependent signal (blue). We used episodes of velocity that were comparable to the avoidance response during the ambush test and found no significant differences in the Ca2+ signal when gating around changes in velocity in the absence of a stressor (Supplemental Table 1). This is in contrast to the significant change in Ca2+ signal following a mock predator ambush (Figure 3J). We interpret these data together to indicate that locomotion does not correlate with an increase in calcium activity in SuMVGLUT2+::POA neurons, but that coping with a stressor does. This conclusion is further examined in Supplemental Figure 5, including a cross-correlation analysis to test for a temporally offset relationship between velocity and Ca2+ signal in SuMVGLUT2+::POA neurons.
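
      The logic of testing for a temporally offset relationship can be illustrated with a lagged correlation between two equally sampled traces. The sketch below is an assumption-laden example rather than the analysis behind Supplemental Figure 5: the function name `lagged_correlation`, the sign convention for the lag, and the toy traces are hypothetical.

```python
from statistics import mean, pstdev


def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for each lag.

    A positive lag means y follows x by that many samples.
    Returns a dict mapping lag -> correlation (NaN if a slice is constant).
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        ma, mb = mean(a), mean(b)
        sa, sb = pstdev(a), pstdev(b)
        if sa == 0 or sb == 0:
            result[lag] = float("nan")
            continue
        cov = mean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])
        result[lag] = cov / (sa * sb)
    return result


# Toy traces: the "signal" repeats the "velocity" trace two samples later.
velocity = [0, 0, 1, 0, 0, 2, 0, 0, 1, 0]
signal = [0, 0] + velocity[:-2]
corr = lagged_correlation(velocity, signal, max_lag=3)
peak_lag = max(corr, key=corr.get)
print(peak_lag, round(corr[peak_lag], 3))
```

      A flat correlation profile across lags would argue against a delayed coupling between the two traces, whereas a clear peak at a non-zero lag would suggest one.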

      The use of photostimulation only is unfortunate, it would have been really nice to see some inactivation of these neurons as well. This is because of the well-documented issues with being able to determine whether photostimulation is occurring in a physiological manner, and therefore makes certain data difficult to interpret. For instance, with regards to the 'active coping' behaviours - is this really the correct characterisation of what's going on? I wonder if the mice simply had developed immobile responding as a coping strategy but when they experience stimulation of these neurons that they find aversive, immobility is not sufficient to deal with the summative effects of the aversion from the swimming task as well as from the neuronal activation? An inactivation study would be more convincing.

      We agree with the reviewer's point: experiments demonstrating the necessity of SuMVGLUT2+::POA neurons would have added to the story here. We carried out multiple experiments aimed at addressing questions about the necessity of SuMVGLUT2+::POA neurons in stress coping behaviors, specifically the forced swim assay. Efforts included employing chemogenetic, optogenetic, and tetanus toxin-based methods. We observed no effects on locomotor activity or stress coping. These experiments are both technically difficult and challenging to interpret. Interpretation of negative results, as we obtained, is particularly difficult because of potential technical confounds. Selective targeting of SuMVGLUT2+::POA neurons for inhibition requires three viral injections and two recombination steps, increasing variability and reducing the number of neurons impacted. Alternatively, photoinhibition targeting SuMVGLUT2+::POA cells can be done using Retro-AAV injected into POA and a fiber implant over SuM. We tried both approaches. The data obtained were difficult to interpret because of questions about adequate coverage of the SuMVGLUT2+::POA population by virally expressed constructs and/or light spread. The challenge of achieving adequate coverage to effectively prevent output from the targeted population is further confounded by challenges inherent in neural inhibition, specifically determining whether the inhibition created at the cellular level is adequate to block output in the context of excitatory inputs, or whether neurons must first be engaged in a particular manner for inhibition to be effective. Baseline neural activity, release probability, and post-synaptic effects could all be relevant, which photo-inhibition will potentially not resolve. So, while the trend is to always show “necessary and sufficient” effects, we’ve tried nearly everything, and we simply cannot conclude much from our mixed results.
      There are also well-established problems with existing photo-inhibition methods, which, although people use and tout these methods, are often ignored. We have a lot of expertise in photo-inhibition optogenetics, have used it with some success, and have developed new methods, yet in this particular case we are unable to draw conclusions related to inhibition. People have experienced similar challenges in locus coeruleus neurons, which have very low basal activity: inhibition with chemogenetics is very hard, as it is with optogenetic pump-based approaches, because the neurons fire robust rebound APs. We have spent almost 2.5 years trying to get this to work in this circuit because reviewers have been insistent on this result for the paper to be conclusive. Unfortunately, it simply isn’t possible in our view until we know more about the cell types involved. This is all in spite of our experience using the approach in many other publications.

      We also employed less selective approaches, such as injecting AAV-DIO-tetanus toxin light chain (Tettox) constructs directly into SuM VGLUT2-Cre mice, but found off-target effects impacting animal wellbeing and impeding behavioral testing due to viral spread to surrounding areas.

      While we are disappointed at being unable to directly address questions about the necessity of SuMVGLUT2+::POA neurons in active coping with experimental data, we were unable to obtain results that allowed clear interpretation across the numerous other domains the reviewers requested. We also feel strongly that until we have a clear picture of the molecular cell type architecture in the SuM, and Cre-drivers to target subsets of neurons, this question will be difficult to resolve for any group. We are now working on RNAseq and related spatial transcriptomics efforts in the SuM and examining additional behavioral paradigms to resolve these issues, so stay tuned for future publications.

      Accordingly, we avoid making statements relating to necessity in the manuscript, in spite of having several lines of physiological data showing strong, robust correlations with behavior related to the SuMVGLUT2+::POA circuit.

      Nose poke is only nominally instrumental as it cannot be shown to have a unique relationship with the outcome that is independent of the stimuli-outcome relationships (in the same way that a lever press can, for example). Moreover, there is nothing here to show that the behaviours are goal-directed.

      Thank you for highlighting this point. We removed the goal-directed terminology from the manuscript. Since the mice perform highly selective (active vs inactive) port activation robustly across multiple days of training, the behavior likely transitions to habitual behavior. We only tested the valuation of stimulus termination on the final day of training with a time-limited progressive ratio test. With respect to lever press versus active port activation, we are unclear how using a lever in this context would offer a different interpretation. Lever pressing may be more sensitive to changes in valuation when compared to nose poke port activation (Atalayer and Rowland 2008); however, in this study the focus of the operant behavior is separating innate behaviors from learned action–outcome instrumental behaviors for threat response (LeDoux and Daw 2018). The robust, highly selective activation of the active port illustrated in Figure 6 fits as an action–outcome instrumental behavior wherein mice learn to engage the active but not the inactive port to terminate photostimulation. The first activation of the port occurs through exploration of the arena but, as demonstrated by the number of active port activations and the decline in time to the first active port engagement, mice expressing ChR2eYFP learn to engage the port to terminate the stimulation. To aid in illustrating this point we have added Supplemental Figure 7 showing active and inactive port activations for both Cre+ and Cre- mice. This adds clarity to the high rate of selective port activation driven by stimulation of SuMVGLUT2+::POA neurons compared to controls. Eliminating the term goal-directed and providing additional data narrows and supports one of the key points of the operant experiment.

      With regards to Figure 1: This is a nice figure, but I wonder if some quantification of the pathways and their density might be helpful, perhaps by measuring the intensity of fluorescence in image J (as these are processes, not cell bodies that can be counted)? Mind you, they all look pretty dense so perhaps this is not necessary! However, because the authors are looking at projections in so-called 'stress-engaged regions', the amygdala seems conspicuous by its absence. Did the authors look in the amygdala and find no projections? If so it seems that this would be worth noting.

      This is an interesting question but has proven to be very technically challenging. We consulted with several leaders in the field who routinely use complementary viral tracing methods. We were unable to devise a satisfactorily meaningful quantitative (as opposed to qualitative) approach to compare SuMVGLUT2+::POA to SuMVGLUT2+ projections. Several limitations hinder a meaningful quantitative approach. One limitation was the need for different viral strategies to label the two populations. Labeling SuMVGLUT2+::POA neurons requires using VGLUT2-Flp mice with two injections into the POA and one into SuM. Two recombinase steps were required, reducing efficiency of overlap. This combination of viral injections, particularly the injections of RetroAAVs in the POA, can induce significant quantitative variability due to tropism, efficacy, and variability of retro-viral methods, and viral infection generally. These issues are often totally ignored in similar studies across the “neural circuit” landscape, but that doesn’t make them less relevant here.

      Although people do this in the field and show quantification, we believe that it can be a quite misleading read-out of functionally relevant circuitry, given that neurotransmitter release is ultimately amplified by receptors post-synaptically, and many examples of robust behavioral effects have been observed with low fiber tracing in complementary methods (McCall, Siuda et al. 2017). In contrast, the broader SuMVGLUT2+ population was labeled using a single injection into the SuM. This means there is likely more efficient expression of the fluorophore. Additionally, in areas that contain terminals and passing fibers, understanding and interpreting fluorescent signal is challenging. Together, these factors limit a meaningful quantitative comparison and make interpretation difficult. In this context, we focused on a conservative qualitative presentation to demonstrate two central points: 1) SuMVGLUT2+::POA neurons are a subset of SuMVGLUT2+ neurons that project to specific areas, excluding dentate gyrus, and 2) they arborize extensively to multiple areas which have been linked to threat responses. We agree that there is much to be learned about how different populations in SuM connect to targets in different regions of the brain, and that this question should continue to be examined with different techniques. A meaningful quantitative study comparing projections is technically complex and, we feel, beyond our ability for this study.

      Also, for the reasons above we do not believe that quantification provides exceptional clarity with respect to the putative function of the circuit, glutamate released, or other cotransmitters given known amplification at the post-synaptic side of the circuit.

      With regard to the amygdala, other studies on SuM projections have found efferent projections to amygdala (Ottersen, 1980; Vertes, 1992). In our study we were unable to definitively determine projections from SuMVGLUT2+::POA neurons to amygdala, which if present are not particularly dense. For this reason we were conservative and do not comment on this particular structure.

      I would suggest removing the term goal-directed from the manuscript and just focusing on the active vs. passive distinction.

      We removed the use of goal-directed. Thank you for helping us clarify our terminology.

      The effect observed in Figure 7I is interesting, and I'm wondering if a rebound effect is the most likely explanation for this. Did the authors inhibit the VGAT neurons in this region at any other times and observe a similar rebound? If such a rebound was not observed it would suggest that it is something specific about this task that is producing the behaviour. I would like it if the authors could comment on this.

      We agree that results showing the change in coping strategy (passive to active) in forced swim after but not during stimulation of SuMVGAT+ neurons is quite interesting (Figure 7I). This experiment activated SuMVGAT+ neurons during a section of the forced swim assay and mice showed a robust shift to mobility after the stimulation of SuMVGAT+ neurons stopped. We did not carry out inhibition of SuMVGAT+ neurons in this manuscript. As the reviewer suggested, strong inhibition of local SuM neurons, including SUMVGLUT2+::POA neurons, could lead to rebound activity that may shift coping behaviors in confusing ways. We agree this is an interesting idea but do not have data to support the hypothesis further at this time.

      Reviewer 2

      (1) These are very difficult, small brain regions to hit, and it is commendable to take on the circuit under investigation here. However, there is no evidence throughout the manuscript that the authors are reliably hitting the targets and the spread is comparable across experiments, groups, etc., decreasing the significance of the current findings. There are no hit/virus spread maps presented for any data, and the representative images are cropped to avoid showing the brain regions lateral and dorsal to the target regions. In images where you can see the adjacent regions, there appears expression of cell bodies (such as Supp 6B), suggesting a lack of SuM specificity to the injections.

      We agree with the reviewer that the areas studied are small and technically challenging to hit. This was one of the driving motivations for using multiple tools in tandem to restrict the area targeted for stimulation. Approaches included using retrograde AAVs to express ChR2eYFP in SuMVGLUT2+::POA neurons, thereby restricting expression to VGLUT2+ neurons that project to the POA. Targeting was further limited by placement of the optic fiber over cell bodies in SuM. Thus, only neurons that are VGLUT2+, project to the POA, and were close enough to the fiber were activated by photostimulation. Regrettably, we were not able to compile images from mice where the fiber was misplaced, leading to loss of behavioral effects. We would have liked to provide that here to address this comment. Unfortunately, generating heat maps for injections is not possible for anatomic studies that use unlabeled recombinase as part of an intersectional approach. Also, the point of injection of a retroAAV can be difficult to determine accurately because neurons remote from the injection site and their processes are labeled.

      Experiments described in Supplemental Figure 6B on VGAT neurons in SuM were designed and interpreted to support the point that SuMVGLUT2+::POA neurons are a distinct population that does not overlap with GABAergic neurons. For this point it is important that we targeted SuM, but highly confined targeting is not needed to support the central interpretation of the data. We do see labeling in SuM in VGAT-Cre mice, but photostimulation of SuMVGAT+ neurons does not generate the behavioral changes seen with activation of SuMVGLUT2+::POA neurons. As the reviewer points out, SuM is a small target and viral injection is likely to spread beyond the anatomic boundaries to other VGAT+ neurons in the region, which are not the focus here. The activation would be restricted by the spread of light from the fiber over SuM (estimated to be about a 200 µm sphere in all directions). We did not further examine projections or localization of VGAT+ neurons in this study but focused on the differential behavioral effects of SuMVGLUT2+::POA neurons.

      (2) In addition, the whole brain tracing is very valuable, but there is very little quantification of the tracing. As the tracing is the first several figures and supp figure and the basis for the interpretation of the behavior results, it is important to understand things including how robust the POA projection is compared to the collateral regions, etc. Just a rep image for each of the first two figures is insufficient, especially given the above issue raised. The combination of validation of the restricted expression of viruses, rep images, and quantified tracing would add rigor that made the behavioral effects have more significance.

      For example, in Fig 2, how can one be sure that the nature of the difference between the nonspecific anterograde glutamate neuron tracing and the Sum-POA glutamate neuron tracing is real when there is no quantification or validation of the hits and expression, nor any quantification showing the effects replicate across mice? It could be due to many factors, such as the spread up the tract of the injection in the nonspecific experiment resulting in the labeling of additional regions, etc.

      Relatedly, in Supp 4, why isn’t C normalized to DAPI, which they show, or area? Similar for G what is the mcherry coverage/expression, and why isn’t Fos normalized to that?

      Thank you for highlighting the importance of anatomy and the value of anatomy. Two points based on the anatomic studies are central to our interpretation of the experimental data. First, SUMVGLUT2+::POA are a distinct population within the SuM. We show this by demonstrating they are not GABAergic and that they do not project to dentate gyrus. Projections from SuM to dentate gyrus have been described in multiple studies (Boulland et al., 2009; Haglund et al., 1987; Hashimotodani et al., 2018; Vertes, 1992) and we demonstrate them here for SuMVGLUT2+ cells. Using an intersectional approach in VGLUT2-Flp mice we show SUMVGLUT2+::POA neurons do not project to dentate gyrus. We show cell bodies of SUMVGLUT2+::POA neurons located in SuM across multiple figures including clear brain images. Thus, SUMVGLUT2+::POA neurons are SuM neurons that do not project to dentate gyrus, are not GABAergic, send projections to a distinct subset of targets, most notably excluding dentate gyrus. Second, SUMVGLUT2+::POA neurons arborize sending projections to multiple regions. We show this using a combinatorial genetic and viral approach to restrict expression of eYFP to only neurons that are in SuM (based on viral injection), project to the POA (based on retrograde AAV injection in POA), and VGLUT2+ (VGLUT2-Flp mice). Thus, any eYFP labeled projection comes from SUMVGLUT2+::POA neurons. We further confirmed projections using retroAAV injection into areas identified using anterograde approaches (Supplemental Figure 2). As discussed above in replies to Reviewer 1, we feel limitations are present that preclude meaningful quantitative analysis. We thus opted for a conservative interpretation as outlined.

      Prior studies have shown efferent projections from SuM to many areas, and projections to dentate gyrus have received substantial attention (Boulland et al., 2009; Haglund, Swanson, and Kohler, 1984; Hashimotodani et al., 2018; Soussi et al., 2010; Vertes, 1992; Pan and McNaughton, 2004). We saw many of the same projections from SuMVGLUT2+ neurons. We found no projections from SuMVGLUT2+::POA neurons to dentate gyrus (Figure 2). Our description of the SuM projection to dentate gyrus is not new, but finding a population of neurons in SuM that does not project to dentate gyrus but does project to other regions in hippocampus is new. This finding cannot be explained by spread of the virus in the tract or non-selective labeling.

      (3) The authors state that they use male and female mice, but they do not describe the n’s for each experiment or address sex as a biological variable in the design here. As there are baseline sex differences in locomotion, stress responses, etc., these could easily factor into behavioral effects observed here.

      Sex-specific effects are possible; however, the studies presented here were not designed or powered to directly examine them. One aspect of the experimental design that helps mitigate a strong sex-dependent effect is that the paradigms we used often examined baseline (pre-stimulation) behavior, how behavior changed during stimulation, and how behavior returned (or not) to baseline after stimulation. Thus, we test changes in individual behaviors. Although we had limited statistical power, we conducted analyses to examine the effects of sex as a variable in the experiments and found no differences between males and females.

      (4) In a similar vein as the above, the authors appear to use mice of different genotypes (however the exact genotypes and breeding strategy are not described) for their circuit manipulation studies without first validating that baseline behavioral expression, habituation, stress responses are not different. Therefore, it is unclear how to interpret the behavioral effects of circuit manipulation. For example in 7H, what would the VGLUT2-Cre mouse with control virus look like over time? Time is a confound for these behaviors, as mice often habituate to the task, and this varies from genotype to genotype. In Fig 8H, it looks like there may be some baseline differences between genotypes- what is normal food consumption like in these mice compared to each other? Do Cre+ mice just locomote and/or eat less? This issue exists across the figures and is related to issues of statistics, potential genotype differences, and other experimental design issues as described, as well as the question about the possibility of a general locomotor difference (vs only stress-induced). In addition, the authors use a control virus for the control groups in VGAT-Cre manipulation studies but do not explain the reasoning for the difference in approach.

      Thank you for highlighting the need for greater clarity about the breeding strategies used and for these related questions. We address the breeding strategy first and then turn to the additional concerns raised. We have added details to the methods section to address this point. For VGLUT2-Cre mice we used littermate controls from a Cre/WT x WT/WT cross. The VGLUT2-Cre line (RRID:IMSR_JAX:028863) (Vong L, et al. 2011) used here has been used in many other reports. We are not aware of any reports indicating a phenotype associated with the addition of the IRES-Cre to the Slc17a6 locus, and there is no expected impact on expression of VGLUT2. Also, we see in many of the experiments here that the baseline behaviors (Figures 4, 5, and 7) are not different between the Cre+ and Cre- mice. For VGAT-Cre mice we used a different breeding strategy that allowed us to achieve greater control of the composition of litters and more efficient cohorts. A Cre/Cre x WT/WT cross yielded all Cre/WT litters. The AAV injected, ChR2eYFP or eYFP, allowed us to balance the cohort.

      Regarding Figure 7H, which shows time immobile on the second day of a swim test, data from the Cre- mice demonstrate the natural course of progression during the second day of the test. The control mice in the VGAT-Cre cohort (Figure 7I) show a similar trend. The change in behavior during the stimulation period in the Cre+ mice is caused by the activation of SuMVGLUT2+::POA neurons. The behavioral shift largely, but not completely, returns to baseline when the photostimulation stops. We have no reason to believe a VGLUT2-Cre+ mouse injected with control AAV to express eYFP would be different from a WT littermate injected with AAV expressing ChR2eYFP in a Cre-dependent manner.

      Turning to concerns related to 8H, which shows data from fasted mice quantifying time spent interacting with a food pellet immediately after presentation of a chow pellet, we found no significant difference between the control and Cre+ mice. We are unaware of any evidence indicating that the two groups should have a different baseline, since the Cre insertion is not expected to alter gene expression and we are unaware of reports of a phenotype relating to feeding and the presence of the transgene in this mouse line. Even if there were a small baseline shift, this would not explain the large abrupt shift induced by the photostimulation. As noted above, we saw shifts in behavior abruptly induced by the initiation of photostimulation when compared to baseline in multiple experiments. This shift would not be explained by a hypothetical difference in the baseline behaviors of littermates.

      (5) The statistics used throughout are inappropriate. The authors use serial Mann-Whitney U tests without a description of data distributions within and across groups. Further, they do not use any overall F tests even though most of the data are presented with more than two bars on the same graph. Stats should be employed according to how the data are presented together on a graph. For example, stats for pre-stim, stim, and post-stim behavior X between Cre+ and Cre- groups should employ something like a two-way repeated measures ANOVA, with post-hoc comparisons following up on those effects and interactions. There are many instances in which one group changes over time or there could be overall main effects of genotype. Not only is serially using Mann-Whitney tests within the same panel misleading and statistically inaccurate, but it cherry-picks the comparisons to be made to avoid more complex results. It is difficult to comprehend the effects of the manipulations presented without more careful consideration of the appropriate options for statistical analysis.

      We thank the reviewer for pointing this out and suggesting alternative analyses; we agree with the assessment on this topic. Therefore, we have extensively revised the statistical approach to our data using the suggested approach. Reviewer 1 also made a similar comment, and we would like to point to our reply to Reviewer 1’s second point regarding what we changed and added in the new statistical analyses. Further, we have added to the paper a full table detailing the statistical values for each figure.
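
      The reviewer's concern about serial pairwise tests within one panel can be illustrated with simple arithmetic. The sketch below is not the analysis used in the manuscript; it assumes independent tests at a nominal alpha of 0.05 (both assumptions made here for illustration) and shows how the chance of at least one false positive grows with the number of uncorrected comparisons. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Illustrative counts of pairwise comparisons drawn on one panel.
    for k in (1, 3, 6):
        uncorrected = familywise_error_rate(0.05, k)
        corrected = familywise_error_rate(bonferroni_alpha(0.05, k), k)
        print(f"{k} tests: uncorrected={uncorrected:.3f}, corrected={corrected:.3f}")
```

      An omnibus test such as the two-way repeated measures ANOVA the reviewer suggests, followed by corrected post-hoc comparisons, addresses the same inflation while also testing main effects and interactions.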

      Conceptual:

      (6) What does the signal look like at the terminals in the POA? Any suggestion from the data that the projection to the POA is important?

      This is an interesting question that we will pursue in future investigations into the roles of the POA. We used the projection to the POA from SuM to identify a subpopulation in SuM, and we were surprised to find the extensive arborization of these neurons to many areas associated with threat responses. We focused on the cell bodies as “hubs” with many “spokes”. Extensive studies are needed to understand the roles of individual projections and their targets. There is also the technical challenge of manipulating one projection without triggering retrograde propagation of action potentials to the soma. At the current time we have no specific insights into the roles of the isolated projection to POA. Interpretation of experiments activating only one “spoke” of the hub would be challenging. Simple terminal stimulation experiments are challenged by the need to separate POA projections from activation of passing fibers targeting more anterior structures of the accumbens and septum.

      (7) Is this distinguishing active coping behavior without a locomotor phenotype? For example, Fig. 5I and other figure panels show a distance effect of stimulation (but see issues raised about the genotype of comparison groups). In addition, locomotor behavior is not included for many behaviors, so it is hard to completely buy the interpretation presented.

      We agree with the reviewer and thank them for highlighting this fundamental challenge in studies examining active coping behaviors in rodents, which require movement. Additionally, actively responding to threatening stressors would include increased locomotor activity. Separating movement alone from active coping can be challenging. Because of these concerns we undertook experiments using diverse behavioral paradigms to examine the elicited behaviors and the recruitment of SuMVGLUT2+::POA neurons to stressors. We conducted experiments to directly examine behaviors evoked by photoactivation of SuMVGLUT2+::POA neurons. In these experiments we observed a diversity of behaviors including increased locomotion and jumping but also treading/digging (Figure 4). These are behaviors elicited in mice by threatening and noxious stimuli. An increase in running or only jumping could signify a specific locomotor effect, but this is not what was observed. Based on these behaviors, we expected to find evidence of increased movement in open field (Figure 5G-I) and light dark choice (Figure 5J-L) assays. For many of the assays, reporting distance traveled is not practical. An important set of experiments that argues against a generic increase in locomotion is the operant behavior experiments, which require the animal to engage in a learned behavior while receiving photostimulation of SuMVGLUT2+::POA neurons (Figure 6). This is particularly true for testing using a progressive ratio, when the time of ongoing photostimulation is longer, yet animals actively and selectively engage the active port (Figure 6G-H). Further, we saw a shift in behavioral strategy induced by photoactivation in the forced swim test (Figure 7H). Thus, activation of SuMVGLUT2+::POA neurons elicited a range of behaviors that included swimming, jumping, treading, and learned responses, not just increased movement.

      Together these data strongly argue that SuMVGLUT2+::POA neurons do not only promote increased locomotor behavior. We interpret these data together with the data from fiber photometry studies to show SuMVGLUT2+::POA neurons are recruited during acute stressors, contribute to the aversive affective component of stress, and promote active behaviors without constraining the behavioral pattern.

      Regarding genotype, we address this in comments above as well but believe that clarifying the use of litter mates, the extensive use of the VGLUT2-Cre line by multiple groups, and experimental design allowing for comparison to baseline, stimulation evoked, and post stimulation behaviors within and across genotypes mitigate possible concerns relating to the genotype.

      (8) What is the role of GABA neurons in the SuM and how does this relate to their function and interaction with glutamate neurons? In Supp 8, GABA neuron activation also modulates locomotion and in Fig 7 there is an effect on immobility, so this seems pretty important for the overall interpretation and should probably be mentioned in the abstract.

      Thank you for noting these interesting findings. We added text to the abstract to highlight these findings. Possible roles of GABAergic neurons in SuM extend beyond the scope of the current study, particularly since SuM neurons have been shown to release both GABA and glutamate (Li Y, Bao H, Luo Y, et al. 2020, Root DH, Zhang S, Barker DJ et al. 2018). GABAergic neurons regulate dentate gyrus (Ajibola MI, Wu JW, Abdulmajeed WI, Lien CC 2021), REM sleep (Billwiller F, Renouard L, Clement O, Fort P, Luppi PH 2017), and novelty processing (Chen S, He L, Huang AJY, Boehringer R et al. 2020). The population of exclusively GABAergic vs dual neurotransmitter neurons in SuM requires further dissection to be understood. How they may relate to SuMVGLUT2+::POA neurons requires further investigation.

      Questions about figure presentation:

      (9) In Fig 3, why are heat maps shown as a single animal for the first couple and a group average for the others?

      Thank you for highlighting this point for further clarification. We modified the labels in the figure to help make clear which panels are from one animal across multiple trials and which are from multiple animals. In the ambush assay each animal had one trial, to avoid habituation to the mock predator. Accordingly, we do not have multiple trials for each animal in this test. In contrast, the dunk assay (10 trials/animal) and the shock assay (5 trials/animal) had multiple trials for each animal. When there are multiple trials per animal, we present data from a representative animal as well as the aggregate data.

      Why is the temporal resolution for J and K different even though the time scale shown is the same?

      Thank you for noticing this error carried forward from a prior draft of the figure so we could correct it. We replaced the image in 3J with a more correctly scaled heatmap.

      What is the evidence that these signal changes are not due to movement per se?

      Thank you for the question. There are two points of evidence. First, all the 465 nm excitation (Ca2+ dependent) data were collected in interleaved fashion with 415 nm (isosbestic) excitation data. The isosbestic signal is derived from GCaMP emission but is independent of Ca2+ binding (Martianova E, Aronson S, Proulx CD. 2019). This approach, time-division multiplexing, can correct the calcium-dependent signal for changes that are most often due to mechanical change. The second piece of evidence is experimental. Using multiple cohorts of mice, we examined whether the change in Ca2+ signal was correlated with movement. We used the threshold of velocity of movement seen following the ambush. We found no correlation between high velocity movements and Ca2+ signal (Figure 3K), including in cross-correlational analysis (Supplemental Figure 5). Based on these points together, we conclude the change in the Ca2+ signal in SuMVGLUT2+::POA neurons is not due to movement-induced mechanical changes, and we find no correlation to movement unless a stressor is present, i.e. mock predator ambush or forced swim. Further, the stressors evoke very different locomotor responses: fleeing, jumping, or swimming.
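
      The logic of the isosbestic correction can be sketched in a few lines. This is a simplified, commonly used form (a least-squares fit of the 415 nm trace to the 465 nm trace, followed by dF/F), not necessarily the exact pipeline used in the manuscript or in Martianova et al.; the function names and the synthetic traces are assumptions made for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def motion_corrected_dff(sig465, iso415):
    """Scale the isosbestic trace to the Ca2+-dependent trace, then compute dF/F.

    Fluctuations shared by both channels (e.g. fiber movement) are captured by
    the fitted isosbestic trace and cancel; Ca2+-dependent changes remain.
    """
    slope, intercept = fit_line(iso415, sig465)
    fitted = [slope * v + intercept for v in iso415]
    return [(s - f) / f for s, f in zip(sig465, fitted)]


if __name__ == "__main__":
    # Synthetic example: a shared artifact in both channels plus one
    # Ca2+-dependent transient (sample index 4) in the 465 nm channel only.
    iso = [100, 90, 100, 110, 100, 95, 100, 105]
    transient = [0, 0, 0, 0, 30, 0, 0, 0]
    sig = [2 * v + 10 + t for v, t in zip(iso, transient)]
    dff = motion_corrected_dff(sig, iso)
    print("peak sample:", dff.index(max(dff)))
```

      In this toy example the shared fluctuations largely cancel and the transient stands out, which is the property the correction relies on.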

      (10) In Fig 4, the authors carefully code various behaviors in mice. While they pick a few and show them as bars, they do not show the distribution of behaviors in Cre- vs Cre+ mice before manipulation (to show they have similar behaviors) or how these behaviors shift categories in each group with stimulation. Which behaviors in each group are shifting to others across the stim and post-stim periods compared to pre-stim?

      This is an important point. We selected behaviors to highlight in Figure 4C-E because these behaviors are exhibited in response to stress (De Boer & Koolhaas, 2003; van Erp et al., 1994). For the highlighted behaviors (jumping, treading/digging, grooming) we show baseline (pre-photostimulation), stimulation, and post-stimulation values for Cre+ and Cre- mice, with the values for each animal plotted. We show all nine behaviors as a heat map in Figure 4B. The panels show changes that may occur as a function of time and changes induced by photostimulation.

      The heatmaps demonstrate that photostimulation of SUMVGLUT2+::POA neurons causes a suppression of walking, grooming, and immobile behaviors with an increase in jumping, digging/treading, and rapid locomotion. After stimulation stops, there is an increase in grooming and time immobile. The control mice show a range of behaviors with no shifts noted with the onset or termination of photostimulation.

      Of note, issues of statistics, genotype, and SABV are important here. For example, given the hint that treading/digging may have a slightly different pre-stim basal expression, it seems important to first evaluate strain and sex differences before interpreting these data.

      We examined the effects of sex as a biological variable in the experiments reported in the manuscript and found no differences between males and females in any of the experiments where we had enough animals of each sex (minimum of 5 mice) for meaningful comparisons. We did this by comparing means and SEM of males and females within each group (e.g., Cre+ males vs Cre+ females, Cre- males vs Cre- females) and then conducting a t-test to see if there was a difference. For figures that show time as a variable (e.g., Figure 6C-E), we compared males and females with time and sex as main factors (including multiple comparisons if needed). We found no significant main effects of sex or interactions with sex. Because of this, and to maximize statistical power, we decided to keep males and females together in all the analyses presented in the manuscript. It is also worth noting that the core of the experimental design employed is a change in behavior caused by photostimulation. The mice are also the same strain, with the only difference being the modification to add an IRES and the sequence for Cre behind the coding sequence of the Slc17A6 (VGLUT2) gene.

      (11) Why do the authors use 10 Hz stimulation primarily? is this a physiologically relevant stim frequency? They show that they get effects with 1 Hz, which can be quite different in terms of plasticity compared to 10 Hz.

      Thank you for raising this important question. Because tests like open field and forced swim are subject to habituation and cannot be run multiple times per animal, a single test frequency was needed for consistency across multiple experiments. The frequency of 10 Hz was selected because it falls within the range of reported firing rates for SuM neurons (Farrel et al., 2021; Pedersen et al., 2017) and based on the robust but submaximal effects seen in the real-time place preference assays. Identification of the native firing rates during stress response would be ideal, but gathering these data for the identified population remains a daunting task.

      (12) In Fig 5A-F, it is unclear whether locomotion differences are playing a role. Entrances (which are low for both groups) are shown but distance traveled or velocity are not.

      In B, there is no color in the lower left panel. where are these mice spending their time? How is the entirety of the upper left panel brighter than the lower left? If the heat map is based on time distribution during the session, there should be more color in between blue and red in the lower left when you start to lose the red hot spots in the upper left, for example. That is, the mice have to be somewhere in apparatus. If the heat map is based on distance, it would seem the Cre- mice move less during the stim.

      We appreciate the opportunity to address this question, and the attention to detail the reviewer applied to our paper. In the real time place preference test (RTPP), stimulation was only provided while the animal was on the stimulation side. Mice quickly leave the stimulation side of the arena, as seen in the supplemental video, particularly at the higher frequencies. Thus, the time stimulation is applied is quite low. During trials using higher frequency stimulation, the mice often retreat to a corner after entering the stimulation side. Changing locomotor activity alone could drive changes in the number of entrances, but we did not find this. In regard to the heat map, the color scale is dynamically set for each of the paired examples, which are pulled from a single trial. To maximize the visibility between the paired examples, the color scale does not transfer between the trials. As a result, in the example for 10 Hz the mouse spent a larger amount of time in the area corresponding to the lower right corner of the image, and the maximum value of the color scale is assigned to that region. As seen in the supplemental video, mice often retreated to the corner of the non-stimulation side after entering the stimulation side. The control animal did not spend a concentrated amount of time in any one region, thus there is a lack of warmer colors. In contrast, in the baseline condition both Cre+ and Cre- mice spent time in areas distributed on both sides of the arena, as expected. As a result, the maximum value in the heat map is lower and more areas are coded in warmer colors, allowing for easier visual comparison between the pair. Using the scale for the 10 Hz pair across all pairs leads to mostly dark images. We considered ways to optimize visualization across and within pairs and focused on the within-pair comparison for visualization.
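
      The effect of per-pair versus shared color scaling described above can be shown with a small numerical sketch. The occupancy grids, bin counts, and function names below are hypothetical and chosen only for illustration; they are not data from the manuscript.

```python
def grid_max(grid):
    """Largest occupancy value (e.g. seconds per spatial bin) in a 2D grid."""
    return max(max(row) for row in grid)


def scale_pair(pair, vmax=None):
    """Normalize both heat maps of a pair to [0, 1].

    With vmax=None the scale is set from the pair itself (per-pair scaling);
    passing a shared vmax reproduces a single scale used across all pairs.
    """
    if vmax is None:
        vmax = max(grid_max(g) for g in pair)
    return [[[v / vmax for v in row] for row in g] for g in pair]


if __name__ == "__main__":
    # Hypothetical occupancy: (Cre+, Cre-) grids for two conditions.
    baseline_pair = ([[5, 6], [4, 5]], [[6, 5], [5, 4]])   # time spread out
    stim_pair = ([[0, 1], [1, 58]], [[5, 4], [6, 5]])      # Cre+ stays in one corner

    per_pair = scale_pair(baseline_pair)
    shared_vmax = max(grid_max(g) for g in baseline_pair + stim_pair)
    shared = scale_pair(baseline_pair, vmax=shared_vmax)

    print("baseline max, per-pair scale:", max(grid_max(g) for g in per_pair))
    print("baseline max, shared scale:", max(grid_max(g) for g in shared))
```

      With a shared scale the single hot corner in the stimulation pair sets the maximum, so the evenly distributed baseline maps occupy only the bottom of the color range and appear mostly dark.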

      (13) By starting with 1 Hz, are the experimenters inducing LTD in the circuit? What would happen if you stop stimming after the first epoch? Would the behavioral effect continue? What does the heat map for the 1 Hz stim look like?

      Relatedly, it is a lot of consistent stimulation over time and you likely would get glutamate depletion without a break in the stim for that long.

      Thank you for the opportunity to add clarity around this point regarding the trials in RTPP testing. Importantly, the trials were not carried out in order of increasing frequency of stimulation, as plotted. Rather, the order of trials was, to the extent possible with the number of mice, counterbalanced across the five conditions. Thus, possible contribution of effects of one trial on the next were minimized by altering the order of the trials.

      We have added a heat map for the 1 Hz condition to figure 5B.

      For experiments on RTPP the average stimulation time at 10Hz was less than 10 seconds per event. As a result, the data are unlikely to be affected by possible depletion of synaptic glutamate. For experiments using sustained stimulation (open field or light dark choice assays) we have no clear data to address if this might be a factor where 10Hz stimulation was applied for the entire trial.

      (14) In Fig 6, the authors show that the Cre- mice just don't do the task, so it is unclear what the utility of the rest of the figure is (such as the PR part). Relatedly, the pause is dependent on the activation, so isn't C just the same as D? In G and H, why is a subset of Cre+ mice shown?

      Why not all mice, including Cre- mice?

      Thank you for the opportunity to improve the clarity of this section. A central aspect of the experiments in Figure 6 is the aversiveness of SUMVGLUT2+::POA neuron photostimulation, as shown in Figure 5B-F. The aversion to photostimulation drives task performance in the negative reinforcer paradigm. The mice perform a task (active port activation) to terminate the negative reinforcer (photostimulation of SuMVGLUT2+::POA neurons). Accordingly, control mice are not expected to perform the task because SuMVGLUT2+::POA neurons are not activated and, thus the mice are not motivated to perform the task.

      A central point we aim to convey in this figure is that while SuMVGLUT2+::POA neurons are being stimulated, mice perform the operant task. They selectively activated the active port (Supplemental Figure 7). As expected, control mice activate the active port at a low level in the process of exploring the arena. This diminishes on subsequent trials as mice habituate to the arena (Figure 6D). The data in Figures 6 C and D are related but can diverge. On an FR1 test each pause in stimulation requires one port activation, but the number of port activations can exceed the number of pauses, which are 10 seconds long, if the animal continues to activate the port. Comparing data in Figures 6 C and D reveals that mice generally activated the port two to three times for each pause earned, with a trend towards greater efficiency on day 4, with more rewards and fewer activations.

      The purpose of the progressive ratio test is to examine whether photostimulation of SuMVGLUT2+::POA neurons continues to drive behavior as the effort required to terminate the negative stimulus increases. As seen in Figures 6 G and H, the stimulation of SuMVGLUT2+::POA neurons remains highly motivating. In the 20-minute trial we did not find a break point even as the number of port activations required to pause the stimulation exceeded 50. We do not show the Cre- mice in Figure 6G and H because they did not perform the task, as seen in Figure 6F. For technical reasons in early trials, we have fully time-stamped data for rewards and port activations from only a subset of the Cre+ mice. Of note, this subset contains both the highest and lowest performing mice from the entire data set.

      Taken together, we interpret the results of the operant behavioral testing as demonstrating that SuMVGLUT2+::POA neuron activation is aversive, can drive performance of an operant task (as opposed to fixed escape behaviors), and is highly motivating.

      (15) In Fig 7, what does the GCaMP signal look like if aligned to the onset of immobility? It looks like since the hindpaw swimming is short and seems to precede immobility, and the increase in the signal is ramping up at the onset of hindpaw swimming, it may be that the calcium signal is aligned with the onset of immobility.

      What does it look like for swimming onset?

      In I, what is the temporal resolution for the decrease in immobility? Does it start prior to the termination of the stim, or does it require some elapsed time after the termination, etc?

      Thank you for the opportunity to address these points and improve the clarity of our interpretation of the data. Regarding aligning the Ca2+ signal from fiber photometry recordings to swimming onset and offset, it is important to note that the swimming bouts are not the same length. As a result, in the time prior to alignment to offset of behaviors, animals will have been swimming for different lengths of time. In Figure 7 C, we use the behavioral heat map to convey the behavioral average. Below we show the Ca2+ dependent signal aligned at the offset of hindpaw swim for an individual mouse (A) and for the total cohort (B). This alignment shows that the Ca2+ dependent signal declines corresponding to the termination of hindpaw swimming. Because these bouts last less than the total window shown, the data are largely included in Figure 7 C and D, which are aligned to onset. Due to the nuance of the difference in the alignment and the partial redundancy, we elected to include the requested alignment to swimming offset in the reply rather than in the primary figure.

      Author response image 1.

      Turning to the question regarding swimming onset, the animals started swimming immediately when placed in the water and maintained swimming and climbing behaviors until shifting behaviors as illustrated in Figure 7A and B. During this time the Ca2+-dependent signal was elevated but there is only one trial per animal. This question can perhaps be better addressed in the dunk assay presented in Figure 3C, F and G and Supplemental Figure 4 H and I. Here swimming started with each dunk and the Ca2+ signal increased.

      Regarding the question about Figure 7I, we scored entire periods (2 min) in aggregate. We noted in videos of the behavior test that there was an abrupt decrease in immobility tightly corresponding to the end of stimulation. In a few animals this shift occurred approximately 15-20 s before the end of stimulation. This may relate to the depletion of neurotransmitter as suggested by the reviewer.

      Reviewer 3

      Major points

      (1) Results in Figure 1 suggested that SuM-Vglu2::POA projected not only POA but also to the diverse brain regions. We can think of two models which account for this. One is that homogeneous populations of neurons in SuM-Vglu2::POA have collaterals and innervated all the efferent targets shown in Figure 1. Another is to think of distinct subpopulations of neurons projecting subsets of efferent targets shown in Figure 1 as well as POA. It is suggested to address this by combining approaches taken in experiments for Figure 1 and Supplemental Figure 2.

      Thank you for raising this interesting point. We have attempted combining retroAAV injections into multiple areas that receive projections from SuMVGLUT2+::POA neurons. However, we have found the results unsatisfactory for separating the two models proposed. Using eYFP- and tdTomato-expressing vectors, we saw some overlapping expression in SuM. We are not able to conclude whether this indicates separate populations or partial labeling of a homogeneous population. A third option seems possible as well. There could be a mix of neurons projecting to different combinations of downstream targets. This seems particularly difficult to address using fluorophores. We are preparing to apply additional methodologies to this question, but it extends beyond the scope of this manuscript.

      (2) Since the authors drew a hypothetical model in which the diverse brain regions mediate, at least in part, the effect of SuM-Vglu2::POA activation on behavioral alterations, examination of the concurrent activation of those brain regions upon photoactivation of SuM-Vglu2::POA is suggested. This would help the readers to understand which neural circuits act upon the induction of active coping behavior under stress.

      Thank you for raising this important point. We agree that activating glutamatergic neurons should lead to activation of postsynaptic neurons in the target regions. Delineating this in vivo is less straightforward. Doing so requires much greater knowledge of the postsynaptic partners of SuMVGLUT2+::POA neurons. There are a number of issues that would need to be accounted for. Undertaking two-color photostimulation plus fiber photometry is possible but not technically trivial. Further, it is possible that we would measure Ca2+ signals in neurons that have no relevant input or that local circuits in a region may shape the signal. We would also lack the temporal resolution to distinguish monosynaptic from polysynaptic connections. Thus, we would struggle to know if the change in signal was due to the excitatory input from SuM or from a second region. At present, we remain unclear on how to pursue this question experimentally in a manner that is likely to generate clearly interpretable results.

      (3) In Figure 4, "active coping behaviors" must be called "behaviors relevant to the active behaviors" or "active coping-like behaviors", since those behaviors were in the absence of stressors to cope with.

      Thank you for the suggestion on how to clarify our terminology. We have adopted the active coping-like term.

      (4) For the Dunk test, it is suggested to describe the results and methods more in detail, since the readers would be new to it. In particular, the mice could change their behavior between dunks under this test, although they still showed immobility across trials as in Supplemental Figure 4I. Since neural activity during the test was summarized across trials as in Figure 3, it is critical to examine whether the behavior changes according to time.

      Thank you for identifying this opportunity to improve our manuscript. We have expanded and added a detailed description of the dunk test in the methods section.

      As for Supplemental Figure 4I, we apologize for the confusion because the purpose of this figure is to show that mice remained mobile for the entire 30-second dunk trial. This did not appreciably change over the 10 trials. We have revised this figure to plot both immobile and mobile time to achieve greater clarity on this point.

      Minor points

      Typos

      In Figure 1, please add a serotype of AAVs to make it compatible with other figures and their legends.

      In the main text and Figure 2K, the authors used MHb/LHb and mHb/lHb in a mixed fashion. Please make them unified.

      In the figure legend of Figure 6, change "SuMVGLUT2+::POA neurons drive" to "SuMVGLUT2+::POA neurons " in the title.

      In line 86, please change "Retro-AAV2-Nuc-flox(mCherry)-eGFP" to "AAV5-Nuc-flox(mCherry)eGFP".

      In line 80, please change "Positive controls" to "As positive controls, ".

      Thank you for taking the time and making the effort to identify and call these out. We have corrected them.

    1. Metadata is information about some data. So we often think about a dataset as consisting of the main pieces of data (whatever those are in a specific situation), and whatever other information we have about that data (metadata).

      When you consider the quantity of information that can be obtained from a single post, it is mind-boggling. Metadata is a strong tool that may be used by a large number of individuals. I believe that it has the potential to do a great deal of damage if it is misused. The information obtained from the post may be used by criminals to commit crimes. One example of this is the singer and artist Pop Smoke, who passed away recently. During his stay in Los Angeles, Pop Smoke shared a photo on his Instagram account that included the time and his location. In less than forty-eight hours, a group of criminals succeeded in locating him and tragically ended his life.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General remarks for the Editor and the Reviewers

      We would like to thank the Editor and the Reviewers for their feedback. Below we address their comments and present our point-by-point responses as well as the related changes in the manuscript.

      In addition to these changes, in a few cases we have found it necessary to move some texts and provide some additional explanations within the manuscript. We emphasize that these amendments have been made for only technical reasons, and do not alter the results and conclusions of the paper, but may help to render the text more coherent and understandable to readers with little knowledge of the subject.

      These minor corrections are:

      • We extended the Introduction section by a sentence (lines 40-42) that is intended to fit the proposed template directed, non-enzymatic replication mechanism into a more general prebiotic evolutionary context, thus emphasizing its biological relevance. This sentence includes an additional reference (Rosenberger et al., 2021).

      • Two very methodologically oriented and repeated descriptions of random sequence generation have been moved to the Methods section (lines 178-185) from the Results section (lines 336-339 and lines 351-354).

      • We complemented the Data availability statement with licensing information (lines 684-685).

      • Further minor changes (also indicated by red texts) have been implemented to remedy logical and grammatical glitches.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Szathmary and colleagues explore the parabolic growth regime of replicator evolution. Parabolic growth occurs when nucleic acid strand separation is the rate-limiting step of the replication process, which would have been the case for non-enzymatic replication of short oligonucleotides that could precede the emergence of ribozyme polymerases and helicases. The key result is that parabolic replication is conducive to the maintenance of genetic diversity, that is, the coexistence of numerous master sequences (the Gause principle does not apply). Another important finding is that there is no error threshold for parabolic replication except for the extreme case of zero fidelity.

      Strengths:

      I find both the analytic and the numerical results to be quite convincing and well-described. The results of this work are potentially important because they reveal aspects of a realistic evolutionary scenario for the origin of replicators.

      Weaknesses:

      There are no obvious technical weaknesses. It can be argued that the results represent an incremental advance because many aspects of parabolic replication have been explored previously (the relevant publications are properly cited). Obviously, the work is purely theoretical; an experimental study of parabolic replication is due. In the opinion of this reviewer, though, these are understandable limitations that do not actually detract from the value of this work.

      We are grateful that this Reviewer appreciates our work. We completely agree that the ultimate validation must come from experiments. It is important to stress that in this field theory often preceded experimental work by decades, and the former often guided the latter. We hope that for the topic of the present paper experiments will follow considerably faster.

      Reviewer #2 (Public Review):

      Summary:

      A dominant hypothesis concerning the origin of life is that, before the appearance of the first enzymes, RNA replicated non-enzymatically by templating. However, this replication was probably not very efficient, due to the propensity of single strands to bind to each other, thus inhibiting template replication. This phenomenon, known as product inhibition, has been shown to lead to parabolic growth instead of exponential growth. Previous works have shown that this situation limits competition between alternative replicators and therefore promotes RNA population diversity. The present work examines this scenario in a model of RNA replication, taking into account finite population size, mutations, and differences in GC content. The main results are (1) confirmation that parabolic growth promotes diversity, but that when the population size is small enough, sequences least efficient at replicating may nevertheless go extinct; (2) the observation that fitness is not only controlled by the replicability of sequences, but also by their GC content; (3) the observation that parabolic growth attenuates the impact of mutations and, in particular, that the error threshold to which exponentially growing sequences are subject can be exceeded, enabling sequence identity to be maintained at higher mutation rates.

      Strengths:

      The analyses are sound and the observations are intriguing. Although it has been noted previously that parabolic growth promotes coexistence, its role in mitigating the error threshold catastrophe, which is often presented as a major obstacle to our understanding of the origin of life, had not been examined before.

      Weaknesses:

      Although all the conclusions are interesting, most are not very surprising for people familiar with the literature. As the authors point out, parabolic growth is well known to promote diversity (Szathmary-Gladkih 1989) and it has also been noted previously that a form of Darwinian selection can be found at small population sizes (Davis 2000).

      Given that under parabolic growth, no sequence is ever excluded for infinite populations, it is also not surprising to find that mutations have a less dramatic exclusionary impact.

      In the two articles cited (Szathmary-Gladkih 1989 and Davis 2000) the subexponentiality of the system was implemented in a mechanistic way, by introducing the exponent 0 < 𝑝 < 1. Although the behaviour of these models is more or less consistent with experimental findings (von Kiedrowski, 1986; Zielinski and Orgel, 1987), the divergence of per capita growth rates (𝑥̇/𝑥) at very low concentrations–which guarantees the ability to maintain unlimited diversity in the case of infinite population sizes–makes this formal approach partly unrealistic.
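      The divergence mentioned above can be made concrete with a minimal sketch. It assumes the phenomenological growth law dx/dt = k·x^p, where the rate constant k and the function name are illustrative and not taken from the cited articles. For p < 1 the per capita rate k·x^(p−1) grows without bound as x approaches zero, whereas for p = 1 (exponential growth) it stays constant.

```python
def per_capita_growth_rate(x: float, k: float, p: float) -> float:
    """Per capita growth rate (dx/dt)/x for the assumed law dx/dt = k * x**p."""
    if x <= 0:
        raise ValueError("concentration x must be positive")
    return k * x ** (p - 1.0)


if __name__ == "__main__":
    # Compare a parabolic replicator (p = 0.5) with an exponential one (p = 1).
    for x in (1.0, 1e-2, 1e-4, 1e-6):
        parabolic = per_capita_growth_rate(x, k=1.0, p=0.5)
        exponential = per_capita_growth_rate(x, k=1.0, p=1.0)
        print(f"x = {x:g}: parabolic {parabolic:g}, exponential {exponential:g}")
```

      In this sketch a rare type always has a larger per capita rate than a common one when p < 1, which is the formal reason no type is excluded in an infinite population.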

      To avoid the possible artefacts of this mechanistic approach, and as there are no previous studies analysing the diversity maintaining ability of finite populations of parabolic replicators in an individual-based model context, we implemented a simplified template replication mechanism leading to parabolic growth and analysed the dynamics in an individual-based stochastic model context. The key point of our investigation is that considerable diversity can be maintained in the system even when the population size is quite small.

      Regarding the Reviewer’s comment on selection: Darwinian selection can only occur in a simple subexponential dynamics if the ratio of replicabilities diverges, cf. Eq. (8) and the preceding paragraph in Davis, 2000.

      Our results also show (Figs. 4B and 4C) that high mutation rates and the error threshold problem can still be considered as a major limiting factor for parabolically replicating systems in terms of their diversity-maintaining ability. In the light of the above, potential mechanisms to relax the error threshold in such systems, one of which is demonstrated in the present study, seem to be important steps to account for the sequence diversification and increase in molecular complexity during the early evolution of RNA replicators.

      A general weakness is the presentation of models and parameters, whose choices often appear arbitrary. Modeling choices that would deserve to be further discussed include the association of the monomers with the strands and the ensuing polymerization, which are combined into a single association/polymerization reaction (see also below), or the choice to restrict to oligomers of length L = 10. Other models, similar to the one employed here, have been proposed that do not make these assumptions, e.g. Rosenberger et al. Self-Assembly of Informational Polymers by Templated Ligation, PRX 2021. To understand how such assumptions affect the results, it would be helpful to present the model from the perspective of existing models.

      The assumption of one-step polymerization reactions that we used here is a common technique for modelling template replication of sequence-represented replicators [see, e.g., Fontana and Schuster, 1998 (10.1126/science.280.5368.1451), Könnyű et al., 2008 (10.1186/1471-2148-8-267), Vig-Milkovics et al., 2019 (10.1016/j.jtbi.2018.11.020) or Szilágyi et al., 2020 (10.1371/journal.pgen.1009155)]. This is because assuming base-by-base polymerisation of the copy would lead to a very large number of different types of intermediates, which a Gillespie-type stochastic simulation algorithm could not handle in reasonable computation times, even if the sequences were relatively short. For comparison, in our model, where polymerization is one-step, the characteristic time of a simulation for 𝐿 = 10, 𝑁 = 10^5 and 𝛿 = 0.01 was 552 hours.
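      To illustrate why the number of intermediate types matters for run time, the following is a generic sketch of one step of a Gillespie-type (direct method) algorithm. It is not the authors' implementation; the function name and the flat list of propensities are assumptions made for illustration. Every additional molecular type or intermediate adds entries to the propensity list, and each step scans that list.

```python
import math
import random


def gillespie_step(propensities, rng):
    """One direct-method step: return (waiting time, index of the reaction that fires)."""
    total = sum(propensities)
    if total <= 0.0:
        raise ValueError("no reaction can fire: total propensity is zero")
    # The waiting time is exponentially distributed with rate equal to the total propensity.
    waiting_time = -math.log(1.0 - rng.random()) / total
    # Choose a reaction with probability proportional to its propensity.
    threshold = rng.random() * total
    cumulative = 0.0
    for index, propensity in enumerate(propensities):
        cumulative += propensity
        if threshold < cumulative:
            return waiting_time, index
    # Guard against floating-point round-off: fall back to the last possible reaction.
    last = max(i for i, a in enumerate(propensities) if a > 0.0)
    return waiting_time, last


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the illustration is reproducible
    dt, fired = gillespie_step([0.5, 1.5, 0.0], rng)
    print(f"reaction {fired} fires after dt = {dt:.4f}")
```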

      Note that in Rosenberger et al. (PRX 2021), in contrast to a pioneering work [Fernando et al., 2007 (10.1007/s00239-006-0218-4)], sequences of replicators are not represented, which makes this approach completely inapplicable to our case, in which sequence defines the fitness. In sum, we suggest that this valid criticism points to possible future work.

      The values of the (many) parameters, often very specific, also very often lack justifications. For example, why is the "predefined error factor" ε = 0.2 and not lower or higher? How would that affect the results?

      A general remark. For the more important parameters, several values were used to test the behaviour of the model (see Table 1), but due to the considerable number of parameters, it is impossible to examine all possible combinations. 𝑐+ = 1 fixes the timescale, 𝐿 is set to 10 to obtain reasonable running times (see above).

      𝜀 characterizes how replicability decreases as the number of mutations increases. In the manuscript we used the following default vector: 𝜀 = (0.05, 0.2, 1), in which the third element corresponds to the mutation-free sequence, so it must be 1. The first element determines the baseline replicability (see Methods), which we preferred not to change because it would fundamentally alter the ratio of replication propensities to association and dissociation propensities (as a substantial number of the complementary sequences of the master sequences are of baseline replicability) and thus would alter the reaction kinetics to an extent that it is not comparable with the original results. Therefore, only the second element can be adjusted. Accordingly, we have analysed the behaviour of the model in the cases of a steeper and a more gradual loss of replicability using the following two vectors, respectively: 𝜀′ = (0.05, 0.05, 1) and 𝜀″ = (0.05, 0.5, 1). The choice of 𝜀′ is chemically more plausible, since for very short oligomers the loss of chemical activity and replicability as a function of the number of mutations can be very sharp. We performed a series of simulations with all possible combinations of 𝛿 = 0.001, 0.005, 0.1 and 𝑁 = 10^3, 10^4, 10^5 for 𝜀′ and 𝜀″ in the constant population and chemostat model context (36 different runs). For other parameters, we took the default values, see Table 1. These values also correspond to the parameters we used in Figures 2 and 6. The results show that the steeper loss of replicability (𝜀′) slightly increases the diversity-maintaining ability of the system, whereas the more gradual loss of replicability (𝜀″) moderately decreases the diversity-maintaining ability of the system, and that these shifts are more pronounced in the constant population size model (Author response image 1) than in the chemostat model (Author response image 2). Altogether, these results confirm that the qualitative outcome of the model is robust in a wide range of loss of replicability (𝜀 vector) values.
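      The count of 36 runs follows directly from the parameter grid described above. The sketch below simply enumerates that grid; the dictionary keys and label strings are illustrative names, and the parameter values are copied as listed in the preceding paragraph.

```python
from itertools import product

# Values copied from the paragraph above; the label strings are illustrative.
deltas = (0.001, 0.005, 0.1)                # replicability distances, as listed above
population_sizes = (10**3, 10**4, 10**5)
epsilon_vectors = {
    "steeper": (0.05, 0.05, 1.0),           # epsilon'
    "more_gradual": (0.05, 0.5, 1.0),       # epsilon''
}
models = ("constant_population", "chemostat")

# One run per combination of model, epsilon vector, delta and population size.
runs = [
    {"model": model, "epsilon": name, "delta": delta, "N": size}
    for model, name, delta, size in product(models, epsilon_vectors, deltas, population_sizes)
]

if __name__ == "__main__":
    print(len(runs))  # 2 models x 2 vectors x 3 deltas x 3 population sizes
```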

      Author response image 1.

      Replicator coexistence in the constant population model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 2A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Author response image 2.

      Replicator coexistence in the chemostat model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 6A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Similarly, in equation (11), where does the factor 0.8 come from?

      This factor scales the decay rate of duplex sequences as a function of the binding energy (𝐸b). The value of 0.8 is an arbitrary choice; the value should be in the interval (0,1) and is only relevant in the chemostat model. It is expected to have a similar effect on the dynamics as the duplex decay factor parameter 𝑓, which we have investigated in a wide range of different values (cf. Table 1, Fig. 6), although 𝑓 is independent of the binding energy (𝐸b): increasing/decreasing the 0.8 factor is expected to decrease/increase the average total population size. We have investigated the diversity-maintaining ability of the system at smaller (0.6) and larger (0.9) parameter values at different population sizes (𝑁 ≈ 10^3, 10^4 and 10^5) and at different replicability distances (δ = 0.001, 0.005 and 0.01) as shown in Fig. 6. We have found that the number of coexisting master types changes very little in response to changes in this factor. Only two shifts could be detected (underlined): factor 0.9 combined with 𝑁 ≈ 10^4 and 𝛿 = 0.001 caused the number of surviving master types to decrease by one, while factor 0.9 combined with 𝑁 ≈ 10^3 and 𝛿 = 0.01 caused the number of surviving master types to increase by one (Author response table 1). Factor 0.6 produced the same number of surviving types as the default (Author response table 1). In summary, the model shows marked robustness to changes in the values of this parameter.

      Author response table 1.

      Number of coexisting master types in the chemostat model with different binding energy dependent duplex decay rates. Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different factor values: 0.6, 0.8 (the original) and 0.9 for comparability.

      Why is the kinetic constant for duplex decay reaction 1.15 ⋅ 10^-8?

      Note that this value is the minimum of the duplex decay rate; Table 1 correctly shows the interval of this kinetic constant as [1.15 ⋅ 10^-8, 6.4 ⋅ 10^-5]. Both values are derived from the basic parameters of the system and can be computed according to Eq. (11). The minimum: as the parameter set corresponding to this value is: . The maximum: with .

      Are those values related to experiments, or are they chosen because specific behaviors can happen only then?

      See above.

      The choice of the model and parameters potentially impact the two main results, the attenuation of the error threshold and the role of GC content:

      Regarding the error threshold, it is also noted (lines 379-385) that it disappears when back mutations are taken into account. This suggests that overcoming the error threshold might not be as difficult as suggested, and can be achieved in several ways, which calls into question the importance of the particular role of parabolic growth. Besides, when the concentration of replicators is low, product inhibition may be negligible, such that a "parabolic replicator" is effectively growing exponentially and an error catastrophe may occur. Do the authors think that this consideration could affect their conclusion? Can simulations be performed?

      The assumption of back mutation only provides a theoretical solution to the error threshold problem: back mutation guarantees a positive (non-zero) concentration of a master type, but, since the probability of back mutation is generally very low, this equilibrium concentration may be extremely low, or negligible for typical system sizes. Consequently, back mutation alone does not solve the problem of the error catastrophe: in our system back mutation is present (the probability that a sequence with 𝑘 errors mutates back to a master sequence is 𝜇^𝑘(1−𝜇)^(𝐿−𝑘)), and the diversity-maintaining ability is limited. The effect of back mutation decreases exponentially with increasing sequence length.
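      A short numerical sketch shows how small the back mutation probability quoted above becomes. The function name is illustrative, and the mutation rate used in the demonstration loop (𝜇 = 0.01) is a hypothetical value chosen only for the example; the formula itself is the one given in the response.

```python
def back_mutation_probability(k: int, length: int, mu: float) -> float:
    """Probability that a sequence with k errors mutates back to the master sequence.

    Implements mu**k * (1 - mu)**(length - k): the k erroneous positions must each
    change and the remaining length - k positions must be copied without change.
    """
    if not 0 <= k <= length:
        raise ValueError("k must lie between 0 and the sequence length")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability")
    return mu ** k * (1.0 - mu) ** (length - k)


if __name__ == "__main__":
    # Hypothetical per-base rate mu = 0.01 with the sequence length L = 10 used in the model.
    for k in range(0, 4):
        print(k, back_mutation_probability(k, length=10, mu=0.01))
```

      With these illustrative numbers each additional error lowers the probability by roughly two orders of magnitude, which is why back mutation alone yields only negligible master concentrations at typical system sizes.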

      Regarding the role of the GC content, GC-rich oligomers are found to perform the worst but no rationale is provided.

      For GC-rich oligonucleotides the dissociation probability of a template-copy complex is relatively low (cf. Eqs. (9, 10)), thus they have a relatively low number of offspring, cf. lines 557-561: “a relatively high dissociation probability and the consequential higher propensity of being in a simple stranded form provides an advantage for sequences with relatively low GC content in terms of their replication affinity, that is, the expected number of offspring in case of such variants will be relatively high.”. Note that the simulation results shown in Fig. 3A, demonstrate the realization of this effect with prepared sequences (along a GC content gradient).

      One may assume that it happens because GC-rich sequences take comparatively longer to release the product. However, it is also conceivable that higher GC content may help in the polymerization of the monomers as the monomers attach longer on the template (as described in Eq. (9)). This is an instance where the choice to combine the association and polymerization reactions into a single step independent of GC content may be critical.

      It would be important to show that the result arises from the actual physics and not from this modeling choice.

      Some more specific points that would deserve to be addressed:

      • Line 53: it is said that p "reflects how easily the template-reaction product complex dissociates". This statement is not correct. A reaction order p<1 reflects product inhibition, the propensity of templates to bind to each other, not slow product release. Product release can be limiting, yet a reaction order of 1 can be achieved if substrate concentrations are sufficiently high relative to oligomer concentrations (von Kiedrowski et al., 1991).

      We think the key reference is Von Kiedrowski (1993) in this case. Other things being equal, his Table 1 on p. 134 shows that a sufficient increase in 𝐾4, i.e., the stability of the duplex (template and copy) (association rate divided by dissociation rate) throws the system into the parabolic regime. This is what we had in mind. In order to clarify this, we modified the quoted sentence thus: “In this kinetics, the growth order is equal or close to 0.5 (i.e., the dynamics is sub-exponential) because increased stability of the template-copy complex (rate of association divided by dissociation) promotes parabolic growth (von Kiedrowski et al., 1991; von Kiedrowski & Szathmáry, 2001).”

      • Population size is a key parameter, and a comparison is made between small (10^3) and large (10^5) populations, but without explaining what determines the scale (small/large relative to what?).

      The “small” value (10^3) corresponds to the smallest meaningful population size; significantly smaller population sizes (e.g. 10^2) cannot maintain the 10 master types (or any subset of them) and are chemically unrealistic. The “large” value (10^5) is the largest population size for which simulation times are still acceptable; in the case of 10^6 the runtimes are on the order of months.

      • In the same vein, we might expect size not to be the only important parameter, but also concentration.

      With constant volume, population size and concentration are strictly coupled.

      • Lines 543-546: if understanding correctly, the quantitative result is that the error threshold rises from 0.1 in the exponential case to 0.196 in the parabolic. Are the authors suggesting that a factor of 2 is a significant difference?

      In this paragraph we compared the empirical error threshold of our system (which is close to 𝑝mut = 0.15) with the error threshold of the well-known single peak fitness landscape (which can be approximated by ) as a reference case. To make the message even clearer we have extended the last sentence (lines 596-597) as follows: “but note that applying this approach to our system is a serious oversimplification”. The 0.196 is simply the probability of error-free replication of a sequence when , but we have removed this sentence (“corresponding to the replication accuracy of a master sequence”) from the manuscript as it seems to be confusing.
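      The origin of a value near 0.196 can be checked with a one-line calculation. This sketch assumes independent per-base errors, the empirical threshold 𝑝mut = 0.15 quoted above, and the sequence length 𝐿 = 10 used throughout the model; the condition is missing from the extracted text, so this pairing of values is an assumption, as is the function name.

```python
def error_free_replication_probability(p_mut: float, length: int) -> float:
    """Probability that all `length` bases are copied correctly, assuming independent errors."""
    if not 0.0 <= p_mut <= 1.0:
        raise ValueError("p_mut must be a probability")
    return (1.0 - p_mut) ** length


if __name__ == "__main__":
    # Assumed inputs: p_mut = 0.15 and L = 10, both taken from the surrounding text.
    print(f"{error_free_replication_probability(0.15, 10):.4f}")
```

      Under these assumptions the result is approximately 0.1969, which agrees with the quoted 0.196 if that figure was truncated rather than rounded.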

      • Figure 3C: this figure shows no statistically significant effect?

      Thank you for pointing this out. We statistically tested the hypothesis that the GC content differs between the surviving and the extinct master subsets. This analysis revealed that the differences between these two groups are statistically significant, which we have now included in the manuscript at lines 380-390: “A direct investigation of whether the sequence composition of the master types is associated with their survival outcome was conducted using the data from the constant population model simulation results (Figure 2). In these data, the average GC content was measured to be lower in the surviving master subpopulations than in the extinct subpopulations (Figure 3C). To determine whether this difference was statistically significant, nonparametric, two-sample Wilcoxon rank-sum tests (Hollander & Wolfe, 1999) were performed on the GC content of the extinct-surviving master subsets. The GC content was significantly different between these two groups in all nine investigated parameter combinations of population size (N) and replicability distance (δ) at p<0.05 level, indicating a selective advantage for a lower GC content in the constant population model context. The exact p values obtained from this analysis are shown in Figure 3C.”
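      For readers unfamiliar with the test named above, the following is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum test using the normal approximation without tie correction. It is not the authors' analysis code, and the GC content values in the demonstration are made up purely for illustration. For samples as small as those in the demonstration, an exact test would be preferable to the normal approximation.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p_value), where z < 0 means sample_a tends to have lower values.
    """
    pooled = sorted(
        (value, group)
        for group, sample in enumerate((sample_a, sample_b))
        for value in sample
    )
    # Assign midranks so that tied values share the average of their positions.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[t] = midrank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n_a * (n_a + n_b + 1) / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (rank_sum_a - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Made-up GC contents (fractions) for illustration only.
    surviving = [0.3, 0.4, 0.4, 0.5, 0.3]
    extinct = [0.6, 0.7, 0.7, 0.8, 0.6]
    z, p = rank_sum_test(surviving, extinct)
    print(f"z = {z:.3f}, p = {p:.4f}")
```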

      • line 542: "phase transition-like species extension (Figure 4B)": such a clear threshold is not apparent.

      Thank you for pointing out the incorrect phrasing. As there is no clear threshold in the number of coexisting types as a function of the mutation rate, we removed the “phase transition-like” expression: “However, when finite population sizes and stochastic effects are taken into account, at the largest investigated per-base mutation rate (𝑝mut = 0.15), the summed relative steady-state master frequencies approach zero (Figure 4C) with accelerating species extinction (Figure 4B), indicating that this value is close to the system's empirical error threshold.” (lines 589-594).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On the whole, the work is well done and presented, there are no major recommendations. It seems a good idea to cite and briefly discuss this recent paper: https://pubmed.ncbi.nlm.nih.gov/36996101/ which develops a symbiotic scenario of the coevolution of primordial replicators and reproducers that appears to be fully compatible with the results of the current work.

      Thank you for bringing this article to our attention. We have inserted the following sentence at lines 621-624: “The demonstrated diversity-maintaining mechanism of finite parabolic populations can be used as a plug-in model to investigate the coevolution of naked and encapsulated molecular replicators (e.g., Babajanyan et al., 2023).”

      The manuscript is well written, but there are some minor glitches that merit attention. For example:

      l. 5 "carriers presents a problem, because product formation and mutual hybridization" - "mutual" is superfluous here, delete

      l. 13 "amplification. In addition, sequence effects (GC content) and the strength of resource" - hardly "effects" - should be 'features' or 'properties'

      l. 41 "If enzyme-free replication of oligomer modules with a high degree of sequence" - "modules" here is only confusing - simply, "oligomers"

      l. 44 "under ecological competition conditions with which distinct replicator types with different" - delete "with" etc, there are many such minor glitches that are best corrected.

      Thank you for pointing these out; we have corrected them. Other drafting errors, glitches, and superfluous sentences have also been corrected.

      Reviewer #2 (Recommendations For The Authors):

      None

      Editor (Recommendations For The Authors):

      In the manuscript, it appears that coexistence is assessed at a given point in time, while figures seem to show that it remains time-dependent. It would be great if the authors could clarify this and/or discuss this.

      We appreciate you bringing this to our attention, as we had indeed neglected to elaborate on this important point. The steady-state character of the coexistence is assessed in our model in the following way: the relative frequency of each master sequence is tested for the condition of ≥ 100- (cut-off relative frequency for survival) at every 2,000th replication step in the interval between 10,000 replication steps before termination and actual termination (10= replication steps). If the above condition is true more than once, we consider the master type in question to have survived (we have included this explanation in the Methods section: lines 258-268). Although this relatively narrow time interval can still be regarded as a snapshot of the state of the system, in our numerical experience the resulting measure is a reliable quantitative indicator of the apparent stability of species coexistence in the parabolic dynamics.
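      The survival criterion described above can be sketched as follows. This is an illustration, not the authors' code: the function and argument names are hypothetical, the cut-off value is left as a caller-supplied argument because it is not legible in the text above, and including both ends of the 10,000-step window (six sampled steps) is an assumption.

```python
def has_survived(relative_frequency_at, final_step, cutoff, window=10_000, stride=2_000):
    """Return True if a master type meets the cut-off at more than one sampled step.

    relative_frequency_at: callable mapping a replication step to the type's
        relative frequency at that step (hypothetical interface).
    final_step: replication step at which the simulation terminates.
    cutoff: cut-off relative frequency for survival (supplied by the caller).
    """
    # Sample every `stride`-th step in the last `window` steps, both ends included.
    sampled_steps = range(final_step - window, final_step + 1, stride)
    hits = sum(1 for step in sampled_steps if relative_frequency_at(step) >= cutoff)
    return hits > 1


if __name__ == "__main__":
    # Hypothetical trajectory: the type is present only in the last 4,000 steps.
    trajectory = lambda step: 0.02 if step >= 96_000 else 0.0
    print(has_survived(trajectory, final_step=100_000, cutoff=0.01))
```

      Requiring more than one hit means that a single transient reappearance of a type within the window is not counted as survival.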

    1. Author Response

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we plan to expand the description of the theoretical model to make it more accessible to a broader spectrum of readers. We will discuss in more detail the relation between the different mathematical terms and physical processes at the molecular scale, as well as the experimental evidence supporting the model assumptions. We will also discuss more explicitly how our results are relevant to different systems exhibiting actomyosin nematic bundles beyond stress fibers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al. developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We agree with the referee that the manuscript will benefit from a better description of the theoretical model and the results in relation to specific molecular and cellular mechanisms. We will further emphasize how a number of experimental observations in the literature support our model assumptions and can be explained by our results. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured, and in fact estimates in the literature often rely on comparison with hydrodynamic models such as ours. Second, the effective physical properties of actomyosin gels can vary widely between cells, which may explain the diversity of forms, dynamics and functions. For these reasons, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. In the revised manuscript, we will make this point clearer and will further study the literature to seek quantitative comparison.

      It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin in living cells. To our knowledge, reconstituted actomyosin gels currently lack the ability to sustain the dynamical steady-states involved in the proposed self-organization mechanism, which balance actin flows with turnover. In addition to actomyosin gels in living cells, in vitro systems based on encapsulated cell extracts can also sustain such dynamical steady states [e.g. https://doi.org/10.1038/s41567-018-0413-4], and therefore our theory may be applicable to these systems as well. Of course, with advancements in the field of reconstituted systems, this may change in the near future. We will explicitly discuss this point in the revised manuscript.

      The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree and will avoid the term “sarcomeric”.

      Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We plan to clarify this point by representing in a main figure the stress profiles across dense nematic structures (currently in Supp Fig 2), along with a more detailed description. In short, depending on the parameter regime, the competition between active and viscous stresses in the actin gel determines whether the emergent structures are extensile or contractile. In our system tension is positive in all directions at all times. However, in “contractile” structures, tension is larger along the bundle, whereas in “extensile” structures, tension is larger perpendicular to the bundle. This is consistent with the common expression for active stress of incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8], that takes the form –zQ, where z is positive for an extensile system, showing that in this case active tension is negative along the nematic direction. This point, also raised by another referee, will be clarified and connected to existing literature.
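
      The sign argument above can be checked with a short numerical sketch. It assumes the standard two-dimensional traceless nematic tensor Q = S(nn − I/2) and evaluates only the deviatoric active term –zQ quoted in the response (written `zeta` in the code); the isotropic tension that keeps total tension positive in the authors' model is deliberately left out. All function names and parameter values are illustrative assumptions, not taken from the manuscript.

```python
import math

def nematic_q(theta, S):
    """2D traceless nematic tensor Q = S (n n^T - I/2) for director angle theta."""
    nx, ny = math.cos(theta), math.sin(theta)
    return [[S * (nx * nx - 0.5), S * nx * ny],
            [S * nx * ny, S * (ny * ny - 0.5)]]

def active_stress(theta, S, zeta):
    """Deviatoric active stress sigma = -zeta * Q (zeta > 0: extensile convention)."""
    q = nematic_q(theta, S)
    return [[-zeta * q[i][j] for j in range(2)] for i in range(2)]

def project(sigma, angle):
    """Normal stress m . sigma . m along the unit vector m at the given angle."""
    mx, my = math.cos(angle), math.sin(angle)
    return (mx * (sigma[0][0] * mx + sigma[0][1] * my)
            + my * (sigma[1][0] * mx + sigma[1][1] * my))

# Illustrative values only (hypothetical, not from the paper).
theta, S, zeta = 0.3, 0.8, 1.0
sigma = active_stress(theta, S, zeta)
along = project(sigma, theta)                  # equals -zeta * S / 2
across = project(sigma, theta + math.pi / 2)   # equals +zeta * S / 2
print(along, across)
```

      With zeta > 0 the active contribution is negative along the director and positive across it, which matches the statement that extensile structures carry the larger tension perpendicular to the bundle once a positive isotropic tension is added.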

      Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, e.g. with inflow at the cell edge as a result of polymerization and exclusion at the nucleus. In ongoing work, we are studying with the present model the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, as suggested by the referee. We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles, and may not be representative of physiologically relevant situations. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish. We will further emphasize these points in the revised manuscript.

      Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. We note for instance that parameter regimes in which agent-based simulations of actin gels display extended contractile steady-states are non-generic, as these simulations often lead to irreversible clumping (as do many reconstituted contractile systems), see e.g. https://doi.org/10.1038/ncomms10323 or https://doi.org/10.1371/journal.pcbi.1005277. Very few references report sustained actin flows or the organization of a few bundles (https://doi.org/10.1371/journal.pcbi.1009506). While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate our continuum modeling with discrete simulations. In the revised manuscript, we will better frame the relationship between them.

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we will highlight the novelty of the paper in terms of the theoretical model, the mechanism of patterning of dense nematic structures, the nature and dynamics of the resulting architectures, their relation with the experimental record, and the connection with microscopic models.

      We will emphasize the fact that nematic architectures in the actin cytoskeleton are characterized by a co-localization of order and density (and strong variations in each of these fields), that recent work shows that isotropic and nematic organizations coexist and are part of a single heterogeneous network, that the emergence and maintenance of nematic order requires active contraction, and that the assembly and maintenance of dense nematic bundles involves convergent flows. None of these key features can be described by the common incompressible models of active nematics. To address this, we develop here a compressible and density dependent model for an active nematic gel. We will carefully justify that the proposed model is meaningful for actomyosin gels, and we will highlight the commonalities and differences with previous models of active nematics.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      We agree with the referee that the manuscript requires a better contextualization of the work in relation with the very active field of active nematics. In the revised manuscript, we will clearly describe the relation of our model with existing ones.

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyses self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel. We also thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. In the revised manuscript, we will discuss our model in the context of the state-of-the-art, emphasizing connections with existing results.

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We will expand the description of the results around Figure 3. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. We will expand the characterization of emergent contractile/extensile networks by describing the distribution of the different components of the stress tensor across the bundles and will place our results in the context of related recent work.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 (which is dropped ad-hoc from Fig. 3 onward) microscopically, in which the continuum theory does also predict the formation of stripe patterns - besides the short comment at the very end? How does the spatial inhomogeneous organization the continuum theory predicts fit in the presented, microscopic picture and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa will not be a constant but rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction, while there is little room for additional lateral contraction by additional bundling. In the revised manuscript, we plan to examine this issue further in two ways. First, we plan to perform additional agent-based simulations in a regime leading to \kappa > 0. Second, we will modify the active gel model such that \kappa < 0 for low density/order, so that a fibrillar pattern is assembled, and \kappa > 0 for high density/order, so that the emergent fibers are highly contractile.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypotheses about the dynamic self-organisation of dense filamentous bundles in biological systems.

    1. Author Response

      We would like to thank the editorial board and the reviewers for their assessment of our manuscript and their constructive feedback that we believe will make our manuscript stronger and clearer. Please find below our provisional response to the public reviews; these responses outline our plan to address the concerns of the reviewers for a planned resubmission. Our responses are written in red.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have under-estimated the prognostic accuracy of questionnaire-based tools, especially the strong predictive accuracy shown by Tanguay-Sabourin 2023. In the revised version, we will change both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin. We do note here, however, that the latter paper, while very novel, is unique in showing the power of questionnaires. In addition, the questionnaires we have tested in our cohort did not show any baseline differences suggestive of prognostic accuracy.

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point of limited sample size and of the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we will acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we will also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. Finally, as discussed by Spisak et al. (1), the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. As such, the effect size in the New Haven and Mannheim data is Cohen’s d >1.
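
      The separation between fitting and evaluation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the cut-off rule (`fit_cutoff`), the rank-based AUC helper (`auc`) and every number in the example are hypothetical and synthetic, chosen only to show a threshold being selected on a discovery cohort and then applied once to an unseen cohort. The direction of the rule (higher FA predicts recovery) follows the summary of the findings given by Reviewer #2.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def accuracy(values, labels, cut):
    """Fraction of cases where 'value >= cut' agrees with the label."""
    return sum((v >= cut) == bool(y) for v, y in zip(values, labels)) / len(values)

def fit_cutoff(values, labels):
    """Pick the cut-off with the highest accuracy, using the discovery cohort only."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(values)):
        acc = accuracy(values, labels, cut)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

# Synthetic, made-up FA-like values; label 1 = recovered, 0 = persistent pain.
train_x = [0.52, 0.50, 0.49, 0.47, 0.45, 0.44, 0.43, 0.41]
train_y = [1, 1, 1, 0, 1, 0, 0, 0]
test_x = [0.51, 0.48, 0.46, 0.44, 0.42, 0.40]
test_y = [1, 0, 1, 1, 0, 0]

cut = fit_cutoff(train_x, train_y)         # chosen without looking at the test cohort
test_acc = accuracy(test_x, test_y, cut)   # evaluated once on the held-out cohort
test_auc = auc([x for x, y in zip(test_x, test_y) if y == 1],
               [x for x, y in zip(test_x, test_y) if y == 0])
print(cut, test_acc, test_auc)
```

      The concern raised by the reviewer corresponds to re-running `fit_cutoff`, or changing the analysis choices upstream of it, after inspecting `test_acc` or `test_auc`; at that point the held-out cohort no longer gives an unbiased estimate.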

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct: the model performance is poor to fair, which limits its usefulness for clinical translation. We wanted to emphasize that obtaining diffusion images can be done in a short period of time and, hence, as the predictive accuracy of such models improves, clinical translation becomes closer to reality. In addition, our findings are based on old diffusion data and a limited sample size coming from different sites and different acquisition sequences. This by itself would limit the accuracy, especially since evidence shows that sample size also affects model performance (i.e. testing AUC) (1). In the revision, we will re-word the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample.

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were the right ones.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms).

      We apologize for the lack of clarity; we did run tractography and we did not use a predetermined streamlined form of the connectome. We will clarify this point in the methods section.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps, which included TNPCA for dimensionality reduction and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier, which is a hard classification model (2). Such models cannot tell us the features that are important in classifying the groups. Our model is considered a black-box predictive model, like neural networks.

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and supplementary Figure 4 are both qualitatively illustrating the shape of the SLF.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery. To address the reviewer's concern, we will add a supplementary figure showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion. We would like to emphasize here several points that support the use of different recovery thresholds between New Haven and Mannheim. The New Haven primary pain ratings relied on a visual analogue scale (VAS), while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data was pre-registered with a definition of recovery at 20% and is part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off (3). Finally, a more recent consensus publication (4) from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
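
      The effect of the threshold choice can be made concrete with a small sketch. It assumes recovery is defined as a percentage reduction in pain intensity from baseline that exceeds the threshold, as the reviewer describes (>30% or >20%); the function names and the example patient are hypothetical and do not come from any of the three datasets.

```python
def percent_reduction(baseline, follow_up):
    """Percentage drop in pain intensity relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline pain must be positive")
    return 100.0 * (baseline - follow_up) / baseline

def is_recovered(baseline, follow_up, threshold_pct):
    """True when the reduction strictly exceeds the chosen threshold."""
    return percent_reduction(baseline, follow_up) > threshold_pct

# Hypothetical patient: pain falls from 6.0 to 4.5, i.e. a 25% reduction.
baseline, follow_up = 6.0, 4.5
labels = {t: is_recovered(baseline, follow_up, t) for t in (20, 30)}
print(labels)  # {20: True, 30: False}
```

      A patient with a 25% reduction is labelled recovered under the 20% criterion and persistent under the 30% criterion, which is why a control analysis across thresholds speaks directly to the robustness of the group comparison.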

      Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we will therefore add these analyses to the results section of our manuscript upon resubmission.

      Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      Reviewer #3 (Public Review):

      Summary:

      Authors suggest a new biomarker of chronic back pain with the option to predict the result of treatment. The authors found a significant difference in a fractional anisotropy measure in superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      The number of participants is still low.

      We have discussed this point in our replies to reviewer number 1.

      An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion. While we cannot do a direct study of actual tissue micro-structure, we will explore further the changes observed in the SLF by calculating diffusivity measures and discuss possible explanations of these changes.

      Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues they would like us to address so that we can respond appropriately.

      (1) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (2) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (3) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (4) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can actually achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms).

      Minor: What results are shown in Figure 7? It looks more descriptive than the actual results.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      One important question is needed to further clarify the mechanisms of aberrant Ca2+ microwaves as described below.

      Synapsin promoter labels both excitatory pyramidal neurons and inhibitory neurons. To avoid aberrant Ca2+ microwave, a combination of Flex virus and CaMKII-Cre or Thy-1-GCaMP6s and 6f mice were tested. However, all these approaches limit the number of infected pyramidal neurons. While the comprehensive display of these results is appreciated, a crucial question remains unanswered. To distinguish whether the microwave of Ca2+ is caused selectively via the abnormality of interneurons, or just a matter of pyramidal neuron density, testing Flex-GCaMP6 in interneuron specific mouse lines such as PV-Cre and SOM-Cre will be critical.

      We agree that unravelling the role of interneurons is important to the understanding of the cellular mechanisms. However, the primary goal of this preprint was to alert the field and those embarking on in vivo Ca2+ imaging to AAV transduction induced artefacts mediated by one of the most widely used viral constructs for Ca2+ imaging in the field. It was important to us to distribute this finding among the community in a timely manner to avoid the unnecessary waste of resources.

      We consider a thorough understanding of cell-type specific mechanisms interesting. However, the biological relevance of the Ca2+ waves is as yet unclear, and disentangling exactly which cellular and subcellular factors drive the aberrant phenomenon will require a large systematic effort which goes beyond our resources. In particular, it will not be technically trivial to separate biologically relevant contributions from technical differences. For instance, the absence of Ca2+ waves under the principal neuron promoter CaMKII may suggest the involvement of interneurons. However, alternative possibilities are a reduced density of expression across principal neurons or a difference in expression levels between the 2 promoters.

      The important take-home message of the preprint, in our opinion, is that users should carefully check their viral protocols, adjust the protocols for their specific scientific question and report any issues. We now emphasise the fact that although Ca2+ waves were not observed following conditional expression of syn.GCaMP with CaMKII.cre, this may not be due to a requirement for interneuronal expression but simply reflect differences in final GCaMP expression density and levels between the two transduction procedures (P12, L298-303).

      Reviewer #2 (Public Review):

      Weaknesses:

      Whether micro-waves are associated with the age of mice was not quantified. This would be good to know and the authors do have this data.

      We plotted the animal age at the time of injection for all injections of Syn.GCaMP6 into CA1/CA3 and found no correlation with either the occurrence or the frequency of Ca2+ waves over the age range of 5 – 79 wks (see reviewer Fig1; linear regression fit to the Ca2+ wave frequency against age was not significant: intercept = 1.37, slope = -0.007, p=0.62, n = 14; and generalized linear model relating Ca2+ wave ~ age was not significant: z score = 0.19, deviance above null = 0.04, p = 0.85, n=24). We have now added a statement on this to the revised manuscript (P14 L354-359) and for the reviewers we have added the plots below.

      Author response image 1.

      Plot of Ca2+ micro-wave frequency (left: number of Ca2+ waves/min) or occurrence (right: yes/no) against the animal age at the time of viral injection. Blue line is linear (left) or logistic (right) fit to the data with 95% confidence level.
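
      As a rough illustration of the kind of linear fit shown in the left panel, the sketch below regresses wave frequency on age by ordinary least squares. The ages and frequencies are made-up placeholder values, not the study data, and the function name `fit_line` is hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical placeholder data (NOT the study data):
# age at injection in weeks, and Ca2+ micro-waves per minute.
age_wks = [5, 12, 20, 33, 41, 56, 68, 79]
waves_per_min = [1.2, 0.4, 2.1, 0.9, 1.6, 0.3, 1.1, 0.8]

intercept, slope = fit_line(age_wks, waves_per_min)
print(f"intercept = {intercept:.2f}, slope = {slope:.4f}")
```

      A slope close to zero relative to its uncertainty would correspond to the "no relationship with age" conclusion; the significance test itself is not reproduced in this sketch.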

      The effect of micro-waves on single cell function was not analyzed. It would be useful, for example, if we knew the influence of micro-waves on place fields. Can a place cell still express a place field in a hippocampus that produces micro-waves? What effect might a microwave passing over a cell have on its place field? Mice were not trained in these experiments, so the authors do not have the data.

      We agree that these are interesting questions; however, the preprint is focused on describing the GECI expression conditions prone to generating these artefacts. Studying the effects of Ca2+ micro-waves on the circuitry are scientific questions, and would require an experimental framework of testing the aberrant activity on a specific physiological function e.g. place activity or specific oscillations (e.g. sharp-wave activity). Ca2+ microwaves, as the ones described here, have not been reported under physiological conditions or pathophysiological conditions and studying the effects of such artefactual waves on the circuit was not our intention.

      With respect to place cell activity, specifically, it is intuitive that during the Ca2+ micro-wave the participating cell’s place field activity would be obscured by the artefactual activity. Cell activity appears to return immediately following the wave suggesting that the cells could exhibit place activity outside their participation in the Ca2+ micro-waves. However, we do not know if the Ca2+ micro-wave activity disrupts the generation or maintenance of place fields. We have now added a brief reference to possible effects on place coding to the paper (P12, L315-317).

      The CaMKII-Cre approach for flexed-syn-GCaMP expression shows no micro-waves and is convincing, but it is only from 2 animals, even though both had no micro-waves.

      In light of the reviewer’s comment, we have added a further 3 animals with conditional expression of GCaMP6m from the DZNE to complement the current dataset with conditional expression of GCaMP6s from UoB (P10, L236 & 239 and revised table 1). Although Ca2+ waves were not observed in any of the 5 animals in total, we still cannot say with certainty whether this approach is completely safe. Time will show if researchers still encounter the phenotype under certain conditions when using this conditional approach.

      The authors state in their Discussion that even without observable microwaves, a syn-Ca2+-indicator transduction strategy could still be problematic. This may be true, but they do not check this in their analysis, so it remains unknown.

      We agree with the reviewer and have now made this point clearer in the revised discussion (P11, L257-258).

      Reviewer #3 (Public Review):

      Weaknesses:

      I believe that the weaknesses of the manuscript are appropriately highlighted by the authors themselves in the discussion. I would, however, like to emphasize several additional points.

      As the authors state, the exact conditions that lead to Ca2+ micro-waves are unclear from this manuscript. It is also unclear if Ca2+ micro-waves are specific to GECI expression or if high-titer viral transduction of other proteins such as genetically encoded voltage indicators, static fluorescent proteins, recombinases, etc could also cause Ca2+ micro-waves.

      The high expression of other proteins has been shown to result in artefactual phenomena such as toxicity or fluorescent puncta (for GFP see Hechler et al. 2006; Katayama et al. 2008; for GEVI see Rühl et al. 2021), but we are not aware of reports of micro-waves. Although it is certainly possible that high expression levels of other proteins could lead to waves, we suspect the Ca2+ micro-waves observed in this preprint result from a dysregulation of Ca2+ homeostasis. This is not to suggest that voltage indicators could not result in micro-waves (e.g. Ca2+ homeostasis may be indirectly affected).

      The authors almost exclusively tested high titer (>5x10^12 vg/mL) large volume (500-1000 nL) injections using the synapsin promoter and AAV1 serotypes. It is possible that Ca2+ micro-waves are dramatically less frequent when titers are lowered further but still kept high enough to be useful for in vivo imaging (e.g. 1x10^12 vg/mL) or smaller injection volumes are used. It is also possible that Ca2+ micro-waves occur with high titer injections using other viral promoter sequences such as EF1α or CaMKIIα. There may additionally be effects of viral serotype on micro-wave occurrence.

      We agree with all points raised by the reviewer. Notably, we used viral transduction protocols with titers and volumes within the range of those previously used for viral transduction of GCaMP under the synapsin promoter (see P11 L269-275) and we observed Ca2+ micro-waves. As the reviewer suggested, we did find that lowering the titer is an important factor in reducing these Ca2+ micro-waves and there is likely a wide range of approaches that avoid the phenomenon. With regard to viral serotype, we show that micro-waves occurred across AAV1 and 9, but it is possible that other serotypes may avoid the phenomenon.

      We reiterate in the abstract of the revised manuscript that expression level is a crucial factor (P2, L40 and P2, L44-45) and now mention that other promoters and induction protocols that result in high Ca2+ indicator expression may result in Ca2+ micro-waves (P12, L291-294).

      The number of animals in any particular condition are fairly low (Table 1) with the exception of V1 imaging and thy1-GCaMP6 imaging. This prohibits rigorous comparison of the frequency of pathological calcium activity across conditions.

      We have now added 3 more animals with conditional GCaMP6 expression. In total, the study contains 34 animals with viral injection into the hippocampus from different laboratories and under different conditions resulting in multiple groups. As such we are cognizant of the resulting limitations for statistical evaluation.

      However, in light of the reviewer’s comment, we have now employed a generalized linear model tested on all the data to examine the relationship between the Ca2+ micro-wave incidence and the different factors. The multivariate GLM did find a significant relationship between Ca2+ micro-wave incidence and both viral dilution and weeks post injection (see below and revised manuscript P8, L189-193).

      For injections into CA1 in the hippocampus (n=28), a GLM found no relationship between Ca2+ micro-waves and each of the individual variables x (Ca-wave ~ x): viral dilution: z score = 1.14, deviance above null = 1.31, p = 0.254; post injection weeks: z score = 1.18, deviance above null = 1.44, p = 0.239; injection volume: z score = -0.76, deviance above null = 0.59, p = 0.45; construct: z score = 1.18, difference in deviance above null = 1.44, p = 0.239.

      However, a multivariable logistic GLM relating dilution and post injection weeks (Ca-wave ~ dilution + p.i_wks) showed that together both variables were significantly related to Ca2+ micro-waves (deviance above null = 7.5; dilution: z score = 2.18, p < 0.05; p.i_wks: z score = 2.22, p < 0.05).
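
      To make the quantity "deviance above null" concrete, here is a minimal sketch of a two-predictor logistic GLM fitted by Newton-Raphson in plain Python. It is not the authors' analysis code: the data are invented placeholders, the predictor coding (relative titer, weeks post injection) is an assumption, and the helper names `solve`, `deviance` and `fit_logistic` are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def deviance(rows, y, beta):
    """Binomial deviance: -2 * log-likelihood for 0/1 outcomes."""
    total = 0.0
    for row, yi in zip(rows, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, row)))
        total -= 2.0 * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))
    return total


def fit_logistic(predictors, y, n_iter=25):
    """Fit a logistic GLM by Newton-Raphson; returns (coefficients, deviance)."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta, deviance(rows, y, beta)


# Hypothetical placeholder data (NOT the study data).
titer = [1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]
pi_wks = [2, 4, 6, 8, 3, 6, 8, 10, 4, 8, 10, 12]
wave = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

_, null_dev = fit_logistic([() for _ in wave], wave)          # intercept only
beta, full_dev = fit_logistic(list(zip(titer, pi_wks)), wave)  # two predictors
print("coefficients:", [round(b, 3) for b in beta])
print("deviance above null:", round(null_dev - full_dev, 3))
```

      The difference between the null and fitted deviances is what would be compared against a chi-squared distribution; in practice a statistics package would also report the z scores and p values, which this sketch omits.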

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Results are straightforward and convincing. While a couple of ways to reduce the aberrant microwaves of calcium responses were demonstrated, delving into the functions of interneurons is crucial for a more comprehensive understanding of cellular causality.

      As mentioned in the public response, disentangling cellular mechanism from technical requirements will need a large and systematic study. To determine the contribution from interneurons, the use of specific interneuron promoters would be required, and viral titers systematically varied to result in similar cellular GCaMP expression levels as seen under the synapsin promoter condition.

      Reviewer #2 (Recommendations For The Authors):

      Do the authors think the cells are firing when they participate in a micro-wave, or do they think the calcium influx is due to something else? A discussion point on this would be good.

      This is an excellent point raised by the reviewer. We do not know if the elevated cellular Ca2+ during the artifactual Ca2+ micro-wave reflects action potential firing or an increase of Ca2+ from intracellular stores. As already described in the text of the preprint, their optical spatiotemporal profile neither fits with known microseizure progression patterns, nor with spreading depolarization/depression. We have adopted the reviewer’s suggestion and added the following point to the discussion section in the revised preprint (P12, L308-315):

      In a limited dataset, we attempted to detect the Ca2+ micro-waves by hippocampal LFP recordings (using a conventional insulated Tungsten wire, diameter ~110µm). We could not identify a specific signature, e.g. ictal activity or LFP depression, which may correspond to these Ca2+ micro-waves. The crucial shortcoming of this experiment of course is that with these LFP recordings, we could not simultaneously perform hippocampal 2-photon microscopy. Thus, it is uncertain if the Ca2+ micro-waves indeed occurred in proximity to our electrode.

      The results seem to suggest that micro-waves may involve interneurons as their CaMKII-Cre strategy avoids waves - possibly due to a lack of expression of GECIs in interneurons. It would be great to hear the authors' thoughts on this and add a brief discussion point.

      As mentioned in the public response to Reviewer 1, it is difficult to disentangle cellular mechanisms from technical requirements, and the exact requirements for the Ca2+ micro-waves to occur are still not fully clear. The absence of Ca2+ micro-waves in our CaMKII-Cre dataset may indeed reflect the requirement of interneurons. However, it could just as well be due to a sparse labelling of principal cells or simply reflect differences in the expression levels of GCaMP under the different promoters.

      All in all, a more complete understanding of the requirements of such Ca2+ micro-waves will require a community effort. Therefore, it is important that each group check the safety profile of their GECI and report problems to the community.

      We have added these points to the revised preprint (P12, L291 and P12, L298)

      Plotting the incidence of micro-waves as a function of the age of mice would be a nice addition (the authors have the data).

      There was no relationship of Ca2+ micro-wave occurrence or frequency with age over the range of 5-79 wks (see public response) and this has been added to the preprint (P14, L354)

      Reviewer #3 (Recommendations For The Authors):

      I appreciate the authors raising the awareness of this issue. I had personally observed micro-waves in my own data as well. In agreement with their findings, I found that the occurrence of micro-waves was dramatically lower when I reduced the viral titer. Anecdotally, I also observed voltage micro-waves when virally transducing genetically encoded voltage indicators at similar titers. For that reason, I am skeptical that this issue is exclusive to GECIs.

      We find it interesting that the reviewer has also seen artefactual micro-waves following viral transduction of genetically encoded voltage indicators. Without seeing the voltage waves the referee is referring to or the conditions, it is of course difficult to compare with the Ca2+ micro-waves we report. However, this comment again raises the question of mechanism. We believe that in the GECI framework, Ca2+ homeostatic aspects are important. Voltage indicators are based on different sensor mechanisms, and expressed in the cell membrane, but it may very well be that there are overlapping factors between Ca2+ and voltage indicators that could trigger a similar, or even the same phenomenon in the end.

      Minor comments:

      (1) Line 131-132: I believe the authors only tested for micro-waves in V1. This should be made clear in the results. It could be that micro-waves could occur in other parts of cortex with the same viral titers.

      Both V1 and somatosensory cortex were tested as described in the methods (P15, L395-397); we have made this clearer in the revised preprint (P6, L138).

      (2) There are no statistics associated with the data from Fig 1e.

      We have now added statistics (P5, L126).

      (3) The authors may be able to make a stronger claim about the pathological nature of the micro-waves if there are differences in the histology between the injected and non-injected hemispheres. For example, is there evidence of widespread cell death in the injected hemisphere (e.g. lower cell count, smaller hippocampal volume, caspase staining, etc.)?

      We found no evidence of gross morphological changes to the hippocampus following viral transduction with no changes in CA1 pyramidal cell layer thickness or CA1 thickness (pyramidal cell layer thickness: 49 ± 12.5 µm ipsilateral and 50.3 ± 11.1 µm contralateral, n=4, Student’s t-test p=0.89; CA1 thickness: 553.3 ± 14 µm ipsilateral and 555.8 ± 62 µm contralateral, n = 4, Student’s t-test p=0.94; 48 ± 13 weeks post injection at time of perfusion).

      We have added this to the preprint (P5, L117-122)

      (4) The broader micro-waves in the stratum oriens versus the stratum pyramidale are likely due to the spread of the basal dendrites of pyramidal cells. If the typical size of the basal dendritic arbor of CA1 pyramidal neurons is taken into account, does this explain the wider calcium waves in this layer?

      Absolutely, great point, yes, we completely agree on this. It is likely that the active neuropil (including the dendritic arbour) is contributing to the apparent broader diameter. In addition, as evident in the video 5 cell somata in the stratum oriens (possibly interneurons) are active and their processes also contribute.

      We have now mentioned these points in the revised preprint (P5, L132)

      (5) Lines 179-181: Is the difference in the prevalence of micro-waves between viral titers statistically significant?

      Although we have a large number of animals in total (n=34) with viral injection into the hippocampus, the number of animals in each condition, given the many factors, is low. We therefore used a generalized linear model to test the relationship between the Ca2+ micro-waves and the variables.

      We have now added this analysis to the revised preprint (P8, L189-193)

      (6) Lines 200-203: The CA3 micro-waves were only observed at one institution. The current wording is slightly misleading.

      We agree and have changed this to be clearer (P9 L216)

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Methods, please state the sex of the mice.

      This has now been added to the methods section:

      “Three to nine month old Thy1-GCaMP6S mice (Strain GP4.3, Jax Labs), N=16 stroke (average age: 5.4 months; 13 male, 3 female), and 5 sham (average age: 6 months; 3 male, 2 female), were used in this study.”

      (2) The analysis in Fig 3B-D, 4B-C, and 6A, B highlights the loss of limb function, firing rate, or connections at 1 week but this phenomenon is clearly persisting longer in some datasets (Fig. 3 and 6). Was there not a statistical difference at weeks 2,3,4,8 relative to "Pre-stroke" or were comparisons only made to equivalent time points in the sham group? Personally, I think it is useful to compare to "pre-stroke" which should be more reflective of that sample of animals than comparing to a different set of animals in the Sham group. A 1 sample t-test could be used in Fig 4 and 6 normalized data.

      On further analysis of our datasets, normalization throughout the manuscript was unnecessary for proper depiction of results, and all normalized datasets have been replaced with non-normalized datasets. All within-group statistics are now indicated within the manuscript.

      (3) Fig 4A shows a very striking change in activity that doesn't seem to be borne out with group comparisons. Since many neurons are quiet or show very little activity, did the authors ever consider subgrouping their analysis based on cells that show high activity levels (top 20 or 30% of cells) vs those that are inactive most of the time? Recent research has shown that the effects of stroke can have a disproportionate impact on these highly active cells versus the minimally active ones.

      A qualitative analysis supports a loss of cells with high activity at the 1-week post-stroke timepoint, and examination of average firing rates at 1-week shows reductions in the animals with the highest average rates. However, we have not tracked responses within individual neurons or quantitatively analyzed the data by subdividing cells into groups based on their pre-stroke activity levels. We have amended the discussion of the manuscript with the following to highlight the previous data as it relates to our study:

      “Recent research also indicates that stroke causes distinct patterns of disruption to the network topology of excitatory and inhibitory cells [73], and that stroke can disproportionately disrupt the function of high activity compared to low activity neurons in specific neuron sub-types [61]. Mouse models with genetically labelled neuronal sub-types (including different classes of inhibitory interneurons) could be used to track the function of those populations over time in awake animals.”

      (4) Fig 4 shows normalized firing rates when moving and at rest but it would be interesting to know what the true difference in activity was in these 2 states. My assumption is that stroke reduces movement therefore one normalizes the data. The authors could consider putting non-normalized data in a Supp figure, or at least provide a rationale for not showing this, such as stating that movement output was significantly suppressed, hence the need for normalization.

      On further analysis of our datasets, normalization throughout the manuscript was unnecessary for proper depiction of results, and all normalized datasets have been replaced with non-normalized datasets.

      (5) One thought for the discussion. The fact that the authors did not find any changes in "distant" cortex may be specific to the region they chose to sample (caudal FL cortex). It is possible that examining different "distant" regions could yield a different outcome. For example, one could argue that there may have been no reason for this area to "change" since it was responsive to FL stimuli before stroke. Further, since it was posterior to the stroke, thalamocortical projections should have been minimally disturbed.

      We would like to thank the reviewer for this comment. We have amended the discussion with the following:

      “Our results suggest a limited spatial distance over which the peri-infarct somatosensory cortex displays significant network functional deficits during movement and rest. Our results are consistent with a spatial gradient of plasticity mediating factors that are generally enhanced with closer proximity to the infarct core [84,88,90,91]. However, our analysis outside peri-infarct cortex is limited to a single distal area caudal to the pre-stroke cFL representation. Although somatosensory maps in the present study were defined by a statistical criterion for delineating highly responsive cortical regions from those with weak responses, the distal area in this study may have been a site of activity that did not meet the statistical criterion for inclusion in the baseline map. The lack of detectable changes in population correlations, functional connectivity, assembly architecture and assembly activations in the distal region may reflect minimal pressure for plastic change as networks in regions below the threshold for regional map inclusion prior to stroke may still be functional in the distal cortex. Thus, threshold-based assessment of remapping may further overestimate the neuroplasticity underlying functional reorganization suggested by anaesthetized preparations with strong stimulation. Future studies could examine distal areas medial and anterior to the cFL somatosensory area, such as the motor and pre-motor cortex, to further define the effect of FL targeted stroke on neuroplasticity within other functionally relevant regions. Moreover, the restriction of these network changes to peri-infarct cortex could also reflect the small penumbra associated with photothrombotic stroke, and future studies could make use of stroke models with larger penumbral regions, such as the middle cerebral artery occlusion model. 
      Larger injuries induce more sustained sensorimotor impairment, and the relationship between neuronal firing, connectivity, and neuronal assemblies could be further probed relative to recovery or sustained impairment in these models.”

      Minor comments:

      Line 129, I don't necessarily think the infarct shows "hyper-fluorescence", it just absorbs less white light (or reflects more light) than blood-rich neighbouring regions.

      Sentence in the manuscript has been changed to:

      “Resulting infarcts lesioned this region, and borders could be defined by a region of decreased light absorption 1 week post-stroke (Fig 1D, Top).”

      Line 130-132: the authors refer to Fig 1D to show cellular changes but these cannot be seen from the images presented. Perhaps a supplementary zoomed-in image would be helpful.

      As changes to the morphology of neurons are not one of the primary objectives of this study, and sampled resolution was not sufficiently high to clearly delineate the processes of neurons necessary for morphological assessment, we have amended the text as follows:

      “Within the peri-infarct imaging region, cellular dysmorphia and swelling were visually apparent in some cells during two photon imaging 1-week after stroke, but recovered over the 2 month post-stroke imaging timeframe (data not shown). These gross morphological changes were not visually apparent in the more distal imaging region lateral to the cHL.”

      Lines 541-543, was there a rationale for defining movement as >30mm/s? Based on a statistical estimate of noise?

      Text has been altered as follows:

      “Animal movement within the homecage during each Ca2+ imaging session was tracked to determine animal speed and position. Movement periods were manually annotated on a subset of timeseries by co-recording animal movement using both the Mobile Homecage tracker, as well as a webcam (Logitech C270) with infrared filter removed. Movement tracking data was low pass filtered to remove spurious movement artifacts lasting below 6 recording frames (240ms). Based on annotated times of animal movement from the webcam recordings and Homecage tracking, a threshold of 30mm/s in the tracking data was used to define frames of animal movement, whereas speeds below 30mm/s were taken as periods of rest.”
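
      The movement/rest classification described in this passage can be sketched roughly as follows. This is an illustrative stand-in rather than the authors' pipeline: instead of a true low-pass filter it simply discards supra-threshold bouts shorter than 6 frames, and the function names `remove_short_bouts` and `classify_movement` are hypothetical.

```python
def remove_short_bouts(flags, min_frames):
    """Clear runs of True shorter than min_frames (treated as spurious)."""
    cleaned = list(flags)
    i = 0
    while i < len(cleaned):
        if cleaned[i]:
            j = i
            while j < len(cleaned) and cleaned[j]:
                j += 1
            if j - i < min_frames:
                cleaned[i:j] = [False] * (j - i)
            i = j
        else:
            i += 1
    return cleaned


def classify_movement(speed_mm_s, threshold=30.0, min_frames=6):
    """Return one boolean per frame: True = movement, False = rest.

    threshold and min_frames follow the values quoted in the text
    (30 mm/s; 6 frames, i.e. 240 ms at the stated frame duration).
    """
    moving = [s >= threshold for s in speed_mm_s]
    return remove_short_bouts(moving, min_frames)


# Hypothetical speed trace (mm/s): a 3-frame blip, then an 8-frame bout.
speeds = [0] * 5 + [50] * 3 + [0] * 5 + [50] * 8 + [0] * 2
labels = classify_movement(speeds)
print(sum(labels), "frames classified as movement")
```

      With this placeholder trace, the 3-frame blip is rejected and only the 8-frame bout is kept as movement.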

      Lines 191-195: Note that although the finding of reduced neural activity is in disagreement with a multi-unit recording study, it is consistent with other very recent single-cell Ca++ imaging data after stroke (PMID: 34172735, 34671051).

      Text has been altered as follows:

      “These results indicate decreased neuronal spiking 1-week after stroke in regions immediately adjacent to the infarct, but not in distal regions, that is strongly related to sensorimotor impairment. This finding runs contrary to a previous report of increased spontaneous multi-unit activity as early as 3-7 days after focal photothrombotic stroke in the peri-infarct cortex [1], but is in agreement with recent single-cell calcium imaging data demonstrating reduced sensory-evoked activity in neurons within the peri-infarct cortex after stroke [60,61].”

      Fig 7. I don't understand what the color code represents. Are these neurons belonging to the same assembly (or membership?).

      That is correct, neurons with identical color code belong to the same assembly. The legend of Fig 7 has been modified as follows to make this more explicit:

      “Fig 7. Color coded neural assembly plots depict altered neural assembly architecture after stroke in the peri-infarct region. (A) Representative cellular Ca2+ fluorescence images with neural assemblies color coded and overlaid for each timepoint. Neurons belonging to the same assembly have been pseudocolored with identical color. A loss in the number of neural assemblies after stroke in the peri-infarct region is visually apparent, along with a concurrent increase in the number of neurons for each remaining assembly. (B) Representative sham animal displays no visible change in the number of assemblies or number of neurons per assembly.”

      Reviewer #2 (Recommendations For The Authors):

      Materials and methods

      Identification of forelimb and hindlimb somatosensory cortex representations [...] Cortical response areas are calculated using a threshold of 95% peak activity within the trial. The threshold is presumably used to discriminate between the sensory-evoked response and collateral activation / less "relevant" response (noise). Since the peak intensity is lower after stroke, the "response" area is larger - lower main signal results in less noise exclusion. Predictably, areas that show a higher response before stroke than after are excluded from the response area before stroke and included after. While it is expected that the remapped areas will exhibit a lower response than the original and considering the absence of neuronal activity, assembly architecture, or functional connectivity in the "remapped" regions, a minimal criterion for remapping should be to exhibit higher activation than before stroke. Please use a different criterion to map the cortical response area after stroke.

      We would like to thank the reviewer for this comment. We agree with the reviewer’s assessment of 95% of peak as an arbitrary criterion of mapped areas. To exclude noise from the analysis of mapped regions, a new statistical criterion of 5X the standard deviation of the baseline period was used to determine the threshold to use to define each response map. These maps were used to determine the peak intensity of the forelimb response. We also measured a separate ROI specifically overlapping the distal region, lateral to the hindlimb map, to determine specific changes to widefield Ca2+ responses within this distal region. We have amended the text as follows and have altered Figure 2 with new data generated from our new criterion for cortical mapping.

      “The trials for each limb were averaged in ImageJ software (NIH). 10 imaging frames (1s) after stimulus onset were averaged and divided by the 10 baseline frames 1s before stimulus onset to generate a response map for each limb. Response maps were thresholded at 5 times the standard deviation of the baseline period deltaFoF to determine limb associated response maps. These were merged and overlaid on an image of surface vasculature to delineate the cFL and cHL somatosensory representations and were also used to determine peak Ca2+ response amplitude from the timeseries recordings. For cFL stimulation trials, an additional ROI was placed over the region lateral to the cHL representation (denoted as “distal region” in Fig 2E) to measure the distal region cFL evoked Ca2+ response amplitude pre- and post-stroke. The dimensions and position of the distal ROI were held consistent relative to surface vasculature for each animal from pre- to post-stroke.”
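
      One possible reading of this thresholding criterion is sketched below. It assumes that the response is expressed as ΔF/F0 relative to the mean of the 10 baseline frames and that the baseline standard deviation is computed per pixel; neither detail is stated in the text. The function name `response_mask` and the pixel traces are hypothetical.

```python
from statistics import mean, stdev


def response_mask(stack, onset, n_frames=10, n_sd=5.0):
    """Flag pixels whose evoked dF/F0 exceeds n_sd baseline standard deviations.

    stack: dict mapping a pixel coordinate to its trial-averaged
           fluorescence trace (one value per imaging frame).
    onset: index of the first frame after stimulus onset.
    """
    mask = {}
    for pixel, trace in stack.items():
        baseline = trace[onset - n_frames:onset]
        evoked = trace[onset:onset + n_frames]
        f0 = mean(baseline)
        baseline_dff = [(f - f0) / f0 for f in baseline]
        response_dff = mean(evoked) / f0 - 1.0
        mask[pixel] = response_dff > n_sd * stdev(baseline_dff)
    return mask


# Hypothetical trial-averaged traces: 10 baseline frames, then 10 evoked frames.
base = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100]
stack = {
    (0, 0): base + [110] * 10,  # roughly 10% increase after the stimulus
    (0, 1): base + base,        # no change after the stimulus
}
print(response_mask(stack, onset=10))
```

      Pixels flagged True would form the limb-associated response map; a fixed multiple of baseline variability, unlike a percentage of peak, does not enlarge the map when the peak response drops.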

      Animals

      Mice used have an age that goes from 3 to 9 months. This is a big difference given that literature on healthy aging reports changes in neurovascular coupling starting from 8-9 months old mice. Consider adding age as a covariate in the analysis.

      We do not have sufficient numbers of animals within this study to examine the effect of age on the results observed herein. We have amended the discussion with the following to address this point:

      “A potential limitation of our data is the undefined effect of age and sex on cortical dynamics in this cohort of mice (with ages ranging from 3-9 months) after stroke. Aging can impair neurovascular coupling [102–107] and reduce ischemic tolerance [108–111], and greater investigation of cortical activity changes after stroke in aged animals would more effectively model stroke in humans. Future research could replicate this study with mice in middle-age and aged mice (e.g. 9 months and 18+ months of age), and with sufficient quantities of both sexes, to better examine age and sex effects on measures of cortical function.”

      Statistics

      Please describe the "normalization" that was applied to the firing rate. Since a mixed-effects model was used, why wasn't baseline simply added as a covariate? With this type of data, normalization is useful for visualization purposes.

      On further analysis of our datasets, normalization throughout the manuscript was unnecessary for the visualization of results, and all normalized datasets have been replaced with non-normalized datasets. All within-group comparisons are now indicated throughout the manuscript and in the figures.

      Introduction

      Line 93 awake, freely behaving but head-fixed. That's not freely. Should just say behaving.

      Sentence has been edited as follows:

      “We used awake, behaving but head-fixed mice in a mobile homecage to longitudinally measure cortical activity, then used computational methods to assess functional connectivity and neural assembly architecture at baseline and each week for 2 months following stroke.”

      110 - 112 The last part of this sentence is unjustified because these areas have been incorrectly identified as locations of representational remapping.

      We agree with the reviewer and have amended the manuscript as follows after re-analyzing the dataset on widefield Ca2+ imaging of sensory-evoked responses: “Surprisingly, we also show that significant alterations in neuronal activity (firing rate), functional connectivity, and neural assembly architecture are absent within more distal regions of cortex as little as 750 µm from the stroke border, even in areas identified by regional functional imaging (under anaesthesia) as ‘remapped’ locations of sensory-evoked FL activity 8-weeks post-stroke.”

      Results

      149-152 There is no observed increase in the evoked response area. There is an observed change in the criteria for what is considered a response.

      We agree with the reviewer. Text has been amended as follows:

      “Fig 2A shows representative montages from a stroke animal illustrating the cortical cFL and cHL Ca2+ responses to 1s, 100Hz limb stimulation of the contralateral limbs at the pre-stroke and 8-week post-stroke timepoints. The location and magnitude of the cortical responses change drastically between timepoints, with substantial loss of supra-threshold activity within the pre-stroke cFL representation located anterior to the cHL map, and an apparent shift of the remapped representation into regions lateral to the cHL representation at 8-weeks post-stroke. A significant decrease in the cFL evoked Ca2+ response amplitude was observed in the stroke group at 8-weeks post-stroke relative to pre-stroke (Fig 2B). This is in agreement with past studies [19–25], and suggests that cFL targeted stroke reduces forelimb evoked activity across the cFL somatosensory cortex in anaesthetized animals even after 2 months of recovery. There was no statistical change in the average size of cFL evoked representation 8-weeks after stroke (Fig 2C), but a significant posterior shift of the supra-threshold cFL map was detected (Fig 2D). Unmasking of previously sub-threshold cFL responsive cortex in areas posterior to the original cFL map at 8-weeks post-stroke could contribute to this apparent remapping. However, the amplitude of the cFL evoked widefield Ca2+ response in this distal region at 8-weeks post-stroke remains reduced relative to pre-stroke activation (Fig 2E). Previous studies suggest strong inhibition of cFL evoked activity during the first weeks after photothrombosis [25]. Without longitudinal measurement in this study to quantify this reduced activation prior to 8-weeks post-stroke, we cannot differentiate potential remapping due to unmasking of the cFL representation that enhances the cFL-evoked widefield Ca2+ response from apparent remapping that simply reflects changes in the signal-to-noise ratio used to define the functional representations. There were no group differences between stroke and sham groups in cHL evoked intensity, area, or map position (data not shown).”

      A lot of the nonsignificant results are reported as "statistical trends towards..." While the term "trend" is problematic, it remains common in its use. However, assigning directionality to the trend, as if it is actively approaching a main effect, should be avoided. The results aren't moving towards or away from significance. Consider rewording the way in which these results are reported.

      We have amended the text to remove directionality from our mention of statistical trends.

      R squared and p values for significant results are reported in the "impaired performance on tapered beam..." and "firing rate of neurons in the peri-infarct cortex..." subsections of the results, but not the other sections. Please report the results in a consistent manner.

      R-squared and p-values have been removed from the results section and are now reported in figure captions consistently.

      Discussion

      288 Remapping is defined as "new sensory-evoked spiking". This should be the main criterion for remapping, but it is not operationalized correctly by the threshold method.

      With our new criterion for determining limb maps using a statistical threshold of 5X the standard deviation of baseline fluorescence, we have edited text throughout the manuscript to better emphasize that we may not be measuring new sensory-evoked spiking with the mesoscale mapping that was done. We have edited the discussion as follows:

      “Here, we used longitudinal two-photon calcium imaging of awake, head-fixed mice in a mobile homecage to examine how focal photothrombotic stroke to the forelimb sensorimotor cortex alters the activity and connectivity of neurons adjacent and distal to the infarct. Consistent with previous studies using intrinsic optical signal imaging, mesoscale imaging of regional calcium responses (reflecting bulk neuronal spiking in that region) showed that targeted stroke to the cFL somatosensory area disrupts the sensory-evoked forelimb representation in the infarcted region. Consistent with previous studies, this functional representation exhibited a posterior shift 8-weeks after injury, with activation in a region lateral to the cHL representation. Notably, sensory-evoked cFL representations exhibited reduced amplitudes of activity relative to pre-stroke activation measured in the cFL representation and in the region lateral to the cHL representation. Longitudinal two-photon calcium imaging in awake animals was used to probe single neuron and local network changes adjacent to the infarct and in a distal region that corresponded to the shifted region of cFL activation. This imaging revealed a decrease in firing rate at 1-week post-stroke in the peri-infarct region that was significantly negatively correlated with the number of errors made with the stroke-affected limbs on the tapered beam task. Peri-infarct cortical networks also exhibited a reduction in the number of functional connections per neuron and a sustained disruption in neural assembly structure, including a reduction in the number of assemblies and an increased recruitment of neurons into functional assemblies. Elevated correlation between assemblies within the peri-infarct region peaked 1-week after stroke and was sustained throughout recovery. Surprisingly, distal networks, even in the region associated with the shifted cFL functional map in anaesthetized preparations, were largely undisturbed.”

      “Cortical plasticity after stroke Plasticity within and between cortical regions contributes to partial recovery of function and is proportional to both the extent of damage, as well as the form and quantity of rehabilitative therapy post-stroke [80,81]. A critical period of highest plasticity begins shortly after the onset of stroke, is greatest during the first few weeks, and progressively diminishes over the weeks to months after stroke [19,82–86]. Functional recovery after stroke is thought to depend largely on the adaptive plasticity of surviving neurons that reinforce existing connections and/or replace the function of lost networks [25,52,87–89]. This neuronal plasticity is believed to lead to topographical shifts in somatosensory functional maps to adjacent areas of the cortex. The driver for this process has largely been ascribed to a complex cascade of intra- and extracellular signaling that ultimately leads to plastic re-organization of the microarchitecture and function of surviving peri-infarct tissue [52,80,84,88,90–92]. Likewise, structural and functional remodeling has previously been found to be dependent on the distance from the stroke core, with closer tissue undergoing greater re-organization than more distant tissue (for review, see [52]).”

      “Previous research examining the region at the border between the cFL and cHL somatosensory maps has shown this region to be a primary site for functional remapping after cFL directed photothrombotic stroke, resulting in a region of cFL and cHL map functional overlap [25]. Within this overlapping area, neurons have been shown to lose limb selectivity 1-month post-stroke [25]. This is followed by the acquisition of more selective responses 2-months post-stroke and is associated with reduced regional overlap between cFL and cHL functional maps [25]. Notably, this functional plasticity at the cellular level was assessed using strong vibrotactile stimulation of the limbs in anaesthetized animals. Our findings using longitudinal imaging in awake animals show an initial reduction in firing rate at 1-week post-stroke within the peri-infarct region that was predictive of functional impairment in the tapered beam task. This transient reduction may be associated with reduced or dysfunctional thalamic connectivity [93–95] and reduced transmission of signals from hypo-excitable thalamo-cortical projections [96]. Importantly, the strong negative correlation we observed between firing rate of the neural population within the peri-infarct cortex and the number of errors on the affected side, as well as the rapid recovery of firing rate and tapered beam performance, suggests that neuronal activity within the peri-infarct region contributes to the impairment and recovery. The common timescale of neuronal and functional recovery also coincides with angiogenesis and re-establishment of vascular support for peri-infarct tissue [83,97–100].”

      “Consistent with previous research using mechanical limb stimulation under anaesthesia [25], we show that at the 8-week timepoint after cFL photothrombotic stroke the cFL representation is shifted posterior from its pre-stroke location into the area lateral to the cHL map. Notably, our distal region for awake imaging was directly within this 8-week post-stroke cFL representation. Despite our prediction that this distal area would be a hotspot for plastic changes, there was no detectable alteration to the level of population correlation, functional connectivity, assembly architecture or assembly activations after stroke. Moreover, we found little change in the firing rate in either moving or resting states in this region. Contrary to our results, somatosensory-evoked activity assessed by two-photon calcium imaging in anesthetized animals has demonstrated an increase in cFL responsive neurons within a region lateral to the cHL representation 1-2 months after focal cFL stroke [25]. Notably, this previous study measured sensory-evoked single cell activity using strong vibrotactile (1s 100Hz) limb stimulation under anaesthesia [25]. This frequency of limb stimulation has been shown to elicit near maximal neuronal responses within the limb-associated somatosensory cortex under anesthesia [101]. Thus, strong stimulation and anaesthesia may have unmasked non-physiological activity in neurons in the distal region that is not apparent during more naturalistic activation during awake locomotion or rest. Regional mapping defined using strong stimulation in anesthetized animals may therefore overestimate plasticity at the cellular level.”

      “Our results suggest a limited spatial distance over which the peri-infarct somatosensory cortex displays significant network functional deficits during movement and rest. Our results are consistent with a spatial gradient of plasticity-mediating factors that are generally enhanced with closer proximity to the infarct core [84,88,90,91]. However, our analysis outside peri-infarct cortex is limited to a single distal area caudal to the pre-stroke cFL representation. Although somatosensory maps in the present study were defined by a statistical criterion for delineating highly responsive cortical regions from those with weak responses, the distal area in this study may have been a site of activity that did not meet the statistical criterion for inclusion in the baseline map. The lack of detectable changes in population correlations, functional connectivity, assembly architecture and assembly activations in the distal region may reflect minimal pressure for plastic change as networks in regions below the threshold for regional map inclusion prior to stroke may still be functional in the distal cortex. Thus, threshold-based assessment of remapping may further overestimate the neuroplasticity underlying functional reorganization suggested by anaesthetized preparations with strong stimulation. Future studies could examine distal areas medial and anterior to the cFL somatosensory area, such as the motor and pre-motor cortex, to further define the effect of FL targeted stroke on neuroplasticity within other functionally relevant regions. Moreover, the restriction of these network changes to peri-infarct cortex could also reflect the small penumbra associated with photothrombotic stroke, and future studies could make use of stroke models with larger penumbral regions, such as the middle cerebral artery occlusion model. Larger injuries induce more sustained sensorimotor impairment, and the relationship between neuronal firing, connectivity, and neuronal assemblies could be further probed relative to recovery or sustained impairment in these models. Recent research also indicates that stroke causes distinct patterns of disruption to the network topology of excitatory and inhibitory cells [73], and that stroke can disproportionately disrupt the function of high activity compared to low activity neurons in specific neuron sub-types [61]. Mouse models with genetically labelled neuronal sub-types (including different classes of inhibitory interneurons) could be used to track the function of those populations over time in awake animals. A potential limitation of our data is the undefined effect of age and sex on cortical dynamics in this cohort of mice (with ages ranging from 3-9 months) after stroke. Aging can impair neurovascular coupling [102–107] and reduce ischemic tolerance [108–111], and greater investigation of cortical activity changes after stroke in aged animals would more effectively model stroke in humans. Future research could replicate this study with mice in middle-age and aged mice (e.g. 9 months and 18+ months of age), and with sufficient quantities of both sexes, to better examine age and sex effects on measures of cortical function.”

      315 - 317 Remodelling is dependent on the distance from the stroke core, with closer tissue undergoing greater reorganization than more distant tissue. There is no evidence that the more distant tissue undergoes any reorganization at all.

      We agree with the reviewer that no remodelling is apparent in our distal area. We have removed reference to our study showing remodeling in the distal area, and have amended the text as follows:

      “Likewise, structural and functional remodeling has previously been found to be dependent on the distance from the stroke core, with closer tissue undergoing greater re-organization than more distant tissue (for review, see [52]).”

      412-414 The authors speculate that a strong stimulation under anaesthesia may unmask connectivity in distal regions. However, the motivation for this paper is that anaesthesia is a confounding factor. It appears to me that, given the results of this study, the authors should argue that the functional connectivity observed under anaesthesia may be spurious.

      The incorrect word was used here. We have corrected the paragraph of the discussion and amended it as follows:

      “Consistent with previous research using mechanical limb stimulation under anaesthesia [25], we show that at the 8-week timepoint after cFL photothrombotic stroke the cFL representation is shifted posterior from its pre-stroke location into the area lateral to the cHL map. Notably, our distal region for awake imaging was directly within this 8-week post-stroke cFL representation. Despite our prediction that this distal area would be a hotspot for plastic changes, there was no detectable alteration to the level of population correlation, functional connectivity, assembly architecture or assembly activations after stroke. Moreover, we found little change in the firing rate in either moving or resting states in this region. Contrary to our results, somatosensory-evoked activity assessed by two-photon calcium imaging in anesthetized animals has demonstrated an increase in cFL responsive neurons within a region lateral to the cHL representation 1-2 months after focal cFL stroke [25]. Notably, this previous study measured sensory-evoked single cell activity using strong vibrotactile (1s 100Hz) limb stimulation under anaesthesia [25]. This frequency of limb stimulation has been shown to elicit near maximal neuronal responses within the limb-associated somatosensory cortex under anesthesia [101]. Thus, strong stimulation and anaesthesia may have unmasked non-physiological activity in neurons in the distal region that is not apparent during more naturalistic activation during awake locomotion or rest. Regional mapping defined using strong stimulation in anesthetized animals may therefore overestimate plasticity at the cellular level.”

      Figures

      Figure 1 and 2: Scale bar missing.

      Scale bars added to both figures.

      Figure 2: The representative image shows a drastic reduction of the forelimb response area, contrary to the general description of the findings. It would also be beneficial to see a graph with lines connecting the pre-stroke and 8-week datapoints.

      The data for Figure 2 have been re-analyzed using a new criterion of 5X the standard deviation of the baseline period for determining the threshold for limb mapping. Figure 2 and the relevant manuscript and figure legend text have been amended. In agreement with the reviewer's observation, there is no increase in forelimb response area, but instead a non-significant decrease in the average forelimb area.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Overall comments

      We are pleased by the reviewers' comments and appreciate their suggestions for improvements. In addition to correcting small typos throughout the manuscript, we have made the following additions or changes in response to reviewer comments and suggestions:

      1. New complementation experiments to verify the impacts of mgtA and PA4824 on bacterial fitness in fungal co-culture.
      2. New experiments to measure intracellular Mg2+ levels in corA or mgtE mutants to strengthen our conclusion that neither of these constitutive Mg2+ transporters is required for maintaining intracellular Mg2+ levels in co-culture.
      3. New experiments to confirm that the S. cerevisiae mnr2Δ mutant does not have a fitness defect compared to WT in co-culture. This finding rules out the possibility that metabolic defects in the mnr2Δ mutant restore the fitness of the bacterial mgtA mutant in co-culture and strengthens our hypothesis that Mg2+ sequestration by the fungal vacuole triggers Mg2+ nutritional competition with bacteria.
      4. Clarification of bacterial species we tested in our study as suggested by Reviewer #3.
      5. Revised discussion to highlight how our findings relate to any fungal-bacterial interaction both in ecological and infection contexts and any known role of mgtA in antibiotic susceptibility, as suggested by Reviewer #2. All changes in response to the reviewer's comments have been detailed in our point-by-point response (below).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript investigates polymicrobial interactions between two clinically relevant species, Pseudomonas aeruginosa and Candida albicans. The findings that C. albicans mediates P. aeruginosa tolerance to antibiotics through sequestration of magnesium provides insight into a specific interaction at play between these two organisms, and the underlying mechanism. The manuscript is well composed and generally the claims throughout are supported by the provided evidence. As a result, most comments are either for clarification, or minor in nature.

      We thank the reviewer for their positive comments and their suggestions for improvement.

      Major comments:

      1) For their experiments, the authors frequently switch between 30C and 37C, but there is no rationale for why a specific temperature was used, or both were. E.g. some of the antibiotic survival assays, and fungal-bacterial co-culture assays were performed at both temperatures, while the colistin resistance, fitness competition and RNA sequencing were performed at 30C. Given the fact that the two organisms are both human pathogens and co-exist in human infections, it is not clear why 30C was used. The authors should provide clarity for why these two temperatures were used.

      We thank the reviewer for raising this point. Fungal-bacterial interactions occur across a range of temperatures in ecological contexts (e.g., in soil or on plants) or during infection in animal hosts. Both 30°C and 37°C are used in C. albicans studies, whereas 37°C is preferred for P. aeruginosa studies. By providing data from both temperatures for critical experiments, we demonstrate that our findings are not dependent on temperature. Our studies also allow for an easy comparison to previously published studies performed at both temperatures. We chose to screen initial co-culture conditions showing fungal antagonism at 30°C, as C. albicans cells can reach higher CFUs than at 37°C due to growth in the single-celled yeast form.

      We agree with the reviewer that 37°C is more physiologically relevant for conditions under which these two species coexist in animal hosts. Thus, we tested our findings of Mg2+ competition and antibiotic survival at 37°C.

      We now clarify our reasoning in the revised Materials and Methods section as follows: "We chose 30°C for the initial co-culture assays for two reasons. First, C. albicans cells reached higher CFU at 30°C than 37°C, which would impose a stronger competition with bacteria. Second, C. albicans cells form hyphae at 37°C, which can have multiple cells in one filament and thus confound CFU measurements. We further confirmed that our findings of Mg2+ competition are independent of temperature by setting up co-culture assays at both 30°C and 37°C."

      2) Lines 184-191: It would be useful to measure intracellular Mg2+ (using the Mg sensor) in the corA and mgtB tn mutants in media as well as the fungal spent media, to provide stronger support for the claim that "MgtA is a key bacterial Mg2+ transporter that is highly induced under low Mg2+ conditions".

      We thank the reviewer for this suggestion. Based on our experiments, neither CorA nor MgtE is induced (in RNA-seq analyses) or required in co-culture (in Tn-seq analyses), suggesting neither is involved in Mg2+ competition with C. albicans. In contrast, MgtA is highly induced in co-culture. Loss of mgtA significantly reduces bacterial fitness in co-culture, and reduces intracellular Mg2+ levels only in C. albicans-spent BHI, not in fresh BHI. These results suggest that MgtA is the key Mg2+ transporter required for bacterial Mg2+ uptake and fitness in co-culture.

      Nevertheless, we agree with the reviewer that despite being constitutively expressed, CorA or MgtE might play an important role in importing Mg2+ in BHI and C. albicans-spent BHI. To test this possibility, we performed a new experiment suggested by the reviewer (now included in the revised manuscript) in which we measured intracellular Mg2+ levels in corA or mgtE loss-of-function mutants in BHI versus C. albicans-spent BHI, and compared them to intracellular Mg2+ levels in an mgtA loss-of-function mutant strain. We find that lack of either corA or mgtE does not significantly reduce bacterial Mg2+ levels in C. albicans-spent BHI compared to the ΔmgtA mutant (Fig. S7C). Thus, our results strengthen our conclusion that MgtA is the key Mg2+ transporter that gram-negative bacteria use to overcome fungal-mediated Mg2+ sequestration.

      3) Line no. 276. Does the mnr2∆ S. cerevisiae mutant have a growth defect compared to the WT? This would test whether the effect of the mnr2 mutant on P. aeruginosa fitness is strictly due to Mg2+ and not due to reduced growth or metabolism of the mutant.

      We agree with the possibility raised by the reviewer. In new experiments included with our revision as Figure S10, we find that the S. cerevisiae mnr2 deletion mutant exhibits similar CFU as WT in monoculture as well as co-culture. Thus, the rescuing effect of mnr2∆ is less likely due to reduced growth or metabolism.

      4) The authors use the term 'antibiotic resistance' throughout the manuscript. However, the assays they perform do not directly test for antibiotic resistance which is defined as the ability to grow at higher concentrations of antibiotics (e.g. as measured by MIC tests). The authors should rephrase their phenotype as antibiotic survival or antibiotic tolerance.

      We agree with the reviewer and thank them for raising this point. We replaced the phrase 'antibiotic resistance' with 'antibiotic survival' throughout the revised manuscript. We also accordingly changed our title to 'Widespread fungal-bacterial competition for magnesium lowers antibiotic susceptibility'.

      5) Also, the authors have two different assays, both measuring survival in antibiotic, but one is called a colistin resistance assay (line 508) and the other a colistin survival assay (line 523). It's not obvious what is the difference between what is being assayed in the two experiments, except perhaps the growth phase of the cells when they are exposed to the antibiotic? The authors should explain the difference, and the rationale for using two different assays.

      We thank the reviewer for raising this point. In the revised manuscript, we explain the rationale of our two assays. The first assay measures the bacterial survival after colistin treatment in C. albicans-spent BHI, and the second measures the bacterial survival after colistin treatment in co-culture with C. albicans. We performed both assays because C. albicans-spent BHI mimics Mg2+-depleted conditions by C. albicans but might not represent all aspects of fungal presence in co-culture. To make sure our findings are consistent across these two experiments, we specify the difference in these two assays in the revised manuscript as the following: "Since fungal spent media cannot fully recapitulate fungal presence in co-culture conditions, we tested whether fungal co-culture also conferred increased colistin survival."

      Minor comments:

      • For almost all the figures, blue and orange dots are used for 'monoculture' and 'coculture' respectively, while orange and black dots are used for WT and the mgtA mutant. However, the black and blue dots are hard to tell apart, and for several figure sub-panels, the legends are not provided (e.g. figures 2D, 2F, S9H), making it a little confusing to figure out what is being shown. It would be best if the WT and mgtA symbols were in colors completely different from the monoculture/co-culture colors, making it easier to tell those apart.

      We have updated these figures as the reviewer suggested.

      Line no 122 and Figure 1A. The term "defense genes" in bacteria typically refers to genes conferring protection against phage infections. Perhaps the authors can use a different term (e.g. 'protective genes').

      We agree with the reviewer. We have changed "defense genes" to "fungal-defense genes" to disambiguate the terms.

      Line no 186. 'However, neither MgtA...' should be 'However, neither MgtE...'

      We thank the reviewer for pointing out this typo. We have fixed this in our revision.

      Line no 268. Does fungal-mediated Mg2+ competition extend to Gram positive bacteria?

      We thank the reviewer for raising this interesting point. MgtA is prevalent in diverse gram-negative bacteria but rare in gram-positive bacteria. Using the fitness effect of mgtA mutants in co-culture vs monoculture allowed us to infer Mg2+ competition easily for diverse gram-negative bacteria. Currently, we do not have the experimental tools to extend this finding to gram-positive bacteria. Co-culture growth kinetics for gram-positive bacteria are also likely to be different from gram-negative bacteria in a way that makes direct comparisons challenging. We have clarified our writing in the revised manuscript: "This mode of competition might be highly specific between fungi and diverse gram-negative γ-proteobacteria we have tested.... Whether fungi can suppress gram-positive bacteria through the same mechanism of Mg2+ competition remains an open question."

      Line no 314. It is unclear whether the 'transient co-culture' is the same or a different assay as the colistin survival assay.

      We apologize for the confusion and have removed the word 'transient' for clarity. The assay is the same as the 'colistin survival assay in fungal co-culture,' where we co-cultured log-phase P. aeruginosa cells with C. albicans for 5 hours and treated them with colistin.

      Line no 316. For the bacterial survival assays shown in figures 3 and 4 (and other supplementary figures), please provide absolute numbers as cfu (as in figures 1 and 2), as opposed to a percentage, for cell counts. This will allow readers to appropriately interpret the data.

      We thank the reviewer for this suggestion. We now include the raw CFU counts of colistin survival assays in Fig 3 and 4 and other supplementary figures in new supplementary figures (Fig. S11, S13, S14, S15, and S17) in our revision.

      Line no 934-5: Italicize P. aeruginosa.

      This typo has been fixed in our revision.

      Reviewer #1 (Significance (Required)):

      This study identifies a novel interaction between the two co-infecting human pathogens Pseudomonas aeruginosa and Candida albicans, where C. albicans causes Mg2+ limitation for P. aeruginosa. Further, the authors show that this interaction affects levels of antibiotic resistance, as well as the adaptive mutations seen during the evolution of antibiotic resistance. This advances the field by delineating how microbial interactions can affect clinically relevant phenotypes, and potentially clinical outcomes. The study should be of interest to a broad audience of researchers studying microbial ecology, evolutionary biology, microbiology, and infectious diseases.

      We are grateful for the reviewer's positive appraisal.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper looks at the interaction between the fungus Candida albicans (Ca) and the bacterium Pseudomonas aeruginosa (Pa), which are found together in some environments. Co-culture experiments showed that Ca can inhibit the growth of Pa. The goal of this study is to determine the reason for this phenomenon and how widespread it is. This was performed by Tnseq analysis of Pa that identified 3 genes which showed significant decreases in the presence of Ca. Interestingly these were all in an operon that was recognized by the authors as being induced by RNAseq during co-culture. One of these genes, mgtA is a known Mg2+ transporter and therefore the remainder of the paper discusses the importance of competition for Mg2+.

      The experiments seem to be well carried out and appropriately controlled.

      We thank the reviewer for their appreciation of our science and the rigor of our experiments.

      The use of the Mg2+ genetic sensor reporter in Pa is an interesting approach to determine the intracellular Mg2+ concentrations, however how these levels relate to one another between different experiments is not clear. In Fig. S5, the levels are 5 AU for growth in minimal media +low (10uM) Mg2+ and 38 AU for growth in minimal media+high (10mM) Mg2+. But the levels seen in Figure 1E are all much lower. With such low levels, it is difficult to determine if the impact of ∆PA4824 and ∆mgtA (while perhaps additive) are relevant. Would differences be seen with these various strains grown under different conditions?

      We thank the reviewer for this query. The reviewer is right in that we do not use absolute quantification of intracellular Mg2+ levels. While our Mg2+ genetic sensor assay does not facilitate comparison of absolute Mg2+ levels across experiments, it provides a robust comparative measurement of relative intracellular Mg2+ levels in mutants versus WT cells, or between two different media conditions.

      Using this Mg2+ genetic sensor assay, we tested intracellular Mg2+ levels of WT P. aeruginosa under various media conditions. We found that lower intracellular Mg2+ levels in P. aeruginosa cells and the requirement of mgtA in these media are well-correlated at lower total Mg2+ levels in media (Fig. S9A-E). In contrast, there are no significant differences in intracellular Mg2+ levels between ΔmgtA (or ΔPA4824) and WT cells in BHI media, which has higher total Mg2+ levels than fungal-spent BHI media. Our experiments reveal that the lack of mgtA or PA4824 only affects intracellular Mg2+ levels when P. aeruginosa is cultured in media with Mg2+ concentrations below a threshold level.

      The experiments suggesting that the protein PA4824 is also a Mg2+ transporter seem to be related only to alpha fold predictions.

      We clarify that our speculation that PA4824 encodes a potential novel Mg2+ transporter was first motivated by finding that it is induced in low Mg2+ conditions, its genetic importance in Tn-seq experiments independent of mgtA, and our finding that cells with loss-of-function mutations in PA4824 experience lower intracellular Mg2+ than WT cells. However, the reviewer is correct that this statement is speculative based on the Alphafold prediction. In the revised manuscript, we have clarified this point as the following: "Based on our co-culture RNA-seq and Tn-seq experiments, results from the Mg2+ genetic sensor assay, and the Alphafold prediction of PA4824 protein structure, we speculate that PA4824 potentially acts as a novel Mg2+ transporter."

      Is the statement in line 186 a typo? It is stated that "neither MgtA nor CorA was implicated in competition". Do the authors actually mean "MgtE"?

      It is a typo. We thank the reviewer for pointing this out and have changed this to "MgtE".

      Reviewer #2 (Significance (Required)):

      Ca and Pa are known to inhabit the same niches and previous studies have shown both can have antagonistic effects on one another. Nutritional competition is one mechanism of antagonism that has not been that well studied between these two genera. That makes the finding of some significance and relevant to those with an interest in either of these microbes and co-infections. The authors also found that it was not just Ca that had this effect, but other fungi as well. And this effect was not limited to Pa, but extended to other bacteria, suggesting a more global impact.

      We thank the reviewer for an accurate summary of our findings.

      However, diminishing the impact of this finding is the question as to whether this is simply a phenomenon seen under the very specific laboratory conditions tested here. Furthermore, how these findings exactly relate to any infection environment is not clear.

      Fungal-bacterial interactions occur in a variety of broad biological contexts, including during infection in animal hosts or in environmental-associated microbial communities. Our study is the first to identify nutritional competition for Mg2+ as one of the most important axes of competition between fungi and bacteria. Our study also identifies MgtA as one of the key bacterial genes that mediates this interaction. MgtA is only induced upon experiencing low Mg2+ conditions; the fact that most gram-negative bacteria encode MgtA implies they must encounter low Mg2+ conditions and face fitness consequences in those conditions. To address the reviewer's concerns, we also highlight four additional points in our revised Discussion:

      1. Fungal-bacterial competition for Mg2+ is not restricted to BHI media. We also found the same phenomenon in TSB medium. Indeed, we also show (Fig. S9F) that Mg2+ competition occurs whenever the environmental Mg2+ level is lower than 0.45 mM, a critical threshold for fungi and bacteria to compete for this vital ion.
      2. During infection in cystic fibrosis airways, proteomic experiments and Mg2+ measurement in CF sputum both suggest that P. aeruginosa experiences Mg2+ restriction.
      3. Many previous studies have shown that many Gram-negative bacteria, including Salmonella Typhimurium, encounter reduced magnesium concentrations upon infection of hosts (PMID: 29118452). Our discovery that fungal co-culture may generally exacerbate fitness challenges associated with low magnesium levels is of high importance to all studies of gram-negative bacteria, not just to Pa.
      4. In addition to infections in animal hosts, low Mg2+ is associated with worse outcomes of infections in plants. Our study suggests the importance of studying the role of Mg2+ competition in various infection contexts and the strategies of manipulating Mg2+ levels or fungal-bacterial interactions to constrain polymicrobial infectious diseases in diverse eukaryotic hosts and ecological conditions.

      The authors also seem to vastly overinterpret the significance of their findings; the impact on Pa is only to slow growth, not necessarily affect fitness, per se. The final number of bacteria appears to be the same, it just takes slightly longer to get there.

      We are puzzled by this comment from the reviewer; slow growth IS a fitness effect! Although we agree with the reviewer's point that C. albicans is more likely to inhibit bacterial growth rate than viability (bacteriostatic, not bactericidal), there are many bacteriostatic antibiotic mechanisms.

      In our co-culture assay, bacterial CFUs after 40 hours in co-culture are 10-100 times lower than in monoculture (this is not a subtle effect!). After 40 hours, bacterial cultures have already reached the stationary phase, which is why even slower growing bacterial cells in co-culture can 'catch up' (they are still lower by nearly 10-fold), despite fungal inhibition. Moreover, the co-culture condition provided enough of a fitness challenge to allow us to identify bacterial protective genes even in a pooled assay.

      The authors speculate that since Mg2+ supplementation did not totally restore growth to Pa during co-culture, other Mg2+ independent "axes of antagonism" must exist. This also tends to diminish the significance of these findings.

      Again, we are puzzled by this comment from the reviewer. Fungal-bacterial competition, like all microbial competition, is a multifactorial process, so we should not be surprised that Mg2+ isn't the only axis of competition. Indeed, our study reinforces the importance of investigating all potential axes of competition to get a complete understanding of the mechanisms of fungal-bacterial competition.

      The importance of mgtA on antibiotic susceptibility has been well studied in a number of bacteria including Pa making these findings generally confirmatory.

      We would like to clarify this comment. To the best of our knowledge, mgtA in P. aeruginosa has not been reported in antibiotic susceptibility studies. Instead, P. aeruginosa mgtE is induced upon treatment with aminoglycoside antibiotics, but its expression does not change antibiotic resistance (PMID: 24162608).

      The reviewer may be referring to studies in S. Typhimurium, where the ΔmgtA mutant shows increased susceptibility to nitrooxidative stress (PMID: 29118452) and to cyclohexane (PMID: 18487336), suggesting Mg2+ homeostasis might be generally important for bacterial survival of antimicrobial treatments. Although this is not the main focus of our study, we now include these references in our revised discussion to provide readers with more background on the relevance of our work: "Mg2+ has been implicated in altering the susceptibility of gram-negative bacteria to antibiotics other than colistin. For instance, in S. Typhimurium, impaired mgtA or Mg2+ homeostasis increases susceptibility to cyclohexane or nitrooxidative stress. In line with these observations, our study also highlights the importance of studying how Mg2+ homeostasis broadly impacts antimicrobial resistance in gram-negative bacteria."

      The importance of different mutations that emerge in Pa during mono vs. co-culture in the presence of colistin is not clearly explained. Why should co-culture inhibit the emergence of hypermutator Pa strains?

      We thank the reviewer for the opportunity to clarify this important point. Previous studies have shown, both in Pa as well as other bacteria, that hypermutator strains often arise when bacteria adapt to strong and continuous antibiotic stress (PMID: 28630206) to maximize exploration of mutation space necessary to acquire beneficial resistance mutations even though hypermutation itself is inherently deleterious to bacterial fitness. We show that fungal co-culture protects P. aeruginosa from high concentrations of colistin by sequestering the Mg2+ co-factor required for colistin action (Fig. 4C). Thus, under co-culture conditions, bacteria experience lower levels of colistin than the levels administered and are subject to less severe fitness challenges, allowing them to eschew the deleterious route of acquiring adaptive mutations with hypermutation.

      Our discovery that bacteria have an entirely different means of enhancing colistin resistance under fungal co-culture (or low Mg2+) conditions is one of the highlights of our study. Understanding the biological basis of this novel model of colistin resistance will be an active area of investigation to pursue in the future.

      No additional experiments are likely needed but the authors should be encouraged to place their findings more clearly in what is already known in the field as well as articulate the limitations of their study.

      We thank the reviewer for their detailed comments and suggestions. We hope our revisions have both clarified the importance and limitations of our study and provided the right context sought by the reviewer.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Hsieh et al. find a critical axis of competition between Pseudomonas aeruginosa and Candida albicans is Mg2+ sequestered by Candida. The authors find that use of BHI, which has lower Mg2+ levels compared to other media, allowed this discovery. The authors further demonstrate critical genes for this axis in multiple gammaproteobacteria and fungal species. The authors further show that fungal Mg2+ sequestration promotes polymyxin resistance in multiple gammaproteobacteria and show that it alters the course of Pseudomonas aeruginosa evolution of polymyxin resistance. Finally, they show that for populations that evolved polymyxin resistance in the presence of Candida, removal of Candida by antifungal treatment re-induces sensitivity to polymyxins.

      We thank the reviewer for a concise and accurate summary of our study.

      Major comments: -The claims and conclusions are generally supported; however, a key phenotype of the ∆mgtA and ∆PA4824 mutants should be complemented in trans or in a second site of the chromosome.

      We thank the reviewer for this comment and agree with the suggestion. In our revised manuscript, we now provide results of new complementation experiments recommended by the reviewer, which show that expression of PA4824 or mgtA in trans rescues the fitness defect of either deletion mutant (Fig. S4C and S4D).

      -The authors note that "This mode of competition appears to be highly specific between fungi and gram-negative bacteria." However, it does not appear that gram-positive bacteria were tested in competition with fungi. Additionally, the only gram-negative bacteria tested were gammaproteobacteria (although they do represent diverse gammaproteobacteria). This could be addressed by clarifying the text or OPTIONAL additional experimentation.

      We agree with the reviewer. We had intended to highlight that we had only tested this mode of competition between fungi and gram-negative bacteria, but inadvertently phrased this to suggest that gram-positive bacteria are not subject to this competition. As we highlight in our response to Reviewer 1, we are unable to test this (so far) for gram-positive bacteria. We clarify this in our revision: "This mode of competition might be highly specific between fungi and diverse gram-negative γ-proteobacteria we have tested.... Whether fungi can suppress gram-positive bacteria through the same mechanism of Mg2+ competition remains an open question."

      -Figure 3A: is this depiction of modifications on the O-antigen correct? PhoQ- and PmrB-activated enzymes seem to modify the lipid A portion of LPS (eg PMID: 31142822)

      We thank the reviewer for noting this error, which we have now fixed in the revision.

      • For many of the figures, multiple t-tests are used and it seems like perhaps an ANOVA with multiple comparisons would be more appropriate

      We thank the reviewer for this feedback. In our revision, we now use Dunnett's one-way ANOVA test for figures with multiple comparisons; our conclusions are unchanged.

      Minor comments: - The text and figures are clear and accurate

      We thank the reviewer for this feedback.

      -the cited nutritional immunity reviews are out of date (e.g. reference 37) and there are more recent reviews on the topic (e.g. PMID: 35641670)

      We have added the suggested reference in our revision.

      -Line 293: Unclear why polymyxin resistance would be "unexpected" following the explanation of why Mg2+ depletion might confer it

      We agree and have removed 'unexpected.'

      -Line 318: "antibitoics" typo

      We thank the reviewer for pointing out this typo, which we have now corrected.

      Reviewer #3 (Significance (Required)):

      The following aspects are important:

      • General assessment: This study is very mechanistic, identifying the role of Mg2+ sequestration by fungi that limit gram-negative bacterial growth in Mg2+ deplete environments. The strengths are that relevant Mg2+ acquisition genes are identified or tested in Pseudomonas aeruginosa, the main test organism, as well as Salmonella enterica and Escherichia coli. Additionally, the authors identify a relevant Mg2+ mechanism in fungal species tested, including showing the importance with a genetic knockout. The limitations are relatively minor, and include lack of complementation, potential issues in model figure depiction of LPS modifications, and potential minor issues in statistical tests used. Future directions discussed include expanding analysis to clinical isolates, which is outside the scope of this manuscript which already showed the same mechanism in diverse gammaproteobacteria.

      We thank the reviewer for their positive appraisal.

      • Advance: This study has two major advances: The first is uncovering this critical Mg2+ sequestration axis in competition between fungal species and gammaproteobacteria. The second is the finding that the Mg2+ sequestration induces polymyxin resistance and alters the evolutionary path to further polymyxin resistance. While nutrient metals as an axis of competition is not a conceptual advance, the specific role of Mg2+ and its effect on evolution of polymyxin antibiotic resistance is a conceptual advance.
      • Audience: I think this study would be of interest to a relatively broad audience. The study itself touches on multiple fields including intermicrobial competition, nutritional immunity, antimicrobial resistance, and microbial evolution. Additionally, there are clinical implications for the potential to use antifungals to resensitize polymyxin-resistant P. aeruginosa to polymyxins.
      • My field of expertise is bacterial genetics and physiology, nutritional immunity, and bacterial cell envelope. I do not have expertise in fungus.

      We appreciate the reviewer's positive and constructive feedback on our study and for highlighting the relevance of our research to a broader audience in microbiology and evolution. We do hope our mechanistic understanding of fungal-bacterial competition will spark further conversation or collaboration between evolutionary microbiologists and physician-scientists.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. Point-by-point description of the revisions

      Reviewer #1:

      Evidence, reproducibility and clarity (Required):

      In this manuscript Czajkowski et al explore the role of the doublecortin-family kinase ZYG-8 during meiosis in C. elegans oocytes. First by studying available temperature-sensitive mutants and then by generating their own strain expressing ZYG-8 amenable to auxin-inducible degradation, they establish that defects in ZYG-8 lead to defects in spindle assembly, such as the formation of multipolar spindles, and spindle maintenance, in which spindles elongate, fall apart, and deform in meiosis. Based on these observations the authors conclude that ZYG-8 depletion leads to excessive outward force. As the lab had previously found that the motor protein KLP-18 generates outside directed forces in meiosis, Czajkowski et al initially speculate that ZYG-8 might regulate KLP-18. KLP-18 depletion generally leads to the formation of monopolar spindles in meiosis. Intriguingly, when the authors co-deplete ZYG-8 they find that in some cases bipolarity was reestablished. This led to the hypothesis that yet another kinesin, BMK-1, the homolog of the mammalian EG-5, could provide redundant outward directed forces to KLP-18. The authors then study the effect of ZYG-8 and KLP-18 co-depletion in a BMK-1 mutant background strain and observe that bipolarity is no longer reestablished under these conditions, suggesting that BMK-1 generates additional outward directed forces. The authors also conclude that ZYG-8 inhibits BMK-1. To follow up on this Czajkowski et al generate a ZYG-8 line that carries a mutation in the kinase domain, which should inhibit its kinase activity. This line shows similar effects in terms of spindle elongation but reduced impact on spindle integrity, reflected in minor effects on the number of spindle poles and spindle angle. The authors conclude that ZYG-8's kinase activity is required for the function of ZYG-8 in meiosis and mitosis. Overall, the paper is well written, and the data is presented very clearly and reproducibly. The experiments are adequately replicated, and statistical analyses are adequate. The observations are very interesting. However, the authors could provide some additional insight into the function of ZYG-8. This paper is strongly focused on motor generated forces within the spindle and tries to place ZYG-8 within this context, but there is compelling evidence from other studies that ZYG-8 also affects microtubule dynamics, which would have implications for spindle assembly and structure. The paper would strongly benefit from the authors exploring this role of ZYG-8 in the context of meiosis further. If the authors feel that this would extend beyond the scope of this paper, I would suggest that the authors rephrase some of their introduction and discussion to reflect the possibility that changes in microtubule growth and nucleation rates could explain some of the phenotypes (think of katanin) and effects and that therefore it can not necessarily be concluded that BMK-1 is inhibited by ZYG-8.

      We thank the reviewer for these positive comments on our manuscript and on the rigor of our data. We also thank them for the excellent suggestion to explore a potential role for microtubule dynamics. As detailed below in response to the specific points, we performed new experiments to explore this possibility, and found via FRAP analysis that there were substantial changes in microtubule dynamics upon ZYG-8 depletion. We have therefore added these new data and have re-written major parts of the manuscript to incorporate a discussion of microtubule dynamics throughout the paper (introduction, results, model, discussion). Our data now support two roles for ZYG-8 in regulating acentrosomal spindle assembly and stability - one in modulating microtubule dynamics and the other in tuning forces (either directly or indirectly). We are grateful to the reviewer for motivating us to do these experiments, as they have added a whole new angle to the manuscript and have greatly increased its impact, as we now have a fuller understanding of how ZYG-8 contributes to oocyte meiosis.

      Major points:

      *1.) ZYG-8, as well as the mammalian homolog DCLK-1, has been reported to play an important role for microtubule dynamics. While the introduction mentions its previously shown role in meiosis and mitosis, it is totally lacking any background on the effect on microtubule dynamics. The authors mention these findings in the discussion, but it would be helpful to incorporate this in the introduction as well. As an example, Goenczy et al 2001 demonstrated that ZYG-8 is involved in spindle positioning but also showed its ability to bind microtubules and promote microtubule assembly. Interestingly, like the authors here, Goenczy et al concluded that while the kinase domain contributes to ZYG-8's function, it is not essential for it. Also, Srayko et al 2005 (PMID 16054029) demonstrated that ZYG-8 depletion led to reduced microtubule growth rates and increased nucleation rates in C. elegans mitotic embryos. And in mammalian cells DCLK-1 was shown to increase microtubule nucleation rate and decrease catastrophe rate, leading to a net stabilization of microtubules (Moores et al 2006, PMID: 16957770). It would be great if the authors could add to the introduction that ZYG-8 has been suggested to affect microtubule dynamics. *

      We agree that this is a great idea. As the reviewer suggested, we decided to explore the possibility that ZYG-8 impacts microtubule dynamics within the oocyte spindle. We depleted ZYG-8 and performed FRAP experiments to determine if there were effects on microtubule turnover. We found that loss of ZYG-8 caused a dramatic decrease in the spindle's ability to recover tubulin, both at the spindle center and at the spindle poles (shown in a new Figure 7). We made substantial changes to the manuscript when adding these new data - the manuscript now discusses ZYG-8's role in modulating microtubule dynamics in the introduction, results, discussion, and model (Figure 9), and we added all of the references suggested by the reviewer. We think that the manuscript is greatly improved due to these additions and changes.

      *2a.) The authors initially study two different ts alleles, or484ts and b235ts. The experiments clearly show a significant increase in spindle length in both strains. However, the or484 strain had been previously studied (McNally et al 2016, PMID: 27335123), and only minor effects on spindle length were reported (8.5µm in wt metaphase and 10µm in zyg-8 (or484)). How do the authors explain these differences in ZYG-8 phenotype? Even though the ZYG-8 phenotype is consistent throughout this paper it would be good to explain why the authors observe spindle elongation, fragmentation and spindle bending in contrast to previous observations. *

      The reviewer is correct that McNally et al. (2016) noted only minor effects on spindle length and did not report observing spindle bending or pole defects. However, the images presented in their paper of spindles in the zyg-8(or484) mutant (in Figure 8B) only showed spindles after they had already shrunk in preparation for anaphase; it is possible that these spindles had pole or midspindle defects prior to this shrinking, and that the authors did not note those phenotypes because their analysis focused on anaphase. In contrast, since the goal of our study focused on how ZYG-8 impacts spindle assembly and maintenance, we looked carefully at spindle morphology and quantified a larger number of metaphase spindles (in their study, only 12 metaphase spindles were measured, since metaphase was not the focus of their manuscript). Recently (after we submitted our manuscript), another study from the McNally lab was published, where they did note metaphase defects following ZYG-8 inhibition (though they did not describe the defects in detail or explore why they happened). We now mention and cite this new paper (Li et al., 2023) in our manuscript, to show that our findings are consistent with the work of others in the field.

      *2b.) As a general note, it would be helpful if the authors could indicate if the spindles are in meiosis I or II. The only time where this is specifically mentioned is in Video 7, showing a Meiosis II spindle, which makes me assume all other data is in Meiosis I. Adding this to the figures would also help to distinguish if some of the images, i.e. Figure 1B, show multipolar spindles due to failed polar body extrusion. If this is the case then the quantification of number of poles should maybe reflect different possibilities, such as fragmented poles vs. multiple poles because two spindles form around dispersed chromatin masses. *

      We agree that it is a good idea to clarify this issue. For all of our experiments, we analyzed both MI and MII spindles. However, there were no noticeable differences in phenotype between MI and MII spindles for any of our mutant/depletion conditions - we observed bent spindles, elongated spindles, and extra poles in both MI and MII following ZYG-8 inhibition. Therefore, for the quantifications presented in the manuscript (spindle length, spindle angle, number of poles), we pooled our MI and MII data. We have now added this information to the manuscript for clarity (lines 97-99 and 139-141). In addition, we have added new images to Figure 1B that show examples of MII spindles (both at the permissive and restrictive temperature), to show that the phenotypes are indistinguishable between MI and MII.

      We agree with the reviewer that one of the spindles in the original Figure 1B looked like it could have resulted from failed polar body extrusion (the chromosomes appeared to be in two masses, something we did not originally notice, so theoretically each mass could have organized its own spindle). To determine if this was the case, we looked closely at the chromosomes in this image; we confirmed that there were only 6 chromosomes, and that all were bivalents (these can be distinguished from MII chromosomes based on size). Therefore, this spindle was not multipolar due to an issue with polar body extrusion. However, to prevent future confusion, we picked a different representative spindle (where the bivalents were not grouped into two masses), and we added a new column to the figure that shows the DNA channel in grayscale (so it is easier to see and count the chromosomes). We also now note in the materials and methods how we were able to distinguish between MI and MII in our experiments (chromosome count, size, presence of polar bodies), so that it is clear that none of our phenotypes result from failed polar body extrusion (lines 600-603).

      *3) The authors generate a line that carries a mutation leading to a kinase dead version of ZYG-8. It would be great if the authors could further test if this version is truly kinase dead. What is interesting is that the kinase dead version the authors create has less effect on the number of poles than the zyg-8 (b235)ts strain, which carries a mutation in a less conserved kinase region. Overall, it seems that the phenotypes are very similar, independent of mutations in the microtubule binding area, kinase area or after AID. This could of course be due to all regions being important, i.e. microtubule binding is required for localizing kinase-activity. Generating mutant versions of the target proteins, for example here BMK-1, that can not be phosphorylated or are constitutively active as well as assessment of changes in protein phosphorylation levels in the kinase dead strain would be helpful to provide deeper insight into potential regulation of proteins by ZYG-8. *

      We agree that it would be ideal to test whether the D604N mutant is truly kinase dead. However, in the interest of time, we ask to be allowed to skip that experiment. The analogous residue has been mutated in mammalian ZYG-8 (DCLK1), and has been shown to cause DCLK1 to be kinase dead in vitro; this is a highly conserved aspartic acid in the central part of the catalytic domain, so we infer that the mutation we made in ZYG-8 should be kinase dead as well. However, since we did not test this directly, we softened our language in the manuscript, explaining that we "infer" that it is kinase dead rather than stating definitively that it is. With regard to the zyg-8(b235)ts mutant having a stronger phenotype, we think that it is possible that this mutation destabilizes a larger portion of the protein (rather than just affecting the catalytic activity), since the phenotypes in this mutant are similar to depletion of the protein in the ZYG-8 AID strain. Therefore, we think that our D604N mutant reveals new information about the role of kinase activity, since it is a more specific mutation that should likely only affect catalytic activity and not the rest of the protein (based on the previous work on DCLK1).

      While we appreciate the suggestion from the reviewer to generate mutant versions of potential target proteins, we ask that this be considered beyond the scope of the study. Now that we know that ZYG-8 not only affects forces within the spindle (maybe BMK-1) but also microtubule dynamics, there are many potential targets - it would require a lot of work to figure out what the relevant targets are. Instead of exploring this experimentally in this manuscript, we added a new section to the discussion where we speculate on what some of these targets could be, to motivate future studies.

      *4a) The authors state that "BMK-1 provides redundant outward force to KLP18". Redundancy usually suggests that one protein can take over the function of another one when the other is not there. In these scenarios a phenotype is often only visible when both proteins are depleted as each can take over the function of the other one. Here however the situation seems slightly different, as depletion of BMK-1 has no phenotype while depletion of KLP-18 leads to monopolar spindles. If BMK-1 would normally provide outward directed forces, would this not be visible in KLP-18 depleted oocytes if they were truly redundant? I assume the authors hypothesize that ZYG-8 inhibits BMK-1 and thus it can not generate outward directed forces. In this case, do the authors envision that ZYG-8 inhibits BMK-1 prior to or in metaphase or only in anaphase or throughout meiosis? Do they speculate, that BMK-1 is inhibited in anaphase and only active in metaphase? *

      The reviewer makes an excellent point - we agree that we should not use the word "redundant" in this context, so we have removed this phrasing from the manuscript. We hypothesize that BMK-1 can provide outward forces during spindle assembly but is not capable of providing as much force as KLP-18 (the primary force-generating motor). We infer this based on our experiments where we co-deplete KLP-18 and ZYG-8 (using long-term depletion). Although BMK-1 is presumably activated under these conditions, it is not able to restore spindle bipolarity (there are outward forces generated, which results in minus ends being found at the periphery of the monopolar spindle, but spindles are not bipolar).

      Therefore, BMK-1 is not able to fully replace the function of KLP-18 during spindle assembly. Interestingly, our experiments imply that BMK-1 can better substitute for KLP-18 later on (when ZYG-8 is inhibited); when we remove ZYG-8 from formed monopolar spindles, bipolarity can be restored (an activity dependent on BMK-1). These findings suggest that ZYG-8 plays a more important role in suppressing BMK-1 activity after the spindle forms, to prevent spindle overelongation in metaphase. We have edited the manuscript to better explain these points.

      *4b) In addition, Figure S4 somewhat argues against a role for ZYG-8 in regulating BMK-1. ZYG-8 depletion supposedly leads to increased outward forces due to loss of BMK-1 inhibition, thus co-depletion of ZYG-8 with BMK-1 should rescue the increased spindle size at least to some extent, however neither increase in spindle length nor increase in additional spindle pole formation are prevented by co-depletion of BMK-1 suggesting that BMK-1 is not generating the forces leading to spindle length increase. Thus, arguing that after all ZYG-8 does not regulate BMK-1. This should be discussed further in the paper and the authors should consider changing the title. At this point the provided evidence that ZYG-8 is regulating motor activity is not strong enough to make this claim. *

      The reviewer is correct that Figure S4 shows that the effects of depleting ZYG-8 on bipolar spindles (spindle elongation and pole/midspindle defects) cannot solely be explained by a role for ZYG-8 in regulating BMK-1 - this was the point that we were trying to make when we included this data in the original manuscript. However, we previously did not know what this other role could be, and therefore we only speculated on other potential roles in the discussion. Now that we have done FRAP experiments and have found that ZYG-8 also affects microtubule dynamics in the oocyte spindle, we now have a better explanation for the data presented in Figure S4 - it makes sense that deleting BMK-1 would not rescue the effects of ZYG-8 depletion, since we have evidence that ZYG-8 also regulates microtubule dynamics. We now clearly explain this in the revised manuscript and we have changed the title to make it clear that ZYG-8 plays multiple roles in oocytes.

      *5) The authors are proposing that ZYG-8 regulates/ inhibits BMK-1, however convincing evidence for an inhibition is not provided in my opinion and the effect of ZYG-8 on BMK-1 could be indirect. To make a compelling argument for a regulation of BMK-1 the authors would have to investigate if ZYG-8 interacts and/ or phosphorylates BMK-1 (see 7) and if this affects its dynamics. In addition, given the reported role of ZYG-8 on microtubule dynamics it would be very important that the authors consider studying the effect of ZYG-8 degradation on microtubule dynamics. Tracking of EBP-2 would be good, however this is very difficult to do inside meiotic spindles due to their small size. In addition, the authors could maybe consider some FRAP experiments, which could provide insights into microtubule dynamics and motions, which could be indicative of outward directed forces/ sliding. *

      We thank the reviewer for these comments as they motivated us to explore a role for ZYG-8 in modulating microtubule dynamics. The reviewer is correct that tracking EBP-2 in the very small meiotic spindle is not possible due to technical limitations, so we took the suggestion to perform FRAP. These experiments revealed that microtubule turnover in the spindle is greatly slowed following ZYG-8 depletion, suggesting a global stabilization of microtubules (data presented in a new Figure 7). This change in dynamics could contribute to the observed spindle phenotypes, which we now explain in detail in the manuscript. Given these new findings, we also now note that the effects we see on BMK-1 activity could be indirect (i.e. maybe increasing the stability of microtubules allows motors to exert excess forces). We now clearly discuss these various possibilities in the discussion.

      Summary: Additional requested experiments:

      • Interaction/ phosphorylation of BMK-1 by ZYG-8, i.e. changes of BMK-1 phosphorylation in absence of ZYG-8, BMK-1 mutations that may prevent phosphorylation by ZYG-8.
      • Assessment of microtubule dynamics (EBP-2, FRAP, length in monopolar spindles...)
      • Kinase activity of the kinase dead ZYG-8 strain (OPTIONAL)

      We assessed the role of ZYG-8 in microtubule dynamics (bullet point #2). Because this new analysis revealed that ZYG-8 plays multiple roles in the spindle, we decided not to further investigate whether ZYG-8 phosphorylates BMK-1, since the manuscript no longer argues that this is ZYG-8's major function. We also did not assess the kinase activity of the D604N mutant since this has been done previously for DCLK1, and instead we softened the language in the manuscript when describing this mutant.

      Minor points:

      *1) In Figure 4C it seems that the ZYG-8 AID line as well as the zyg-8 (or848)ts already have a phenotype (increased ASPM-1 foci) in absence of auxin/ at the permissive temperature. Does this suggest that the ZYG-8 AID as well as the zyg-8 (or848) strains are after all slightly defective (even if Figure 1, S1 and S2 argue otherwise) and thus more responsive to the loss of KLP-18? *

      The reviewer is correct that the ZYG-8 AID strain (without auxin) and zyg-8(or484)ts strain (at the permissive temperature) are slightly defective in the klp-18(RNAi) monopolar spindle assay. To more rigorously determine whether these strains were also defective in other assays, we generated new graphs comparing the spindle lengths and angles of the two temperature sensitive strains at the permissive temperature to wild-type (N2) worms. These data are now shown in Figure S1 (new panels F and G). A comparison of our ZYG-8 AID strain to a control strain (both in the absence of auxin) is shown in Figure S2 (panels C and D). In this analysis, there wasn't a significant difference for either of these comparisons (i.e. the spindle lengths and angles were all equivalent). We do not know why these strains appear to be slightly defective in the monopolar spindle assay, though perhaps this assay is more sensitive and can detect very mild defects in protein function.

      *2) The authors observe that in preformed monopolar spindles degradation of ZYG-8 can sometimes restore bipolarity. This observation is very interesting but why do the authors not observe a similar phenotype in long-term ZYG-8 AID; klp-18 (RNAi) or zyg-8(or484)ts; klp-18(RNAi). In the latter conditions bipolarity does not seem to occur at all. Do the authors think this is due to differences in timing of events? *

      We thank the reviewer for highlighting this point. We do think that our data suggest that ZYG-8 plays a more important role in maintaining the spindle than it does in spindle formation; we have now more clearly explained this in the manuscript (detailing the differences in phenotypes we observe when we deplete ZYG-8 prior to spindle assembly or after the spindle has already formed, lines 180-189 and 227-231). To emphasize this point further, we have also included a graph in Figure S3G that directly compares the number of poles per spindle in long-term auxin treated spindles to short-term auxin treated spindles (with and without metaphase arrest).

      *3) Based on the Cavin-Meza 2022 paper it looks like depletion of KLP-18 in a BMK-1 mutant background does not look different from klp-18 (RNAi) alone. However, looking at Video 8, it looks like spindles "shrink" in absence of KLP-18 and BMK-1. Or is this due to any effects from the ZYG-8 AID strain? This can also be seen in Video 9. *

      The reviewer highlights a fair point that was not clearly explained in our manuscript. In normal monopolar anaphase, chromosomes move in towards the center pole as the spindle gets smaller (C. elegans oocyte spindles shrink in both bipolar and monopolar anaphase); this was previously described in Muscat et al. 2015, and, as the reviewer noted, in Cavin-Meza et al. 2022 (in a strain with the bmk-1 mutation). We see this same monopolar anaphase behavior in the ZYG-8 AID strain (Figure 6). We have now better explained normal monopolar anaphase progression and we have cited the Muscat et al. paper in the relevant sections of the manuscript (lines 221-223 and 714-717).

      *4) Line 311: " ZYG-8 loads onto the spindle along with BMK-1, and functions to inhibit BMK-1 from over elongating microtubules during metaphase." Maybe this sentence could be re-phrased as it currently sounds like BMK-1 elongates (polymerizes) microtubules. *

      In re-writing the manuscript and emphasizing that there are multiple roles for ZYG-8 (in addition to regulating forces within the spindle), we removed this sentence.

      *5) Line 313: "Intriguingly, in C. elegans oocytes and mitotically-dividing embryos, BMK-1 inhibition causes faster spindle elongation during anaphase, suggesting that BMK-1 normally functions as a brake to slow spindle elongation (Saunders et al., 2007; Laband et al., 2017). Further, ZYG-8 has been shown to be required for spindle elongation during anaphase B (McNally et al., 2016). Our findings may provide an explanation for this phenotype, since if ZYG-8 inhibits BMK-1 as we propose, then following ZYG-8 depletion, BMK-1 could be hyperactive, slowing anaphase B spindle elongation." This paragraph could be modified for better clarity. It is not clear how the findings of the authors, BMK-1 provides outward force but is normally inhibited by ZYG-8, align with the last sentence saying "following zyg-8 depletion, BMK-1 could be hyperactive slowing anaphase B spindle elongation", should it not increase elongation according to the authors observations? *

      In re-writing the manuscript to incorporate our new data showing that ZYG-8 plays a role in modulating microtubule dynamics, we also re-wrote this discussion so that there would be less emphasis on the potential connection between ZYG-8 and BMK-1. In making these edits to expand the focus of the manuscript, we removed this section of the discussion.

      Reviewer #1 (Significance (Required)): *In this manuscript Czajkowski et al explore the role of the doublecortin-family kinase ZYG-8 during meiosis in C. elegans Oocytes. The authors conclude that BMK-1 generates outward directed force, redundant to forces generated by KLP-18, and that ZYG-8 inhibits BMK-1. The authors conclude that ZYG-8's kinase activity is required for the function of ZYG-8 in meiosis and mitosis. This research is interesting and provides some novel insight into the role of ZYG-8. In particular the observed spindle elongation and subsequent spindle fragmentation are novel and had not yet been reported. Also, the observation that degradation of ZYG-8 in monopolar klp-18(RNAi) spindles can restore bipolarity is novel and interesting, as well as the observation that this is somewhat dependent on the presence of BMK-1. This will be of interest to a broad audience and provides some new insight into the role of importance of ZYG-8 and BMK-1. The limitation of the study is the interpretation of the results and the lack of solid evidence that the observed phenotypes are due to ZYG-8 regulation motor activity, as the title claims. To support this some more experiments would be required. In addition, ZYG-8 has been reported to affect microtubule dynamics, which can certainly affect the action of motors on microtubules. This line of research is not explored in the paper but would certainly add to its value.

      Field of expertise: Research in cell division *

      We thank the reviewer for their positive comments on the impact and novelty of our findings. We hope that the additional experiments we performed and the revisions we made to the text thoroughly address the reviewer's concerns and that they deem the revised manuscript ready for publication.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): *In this manuscript, the authors explore the requirement for doublecortin kinase Zyg8 in C elegans oocytes. Oocytes build meiotic spindles in the absence of centrosomes, and therefore unique regulation occurs during this process. Therefore, how spindles are built and its later stability are an area of active investigation in the field. Using mutant alleles of Zyg8 and auxin-induced degron alleles, the authors demonstrate that this kinase is required to negatively regulate outward pole forces through BMK1 kinesin and that it has other functions to still explore. Overall, I find that this study takes an elegant genetic approach to tackling this important question in oocyte biology. I have some comments to consider for making the MS clear to a reader. *

      We thank the reviewer for these positive comments on our approach and the importance of our research question. We have attempted to address all of the reviewer's suggestions and we think that they increase the clarity of the manuscript.

      Major Comments:

      *1.) Although I like the graphs describing the altered angles of the spindles, it falls short in fully assessing the phenotype in a meaningful statistical way. Could the authors also graph the data to show statistical significance in the angles between conditions? Perhaps by grouping them into angle ranges and performing an Anova test? This is important in Figure 2E where it is not obvious that there is a difference. *

      The reviewer makes a good point - we have now addressed this concern by performing ANOVA tests to compare conditions on each of the angle graphs. Results of these tests have been reported in the corresponding figure legends. This analysis has confirmed all of the statements we made in the original manuscript. In Figure 1D and S1D, spindle angles were significantly different in the zyg-8 temperature sensitive mutants at the restrictive temperature, and in Figure 2, the angles were significantly different between the "minus auxin" and "plus auxin" conditions. This differs from Figure 7, where there was no significant difference in spindle angle between control spindles and kinase dead mutant spindles (p-value >0.1).
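To illustrate the kind of comparison described above, a one-way ANOVA F statistic can be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis pipeline used for the manuscript: the function name `one_way_anova_f` and the angle values are hypothetical, and the p-value lookup against the F distribution is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual measurements around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical spindle angles in degrees (illustrative only, not study data).
minus_auxin = [176.0, 178.0, 180.0, 174.0, 177.0]
plus_auxin = [150.0, 162.0, 140.0, 168.0, 155.0]

f_stat, df_b, df_w = one_way_anova_f([minus_auxin, plus_auxin])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F distribution with the reported degrees of freedom indicates that the conditions differ by more than the spread within each condition would explain.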

      *2.) The authors do not discuss the significance of the altered spindle angles which I think is an interesting phenotype. Would this be a problem upon Anaphase onset? What is known about spindle angle and aneuploidy or cell viability? Has this phenotype been described before in oocytes or somatic cells? Does depletion of other kinesin motors cause this? *

      The reviewer brings up a good point that warrants more discussion in the manuscript. We agree that the angled spindles are an interesting phenotype; we believe that they could be a result of the spindle elongating to a point where the spindle center becomes weakened, suggesting that the severity of the angle is representative of the severity of spindle elongation. Alternatively, the angled spindles could be a result of the loss of spindle stability factors, such as the doublecortin domain of ZYG-8. This domain is known to have microtubule binding activity; this could be required to maintain stable crosslinked microtubules in the spindle center, such that when ZYG-8 is depleted, the spindle more easily comes apart as the spindle elongates. We now discuss these possibilities in the revised manuscript.

      To the reviewer's second point, we did not examine anaphase outcomes in our manuscript. However, this was recently explored by another lab (in a study that was published after we submitted our manuscript). This study showed that spindles lacking ZYG-8 were able to initiate anaphase and segregate chromosomes (McNally et al., 2023, https://doi.org/10.1371/journal.pgen.1011090). Perhaps when the spindle shrinks at anaphase onset, the spindle is able to reorganize and largely correct the angle defect, enabling bi-directional chromosome segregation. Interestingly, however, McNally et al. did report conditions under which spindle bending in anaphase resulted in polar body extrusion errors. The authors reported that BMK-1, which is known to act as a brake to prevent spindle overelongation in anaphase, is required to prevent bent spindles during anaphase by resisting the forces of cortical myosin on the spindle. Thus, there is precedent for the idea that the spindle needs to remain straight throughout anaphase, to ensure proper chromosome segregation.

      *3.) How is embryo spindle positioning determined? It is not clear from the images that there is a defect so I'm not sure what to look for. Is there a way to quantify this? *

      In the original manuscript, spindle positioning within the embryo was determined qualitatively by eye, which we agree was not a precise measure. To address the reviewer's comment, we re-analyzed our images and assessed the position of the spindle within each embryo quantitatively - these data are now shown in Figure 8H and Figure S2B. Spindle position was quantified by analyzing images using Imaris software. The center of the spindle was set by creating a Surface of the DNA signal, and finding the center of that signal. The cell center was determined by measuring the length of the embryo along the long axis and the width of the embryo along the short axis, and setting the center as the halfway point of the total length and width of the embryo. Distance from spindle center to cell center was then measured and graphed. This quantification confirmed the claims we made in our original manuscript - both auxin-treated ZYG-8 AID spindles and ZYG-8 kinase-dead mitotic spindles were significantly mispositioned. The details of how we performed this quantification have been added to the materials and methods.
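The geometry of this measurement can be sketched in a few lines. This is a simplified 2D illustration, not the Imaris workflow itself: it assumes the image has been rotated so that the embryo's long axis lies along x, it stands in for the DNA Surface with a list of points, and the function names and coordinates are hypothetical.

```python
from math import dist


def centroid(points):
    """Mean position of a set of (x, y) points, standing in for the DNA signal center."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def cell_center(x_min, x_max, y_min, y_max):
    """Halfway point of the embryo length (x extent) and width (y extent)."""
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def spindle_offset(dna_points, x_min, x_max, y_min, y_max):
    """Distance from the spindle center to the cell center."""
    return dist(centroid(dna_points), cell_center(x_min, x_max, y_min, y_max))


# Hypothetical coordinates in micrometers (illustrative only, not study data).
dna = [(10.0, 5.0), (12.0, 5.0), (11.0, 7.0)]
offset = spindle_offset(dna, x_min=0.0, x_max=50.0, y_min=0.0, y_max=30.0)
print(f"spindle-to-center distance: {offset:.2f} um")
```

A well-centered mitotic spindle would give a distance close to zero, so larger values correspond to the mispositioning described above.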

      *4.) In Figure 1, it appears that there are 2 spindles. Are these MI and MII spindles or ectopic spindles? How do the authors know which one to measure? *

      We thank the reviewer for pointing this out. Reviewer 1 had a similar comment, and we now understand that using that image was misleading, as it looked as if two separate MII spindles had formed following a failed polar body extrusion event. We have gone through all of our images to stage the oocytes by looking at their chromosome morphology (i.e., to distinguish MI and MII) - the image in question had 6 bivalents and was therefore in Meiosis I; we think that this was a single spindle where the chromosomes happened to cluster into two masses. However, to prevent further confusion, we have replaced this image with a different representative image. In spindles like this with multiple poles, we measure the dominant axis of the spindle (if there are multiple poles, we pick the most prominent ones for the angle measurement). For additional details please see our response to Reviewer 1 major point #2b.

      *5.) The authors show depletion of Zyg8 by western (long) and loss of Gfp (long and short), but don't do so for the acute treatment. I'm guessing this is because the Gfp tag is taken by the spindle marker. The authors should either demonstrate or explain how they know that the acute depletion is effective in removing Zyg8 protein. *

      The reviewer makes a valid point. However, we are unable to see ZYG-8 depletion via acute auxin treatment using live imaging, as ZYG-8 localization is too dim and diffuse to see on the spindle using our typical live imaging parameters (we attempted to do this in a version of the ZYG-8 AID strain that has mCherry::tubulin and GFP::ZYG-8, so that there was no other spindle protein tagged with GFP). To see any GFP::ZYG-8 signal, we had to increase the laser power and exposure time well above what we typically use for live imaging - in doing this, we noticed that there was a limit to how high we could go before the cell began dying during the imaging time course, evident by a lack of chromosome movement, lack of tubulin turnover, and a general increase in tubulin signal throughout the cytoplasm. We do believe that ZYG-8 is being depleted using acute auxin treatment, however, as we see spindle defects very quickly upon dissection of the oocytes into auxin - we just unfortunately don't have a good way of quantifying this given these technical limitations. We have now added information to the materials and methods noting that we cannot see GFP::ZYG-8 under our live imaging conditions (lines 552-561), so that the reader better understands this caveat.

      *6.) In video 2, the chromosome signal is dimmer in the auxin treatment compared to video 1. Why is this? Is it just an experimental artifact or is there something significant about this? If it is because of video choice, consider replacing this one. *

      We thank the reviewer for their keen observations. The chromosome signal being dimmer in the auxin treatment is an experimental artifact - the brightness of the signal can vary depending on how far the spindle is from the slide (this can vary from video to video, and can also change over the time course of one video if the spindle moves during filming). Because of this, movies taken at the same intensity and exposure conditions may appear to have varying levels of brightness. So that readers of the manuscript can better see the chromosomes in this video, we have brightened the chromosome channel in this movie and noted this in the materials and methods (lines 549-551).

      7.) Please consider color palette changes for color-blind readership.

      We agree that it is important to present data in a way that can be appreciated by color-blind readers. Although we would prefer not to have to alter every image in our paper at this point, we have provided all important individual channels in grey scale. We are also planning to adopt a change in color palette for future papers.

      Reviewer #2 (Significance (Required)):

      *The strengths of this manuscript include use of multiple genetic approaches to establish temporal requirements of ZYG8 and which pathway it is acting through. Additionally, the videos and images make the phenotypes clear to evaluate. A minor limitation is that we don't know if the ZYG8 and BMK1 genetic interaction is a direct phosphorylation or not. This MS is an advancement to the field of spindle building and stability, and is particularly relevant to human oocyte quality and fertility. Previous work has shown that human oocyte spindles are highly unstable, but it is challenging to dissect genetic interactions and to conduct mechanistic studies in human oocytes. Therefore, the work here, although conducted in a nematode, can shed light on mechanism as to why human oocyte spindles are unstable and associated with high aneuploidy rates. Based on my expertise in mammalian oocyte biology, I am confident that work presented here will be of high interest to people in the field of meiotic spindle building, aneuploidy and fertility. It also will have broader interest to folks in the areas of kinesin biology, general microtubule and spindle biology. *

      We thank the reviewer for these positive comments on the strength of our data and the significance of our findings reported in our original manuscript. We think that the improvements that we have made in response to suggestions from all three reviewers have further increased this impact.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *Summary: The focus of this paper is the function of a relatively understudied (at least in meiosis) kinase in acentrosomal spindle assembly (Zyg-8, or DCLK1 in mammals) in C. elegans oocytes. The authors use existing ts alleles and a newly generated GFP-Auxin fusion protein, and find that the ts alleles and auxin degron have similar phenotypes. They also examine the interaction with two related kinesins, KLP-18 and BMK-1 in order to investigate the mechanism behind the zyg-8 mutant phenotype. One can probably debate the significance and focus of their conclusions (force balance on the spindle). However, this is an important study because its the first on the meiotic function of a ZYG-8 kinase, and it may open the way to further studies of this kinase and how it regulates multiple kinesins and meiotic spindle assembly. *

      We thank the reviewer for these positive comments and for pointing out the potential future impacts of our work. In revising the manuscript, we have broadened the focus of the manuscript - we no longer solely focus on force balance within the spindle. Thus, our revisions have substantially increased the significance and impact of our work, since the manuscript is no longer narrowly focused.

      Major points:

      *1.) The main concern is the focus that the main defect in zyg-8 depleted oocytes is on outward forces (eg line 134, 277, but many other places in Results and Discussion). The arguments in favor (eg line 269-271) are reasonable. However, these data are not conclusive, and do not rule out regulation of other motor activities, such as bundling, depolymerization or chromosome movement. These are complex phenotypes, and a kinase could have multiple targets and there are often multiple interpretations. This is briefly alluded to in line 372-373 but the authors could do more. Spindle length changes could be caused by different rates of depolymerization or polymerization at the poles or chromosomes. Its not clear how poleward force regulation explains the multiple pole phenotype, although a lack of central spindle integrity could do that. In most of the Results and Discussion, it is not clearly stated on what structures these outward forces are acting. Are these forces effecting kinetochore associated microtubules, or antiparallel overlap microtubules? What do the authors mean by proper force balance? Figure 8 suggests the defect is associated with the amount of overlap and force among antiparallel microtubules - that the forces effected are from the sliding of these microtubules. *

      We agree that our original manuscript was too narrowly focused on the idea of force balance and that we did not discuss other potential roles for ZYG-8 in enough detail (except for briefly in our discussion). In response to both this comment and to a suggestion by Reviewer #1, we decided to investigate a potential role for ZYG-8 in modulating microtubule dynamics (which could be another explanation for some of the phenotypes we observed). We performed FRAP to measure the rate of tubulin turnover within the spindle near the center and at the poles. Interestingly, these experiments revealed that loss of ZYG-8 slows the rate of tubulin turnover, suggesting a general stabilization of microtubules. Thus, we have re-written our manuscript to clearly explain that ZYG-8 plays multiple roles in oocyte spindles - with these changes throughout the manuscript (in the introduction, results, and discussion), the paper is no longer focused primarily on forces. We hypothesize in the discussion that the phenotypes we observe could be a combination of the effects on microtubule dynamics and spindle forces; if microtubules become more stable and motors produce excess outward forces, this may cause stress on the spindle structure that could cause the midspindle to bend and the poles to split (lines 379-382). We also now more clearly explain that the effect ZYG-8 has on spindle forces could be either direct or indirect (e.g., ZYG-8 could directly regulate motors or, by affecting the microtubule tracks themselves, it could affect their ability to exert forces). As for which population of microtubules is affected, we hypothesize that the excess forces act primarily on overlapping antiparallel microtubules (these microtubules run laterally alongside chromosomes in this system), as is represented in the model figure (Figure 9); we attempted to more clearly explain this in the re-written manuscript.

      *2.) Based on differences between the long term and short term knockdown phenotypes, the authors suggest ZYG-8 is more important for spindle maintenance. For example, in line 299 the authors note that there is a more severe phenotypes with zyg-8 removed from pre-formed spindles. The authors could improve the presentation of this to allow the reader to appreciate this observation. The data is spread between Figures 2 and 3 without a direct comparison of the data. One solution would be to graph the data (eg # of poles) together in one graph and indicate if there is statistical significance. In the Discussion, the authors could refer to specific figure panels. *

      The reviewer is correct that our data suggest that ZYG-8 is more important for spindle maintenance than it is for assembly. As suggested, we made a graph that includes all the pole data from Figures 2 and 3 (long-term auxin, short-term auxin, and metaphase-arrested short-term auxin) - this is now shown in Figure S3D. This makes it easier for the reader to compare these data and appreciate this point. In addition, we added text to the results section, to more clearly explain our rationale for thinking that ZYG-8 plays a more prominent role in spindle maintenance than in assembly (lines 180-189 and 227-231).

      *3.) What is the practical difference between acute and short term depletion. Does acute show weaker phenotypes because there is more residual protein? Unfortunately, the effectiveness of Auxin treatment does not appear to be measured for acute or short term. If the acute depletion adds little to Figure 3, or is not much different than long term, then its not clear what it adds to the paper. Later, in Figure 6, why is only short term and acute analyzed. In general, the authors need to provide better rationale for the different auxin conditions, particularly acute and short term (eg. line 135). If they don't add anything, they should consider not presenting them because readers may get confused by the different conditions, why they were done, and what is learned from each one. *

      The reviewer brings up a fair point that we agree requires clarification. Descriptions of the different types of auxin experiments are provided in Figure 1A. Long-term AID depletes proteins overnight, so the protein of interest is already missing from the oocyte when the spindle begins to form - this allows us to assess whether the protein is required for spindle assembly. However, to determine if a protein is required to stabilize pre-formed spindles, we need to remove the protein quickly after the spindle forms (using either acute or short-term AID). Acute AID is performed by dissecting oocytes directly into auxin-containing media; this allows us to watch what happens to the spindle live, as the protein is being depleted. However, one limitation is that we can only film for a short time before the oocytes begin to die (oocytes become unhappy with extended light exposure, so we cut off the videos after 15 minutes or so, to ensure that we are not filming past the point where they begin to arrest or die). Therefore, to assess what happens to spindles beyond this point, we perform short-term auxin treatment, where whole worms are soaked in auxin-containing solution for 30-45 minutes and then the oocytes are dissected for immunofluorescence; this technique allows us to look at what happens to the spindle after more extended protein depletion (since we are not limited to the 10-15 minute window of filming). We have now clarified this in the manuscript by adding these details to the materials and methods. Unfortunately, it is not technically possible to quantify the extent of protein depletion in acute AID via western blotting since we would not be able to easily collect enough dissected oocytes to make a protein sample. (It is also technically challenging to quantify this via imaging; see our response to Reviewer #2, point #5). However, we assume that we are depleting ZYG-8 since we see dramatic spindle defects immediately upon dissection into auxin.

      Minor points:

      *3.) I am a little confused about imaging for GFP::tubulin in auxin experiments. Doesn't the ZYG-8 protein also have GFP? Should this be visible in controls? Is it measurable in the experiments? *

      The reviewer is correct that the ZYG-8 protein is also tagged with GFP in the GFP::tubulin; mCherry::histone live imaging experiments. However, we found that the GFP::ZYG-8 signal is undetectable using the live imaging conditions we are using. We determined this by analyzing a version of the ZYG-8 AID strain in which tubulin was tagged with mCherry (and thus the only GFP-tagged protein was ZYG-8). Using the same live imaging parameters we use for our movies of GFP::tubulin (same exposure time, laser power, etc), we did not detect any GFP::ZYG-8. We have now added this information to the materials and methods (lines 552-561) to clarify these points for the reader, to prevent further confusion.

      *4.) It is nice that the authors validated the results in an emb-30 background with unarrested oocytes. The authors note that the wild-type oocytes undergo anaphase (line 150). The images seem to suggest the auxin treated oocytes do not. Can the authors comment on anaphase in the depletion experiments. Even better, would be to comment on the accuracy of chromosome alignment and segregation. If zyg-8 mutant oocytes complete meiosis, is there any aneuploidy? These are important questions because otherwise the defects in zyg-8 mutants have less significance. *

      We thank the reviewer for their comment. Previous work on ZYG-8 in C. elegans examined a role for ZYG-8 in anaphase and showed that this protein is required for anaphase B spindle elongation (McNally et al. 2016); because this was known when we launched our study, we purposely did not extensively study ZYG-8 in anaphase and instead focused on understanding how ZYG-8 contributes to spindle formation and stability. Our fixed imaging long-term AID experiments revealed that spindles were able to go through anaphase and segregate chromosomes bidirectionally despite the metaphase spindle phenotypes, consistent with this previous work (McNally et al. 2016) and with another recent paper from the same lab (McNally et al. 2023). However, we did not examine whether there were chromosome segregation errors. Given that anaphase is not the focus of our paper, we ask that this be deemed beyond the scope of our study.

      5.) Later, in line 184, the authors indicate that zyg-8 bipolar spindles "segregate chromosomes". Which images show anaphase I? As noted above, a limitation of these studies is not knowing the outcome of meiosis in these Zyg-8 depletions.

      We agree that in the original manuscript it was difficult to see that chromosomes were segregating bidirectionally in our movies and in the still timepoint images presented in Figure 5. Therefore, we brightened the chromosome channel in the relevant videos to make it easier to see the segregating chromosomes. Video 6 shows an oocyte in Meiosis II, as the first polar body can be seen near the spindle in this movie. At 2 minutes, the monopolar spindle becomes bipolar and begins to shrink as it goes into anaphase. Chromosomes begin to move apart and then the spindle elongates. At 11 minutes, you can see that the chromosomes have segregated bidirectionally. Thus, when monopolar spindles reorganize into bipolar spindles under these conditions, they can drive bidirectional chromosome segregation. We did not assess the fidelity of chromosome segregation under these conditions (i.e., whether chromosomes segregated accurately), as the question we were trying to answer in this experiment was whether outward forces sufficient to re-establish bipolarity could be activated upon ZYG-8 depletion (as explained above in response to point #4, we focused our study on trying to understand the effects of ZYG-8 depletion on the spindle, rather than on anaphase). We agree that analyzing anaphase outcomes would be interesting, but we ask that it be considered beyond the scope of this study.

      *6.) Line 206 suggests that ZYG-8 inhibits BMK-1. Is a simple explanation that BMK-1 is required for the bipolar spindles observed in the klp-18 zyg-8 AID oocytes? *

      Yes, the reviewer is correct that BMK-1 is required for the generation of bipolar spindles in the klp-18(RNAi) ZYG-8 AID conditions. In the original manuscript we extrapolated this result to propose that ZYG-8 regulates BMK-1. However, this comment, as well as feedback from the other reviewers and our new experiments (showing that ZYG-8 also modulates microtubule dynamics), have made us re-think the way we discuss this result, as we now agree that it does not prove this regulation (it is only suggestive). Therefore, in the revised manuscript, we no longer definitively claim that ZYG-8 regulates BMK-1 - we have switched to softer language (stating that ZYG-8 "may regulate" BMK-1, etc.). In the results section we now describe our conclusions as follows: "These data demonstrate that BMK-1 produces the outward forces that are activated upon ZYG-8 and KLP-18 co-depletion and raise the possibility that ZYG-8 regulates BMK-1 either directly or indirectly" (lines 250-252).

      *7.) Given that many mitotic and meiotic kinases are localized to specific regions or domains of the spindle, there is only limited discussion of the ZYG-8 localization pattern. Does the ZYG-8 localization pattern provide any insights into its mechanism of promoting spindle assembly? *

      The reviewer makes a good point - while we did report ZYG-8 localization, the discussion on the importance of its localization pattern was limited. To address this, we now remind readers in the discussion that ZYG-8 and BMK-1 co-localize throughout meiosis, consistent with the possibility that ZYG-8 could regulate BMK-1. Notably, this localization pattern is also consistent with the observation that ZYG-8 modulates microtubule dynamics across the spindle; this is now also noted in the discussion (lines 358-361).

      *8.) Line 96-97 - how much is the ZYG-8 depletion? *

      To address this question, we have quantified the amount of ZYG-8 protein in our ZYG-8 AID strain in control, long-term, and short-term auxin treated conditions. The western blot was quantified by comparing the raw intensity of the bands and subtracting the background signal. Short-term auxin depletion resulted in an ~63% reduction in ZYG-8 GFP signal, and long-term depletion resulted in an ~93% reduction in ZYG-8 GFP signal. This has now been reported in the manuscript on lines 785-786.
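
      The quantification described above (raw band intensity minus background, expressed as a reduction relative to control) can be illustrated with a minimal sketch. The function names and the intensity values are hypothetical and chosen only for illustration; they are not measurements from the blot, and the sketch assumes no additional normalization (for example, to a loading control), since none is described here.

```python
def background_subtracted(band_intensity, background_intensity):
    """Return the band signal after subtracting the local background."""
    return band_intensity - background_intensity


def percent_reduction(control_signal, treated_signal):
    """Percent loss of signal in a treated lane relative to the control lane."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (1.0 - treated_signal / control_signal)


# Hypothetical raw intensities in arbitrary units (NOT values from the paper).
control = background_subtracted(1200.0, 200.0)
treated = background_subtracted(450.0, 200.0)
print(f"{percent_reduction(control, treated):.1f}% reduction")
```

      With these illustrative inputs the treated lane retains a quarter of the control signal, so the reported reduction is 75%.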

      *9.) Line 140: the authors say spindle length could not be measured, but perhaps it makes more sense to measure half spindle (chromosome to spindle pole). The images do give the impression that the chromosome to pole distance is shorter. *

      While we liked this idea and tried to perform these measurements, it turned out to be difficult in practice, since the spindle length measurements are obtained by finding the distance from pole (center of the ASPM-1 staining) to center of the chromosome signal. If you look carefully at our images you will notice that the chromosomes lose alignment following short-term AID; therefore, the chromosomes do not form one mass, which made it very difficult to determine an accurate "center" of the DNA signal. Additionally, in most cases the poles are disrupted such that ASPM-1 is found in many separate masses and/or is diffusely localized around the periphery of the spindle. Because of this, we unfortunately felt that these measurements would not be very accurate and would be hard to interpret.

      *10.) Don't see the point of lines 323-330. Could be deleted? *

      In revising our manuscript, we have rephrased these lines in an attempt to provide more context. Because DCLK1 has been shown to be upregulated in a wide variety of cancers, there are ongoing efforts to find chemical inhibitors that specifically block the kinase activity of this protein to be used as cancer therapeutics. However, no one has previously shown that the kinase activity of DCLK1 is important for its in vivo function (in any organism). Therefore, we were trying to make the point that, since we demonstrated that kinase activity is important for the functions of a DCLK1 family member in vivo, this suggests that these kinase inhibitors may in fact be beneficial in knocking down DCLK1 activity.

      11.) Figure 1: Because ts alleles could have a defective phenotype at "permissive" temperature, a wild-type control should be included. This data does appear in a later figure.

      The reviewer is correct that this data does appear in a later figure, but we agree this direct comparison would provide clarity to the reader. To address this comment, we compared the spindle lengths and angles of the two temperature-sensitive (TS) strains (at both the permissive and restrictive temperatures) to wild-type (N2) worms - these data have been added to Figure S1 (new panels F and G). The spindle lengths of both TS strains at the permissive temperature did not significantly differ from wild-type spindle lengths (p>0.1), while both TS strains at the restrictive temperature were significantly different than wild-type (p0.1), but there was a significant difference between wild-type spindles and the TS mutants at the restrictive temperature (doublecortin domain mutant (p

      Reviewer #3 (Significance (Required)): The strengths of this paper are the novelty of studying Zyg-8. It also addresses important questions regarding acentrosomal spindle assembly in oocytes. The weakness is mostly in the limited interpretation of results and not enough consideration of alternative interpretations. Related to this, the authors only test the force balance hypothesis with the knockout of two related kinesins. They don't experimentally investigate other mechanisms for the zyg-8 phenotype. This research should be of broad interest to anyone interested in oocyte spindle assembly, and also in a more specialized way to those who study kinases or Zyg-8 homologs in other cell types or organisms.

      We thank the reviewer for these positive comments on the strengths and novelty of our manuscript. We also appreciate the constructive suggestions of all three reviewers, which motivated us to perform new experiments that revealed additional functions for ZYG-8 - these revisions have greatly improved the manuscript and have broadened its impact.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript Czajkowski et al explore the role of the doublecortin-family kinase ZYG-8 during meiosis in C. elegans oocytes. First by studying available temperature-sensitive mutants and then by generating their own strain expressing ZYG-8 amenable to auxin-inducible degradation, they establish that defects in ZYG-8 lead to defects in spindle assembly, such as the formation of multipolar spindles, and spindle maintenance, in which spindles elongate, fall apart, and deform in meiosis. Based on these observations the authors conclude that ZYG-8 depletion leads to excessive outward force. As the lab had previously found that the motor protein KLP-18 generates outward directed forces in meiosis, Czajkowski et al initially speculate that ZYG-8 might regulate KLP-18. KLP-18 depletion generally leads to the formation of monopolar spindles in meiosis. Intriguingly, when the authors co-deplete ZYG-8 they find that in some cases bipolarity was reestablished. This led to the hypothesis that yet another kinesin, BMK-1, the homolog of the mammalian EG-5, could provide redundant outward directed forces to KLP-18. The authors then study the effect of ZYG-8 and KLP-18 co-depletion in a BMK-1 mutant background strain and observe that bipolarity is no longer reestablished under these conditions, suggesting that BMK-1 generates additional outward directed forces. The authors also conclude that ZYG-8 inhibits BMK-1. To follow up on this Czajkowski et al generate a ZYG-8 line that carries a mutation in the kinase domain, which should inhibit its kinase activity. This line shows similar effects in terms of spindle elongation but reduced impact on spindle integrity, reflected in minor effects on the number of spindle poles and spindle angle. The authors conclude that ZYG-8's kinase activity is required for the function of ZYG-8 in meiosis and mitosis.

      • Overall, the paper is well written, and the data are presented very clearly and appear reproducible. The experiments are adequately replicated, and statistical analyses are adequate. The observations are very interesting. However, the authors could provide some additional insight into the function of ZYG-8. This paper is strongly focused on motor generated forces within the spindle and tries to place ZYG-8 within this context, but there is compelling evidence from other studies that ZYG-8 also affects microtubule dynamics, which would have implications for spindle assembly and structure. The paper would strongly benefit from the authors exploring this role of ZYG-8 in the context of meiosis further. If the authors feel that this would extend beyond the scope of this paper, I would suggest that the authors rephrase some of their introduction and discussion to reflect the possibility that changes in microtubule growth and nucleation rates could explain some of the phenotypes (think of katanin) and effects and that therefore it can not necessarily be concluded that BMK-1 is inhibited by ZYG-8.

      Major comments:

      1) Zyg-8, as well as the mammalian homolog DCLK-1, has been reported to play an important role for microtubule dynamics. While the introduction mentions its previously shown role in meiosis and mitosis, it is totally lacking any background on the effect on microtubule dynamics. The authors mention these findings in the discussion, but it would be helpful to incorporate this in the introduction as well. As an example, Goenczy et al 2001 demonstrated that ZYG-8 is involved in spindle positioning but also showed its ability to bind microtubules and promote microtubule assembly. Interestingly, like the authors here, Goenczy et al concluded that while the kinase domain contributes to ZYG-8's function, it is not essential for it. Also, Srayko et al 2005 (PMID 16054029) demonstrated that ZYG-8 depletion led to reduced microtubule growth rates and increased nucleation rates in C. elegans mitotic embryos. And in mammalian cells DCLK-1 was shown to increase microtubule nucleation rate and decrease catastrophe rate, leading to a net stabilization of microtubules (Moores et al 2006, PMID: 16957770).

      It would be great if the authors could add to the introduction that ZYG-8 has been suggested to affect microtubule dynamics.

      2) The authors initially study two different ts alleles, or484ts and b235ts. The experiments clearly show a significant increase in spindle length in both strains. However, the or484 strain had been previously studied (McNally et al 2016, PMID: 27335123), and only minor effects on spindle length were reported (8.5µm in wt metaphase and 10µm in zyg-8 (or484)). How do the authors explain these differences in ZYG-8 phenotype? Even though the ZYG-8 phenotype is consistent throughout this paper it would be good to explain why the authors observe spindle elongation, fragmentation and spindle bending in contrast to previous observations.

      As a general note, it would be helpful if the authors could indicate if the spindles are in meiosis I or II. The only time where this is specifically mentioned is in Video 7, showing a Meiosis II spindle, which makes me assume all other data is in Meiosis I. Adding this to the figures would also help to distinguish if some of the images, i.e. Figure 1B, show multipolar spindles due to failed polar body extrusion. If this is the case then the quantification of number of poles should maybe reflect different possibilities, such as fragmented poles vs. multiple poles because two spindles form around dispersed chromatin masses.

      3) The authors generate a line that carries a mutation leading to a kinase dead version of ZYG-8. It would be great if the authors could further test if this version is truly kinase dead. What is interesting is that the kinase dead version the authors create has less effect on the numbers of pole than the zyg-8 (b235)ts strain, which carries a mutation in a less conserved kinase region. Overall, it seems that the phenotypes are very similar, independent on mutations in the microtubule binding area, kinase area or after AID. This could of course be due to all regions being important, i.e. microtubule binding is required for localizing kinase-activity. Generating mutant versions of the target proteins, for example here BMK-1, that can not be phosphorylated or are constitutively active as well as assessment of changes in protein phosphorylation levels in the kinase dead strain would be helpful to provide deeper insight into potential regulation of proteins by ZYG-8.

      4) The authors state that "BMK-1 provides redundant outward force to KLP18". Redundancy usually suggests that one protein can take over the function of another one when the other is not there. In these scenarios a phenotype is often only visible when both proteins are depleted as each can take over the function of the other one. Here however the situation seems slightly different, as depletion of BMK-1 has no phenotype while depletion of KLP-18 leads to monopolar spindles. If BMK-1 would normally provide outward directed forces, would this not be visible in KLP-18 depleted oocytes if they were truly redundant? I assume the authors hypothesize that ZYG-8 inhibits BMK-1 and thus it can not generate outward directed forces. In this case, do the authors envision that ZYG-8 inhibits BMK-1 prior to or in metaphase or only in anaphase or throughout meiosis? Do they speculate, that BMK-1 is inhibited in anaphase and only active in metaphase? In addition, Figure S4 somewhat argues against a role for ZYG-8 in regulating BMK-1. ZYG-8 depletion supposedly leads to increased outward forces due to loss of BMK-1 inhibition, thus co-depletion of ZYG-8 with BMK-1 should rescue the increased spindle size at least to some extent, however neither increase in spindle length nor increase in additional spindle pole formation are prevented by co-depletion of BMK-1 suggesting that BMK-1 is not generating the forces leading to spindle length increase. Thus, arguing that after all ZYG-8 does not regulate BMK-1. This should be discussed further in the paper and the authors should consider changing the title. At this point the provided evidence that ZYG-8 is regulating motor activity is not strong enough to make this claim.

      5) The authors are proposing that ZYG-8 regulates/ inhibits BMK-1, however convincing evidence for an inhibition is not provided in my opinion and the effect of ZYG-8 on BMK-1 could be indirect. To make a compelling argument for a regulation of BMK-1 the authors would have to investigate if ZYG-8 interacts and/ or phosphorylates BMK-1 (see 7) and if this affects its dynamics. In addition, given the reported role of ZYG-8 on microtubule dynamics it would be very important that the authors consider studying the effect of ZYG-8 degradation on microtubule dynamics. Tracking of EBP-2 would be good, however this is very difficult to do inside meiotic spindles due to their small size. In addition, the authors could maybe consider some FRAP experiments, which could provide insights into microtubule dynamics and motions, which could be indicative of outward directed forces/ sliding.

      Summary:

      Additional requested experiments:

      • Interaction/ phosphorylation of BMK-1 by ZYG-8, i.e. changes of BMK-1 phosphorylation in absence of ZYG-8, BMK-1 mutations that may prevent phosphorylation by ZYG-8.

      • Assessment of microtubule dynamics (EBP-2, FRAP, length in monopolar spindles...)

      • Kinase activity of the kinase dead ZYG-8 strain (OPTIONAL)

      Minor comments:

      1) In Figure 4C it seems that the ZYG-8 AID line as well as the zyg-8 (or484)ts already have a phenotype (increased ASPM-1 foci) in absence of auxin/ at the permissive temperature. Does this suggest that the ZYG-8 AID as well as the zyg-8 (or484) strains are after all slightly defective (even if Figure 1, S1 and S2 argue otherwise) and thus more responsive to the loss of KLP-18?

      2) The authors observe that in preformed monopolar spindles degradation of ZYG-8 can sometimes restore bipolarity. This observation is very interesting, but why do the authors not observe a similar phenotype in long-term ZYG-8 AID; klp-18 (RNAi) or zyg-8(or484)ts; klp-18(RNAi)? In the latter conditions bipolarity does not seem to occur at all. Do the authors think this is due to differences in timing of events?

      3) Based on the Cavin-Meza 2022 paper it looks like depletion of KLP-18 in a BMK-1 mutant background does not look different from klp-18 (RNAi) alone. However, looking at Video 8, it looks like spindles "shrink" in absence of KLP-18 and BMK-1. Or is this due to any effects from the ZYG-8 AID strain? This can also be seen in Video 9.

      4) Line 311: " ZYG-8 loads onto the spindle along with BMK-1, and functions to inhibit BMK-1 from over elongating microtubules during metaphase." Maybe this sentence could be re-phrased as it currently sounds like BMK-1 elongates (polymerizes) microtubules.

      5) Line 313: "Intriguingly, in C. elegans oocytes and mitotically-dividing embryos, BMK-1 inhibition causes faster spindle elongation during anaphase, suggesting that BMK-1 normally functions as a brake to slow spindle elongation (Saunders et al., 2007; Laband et al., 2017). Further, ZYG-8 has been shown to be required for spindle elongation during anaphase B (McNally et al., 2016). Our findings may provide an explanation for this phenotype, since if ZYG-8 inhibits BMK-1 as we propose, then following ZYG-8 depletion, BMK-1 could be hyperactive, slowing anaphase B spindle elongation." This paragraph could be modified for better clarity. It is not clear how the findings of the authors, BMK-1 provides outward force but is normally inhibited by ZYG-8, align with the last sentence saying "following zyg-8 depletion, BMK-1 could be hyperactive slowing anaphase B spindle elongation", should it not increase elongation according to the authors observations?

      Significance

      In this manuscript Czajkowski et al explore the role of the doublecortin-family kinase ZYG-8 during meiosis in C. elegans oocytes. The authors conclude that BMK-1 generates outward directed force, redundant to forces generated by KLP-18, and that ZYG-8 inhibits BMK-1. The authors conclude that ZYG-8's kinase activity is required for the function of ZYG-8 in meiosis and mitosis. This research is interesting and provides some novel insight into the role of ZYG-8. In particular the observed spindle elongation and subsequent spindle fragmentation are novel and had not yet been reported. Also, the observation that degradation of ZYG-8 in monopolar klp-18(RNAi) spindles can restore bipolarity is novel and interesting, as well as the observation that this is somewhat dependent on the presence of BMK-1. This will be of interest to a broad audience and provides some new insight into the role and importance of ZYG-8 and BMK-1. The limitation of the study is the interpretation of the results and the lack of solid evidence that the observed phenotypes are due to ZYG-8 regulating motor activity, as the title claims. To support this some more experiments would be required. In addition, ZYG-8 has been reported to affect microtubule dynamics, which can certainly affect the action of motors on microtubules. This line of research is not explored in the paper but would certainly add to its value.

      Field of expertise: Research in cell division

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to extend our gratitude to the reviewers for their meticulous analysis and constructive feedback on our manuscript. We have revised our paper based on the suggestions regarding supporting literature and the theory behind CAPs along with detailed insights regarding our methods. Their suggestions have been extremely useful in strengthening the clarity and rigor of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) There are no obvious problems with this paper and it is relatively straightforward. There are some challenges that I would like to suggest. These variants have multiple mutations, so it would be interesting if you could drill down to find out which mutation is the most important for the collective changes reported here. I would like to see a sequence alignment of these variants, perhaps in the supplemental material, just to get some indication of the extent of mutations involved.

      Finding the most important mutation within a set is a tricky question, as each mutation changes the way future mutations will affect function due to epistasis. Indeed, this is what we aim to explore in this work. To illustrate this point, we included a new supplementary figure S5A. Three critical mutations that emerged quickly, and were frequently observed in other dominant variants, were S477N, T478K, and N501Y. Thus, we computed the EpiScore values of these three mutations, with several critical residues contributing to hACE2 binding. The EpiScore distribution indicates that residues 477, 478, and 501 have strong epistatic (i.e., non-additive) interactions, as indicated by EpiScore values above 2.0.

      To further investigate these epistatic interactions, we first conducted MD simulations and computed the DFI profile of these three single mutants. We analyzed how different the DFI scores of the hACE2 binding interface residues of the RBD are, across three single mutants with Omicron, Delta, and Omicron XBB variants (Fig S5B). Fig S5B shows how mutations at these particular sites affect the binding interface DFI in various backgrounds, as the three mutations are also observed in the Omicron, XBB, and XBB 1.5 variants. If the difference in the DFI profile of the mutant and the given variant is close to 0, then we could safely state that this mutation affected the variant the most. However, what we observe is quite the opposite: the DFI profile of the mutation is significantly different in different variant backgrounds. While these mutations may change overall behavior, their individual contributions to overall function are more difficult to pin down because overall function is dependent on the non-additive interactions between many different residues.

      Author response image 1.

      (A) Three critical mutations that emerged quickly, and were frequently observed in other dominant variants, were S477N, T478K, and N501Y. EpiScores of sites 477, 478, and 501 with one another are shown with k = the binding interface of the open chain. These residues are highly epistatic, producing higher responses than expected when perturbed together. (B) The difference in the dynamic flexibility profiles between the single mutants and the most common variants for the hACE2 binding residues of the RBD. DFI profiles exhibit significant variation from zero, and also show different flexibility in each background variant, highlighting the critical non-additive interactions of the other mutation in the given background variant. Thus, these three critical mutations, impacting binding affinity, do not solely contribute to the binding. There are epistatic interactions with the other mutations in VOCs that shape the dynamics of the binding interface to modulate binding affinity with hACE2.
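
      The comparison in panel (B) amounts to subtracting, residue by residue, the DFI profile of a background variant from that of a single mutant over the hACE2-binding interface, and asking how far the result is from zero. A minimal sketch of that bookkeeping follows. The function names, residue numbers, and DFI values are hypothetical placeholders for illustration only; they are not outputs of the MD simulations, and the mean absolute difference is just one assumed way to summarize the profile.

```python
def dfi_difference(mutant_dfi, variant_dfi, interface_residues):
    """Per-residue DFI difference (single mutant minus variant) at the interface."""
    return {res: mutant_dfi[res] - variant_dfi[res] for res in interface_residues}


def mean_abs_difference(differences):
    """Average magnitude of the per-residue differences; 0 means identical profiles."""
    return sum(abs(d) for d in differences.values()) / len(differences)


# Hypothetical DFI values keyed by residue number (NOT simulation results).
single_mutant = {493: 0.50, 498: 0.75, 501: 0.25}
variant_background = {493: 0.25, 498: 0.50, 501: 0.75}
interface = [493, 498, 501]

diffs = dfi_difference(single_mutant, variant_background, interface)
for res, d in diffs.items():
    print(res, d)
print("mean |difference|:", mean_abs_difference(diffs))
```

      A summary value near zero would indicate that the single mutation alone reproduces the interface flexibility of that variant; values far from zero, as reported above, point to contributions from the other mutations in the background.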

      As we discussed above, while the epistatic interactions are crucial and the collective impact of the mutations shapes the mutational landscape of the spike protein, we would like to note that mutation S486P is one of the critical mutations we identify, modulating both antibody and hACE2 binding, and our analysis reveals its strong non-additive interactions with the other mutational sites. This mutational site appears in both XBB1.5 and earlier Omicron strains, which highlights its importance in the functional evolution of the spike protein. CAPs 346R, 486F, and 498Q also may be important, as they have a high EpiScore, indicating critical epistatic interaction with many mutation sites.

      Regarding the suggestion about presenting the alignment of the different variants, we have attached a mutation table, highlighting the mutated residues for each strain compared to the reference sequence, as supplemental Figure S1 along with the full alignment file.

      (2) Also, I am wondering if it would be possible to insert some of these flexibilities and their correlations directly into the elastic network models to enable a simpler interpretation of these results. I realize this is beyond the scope of the present work, but such an effort might help in understanding these relatively complex effects.

      This is a great suggestion. A similar analysis has been performed for different proteins by Mcleash (See doi: 10.1016/j.bpj.2015.08.009) by modulating the spring constants of specific positions to alter their flexibility and evaluating the change in elastic free energy to identify critical mutation (in particular, allosteric mutation) sites. We will be happy to pursue this as future work.

      Minor

      (3) 1 typo on line 443 - should be binding instead of biding.

      Fixed, thanks for spotting that.

      (4) The two shades of blue in Fig. 4B were not distinguishable in my version.

      To fix this, we have changed the overlapping residues between Delta and Omicron to a higher contrast shade of blue.

      (5) Compensatory is often used in an entirely different way - additional mutations that help to recover native function in the presence of a deleterious mutation.

      Although our previous study (Ose et al. 2022, Biophysical Journal) shows that compensatory mutations were generally additive, the two ideas are not one and the same. We thank the reviewer for pointing this out. Therefore, to clarify, we have now described our results in terms of dynamic additivity, rather than compensation.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors note that the identified CAPs overlap with those of others (Cagliani et al. 2020; Singh and Yi 2021; Starr, Zepeda, et al. 2022). In itself, this merits a deeper discussion and explicit indication of which positions are not identified. However, there is one point that I believe may represent a fundamental flaw in this study in that the calculation of EP from the alignment of S proteins ignores entirely the differences in the interacting interface with which S for different coronaviruses in the alignment interact in the different receptors in each host species. This may be the reason why so many "CAPs" are in the RBD. The authors should at the very least make a convincing case of why they are not simply detecting constraints imposed by the different interacting partners, at least in the case of positions within the RBD interface with ACE2. Another point that the authors should discuss is that ACE2 is not the only receptor that facilitates infection, TMPRSS2 and possibly others have been identified as well. The results should be discussed in light of this.

      To begin with, we have now explicitly noted (on line 135) that “sites 478, 486, 498, and 681 have already been implicated in SARS-CoV-2 evolution, leaving the remaining 11 CAPs as undiscovered candidate sites for adaptation.” Evolutionary analyses are done using orthologous protein sequences, so there is no way to integrate information on different receptors in each host species in the calculation of EPs. However, we appreciate that the preponderance of CAPs in the RBD is likely due to different binding environments. We have added the following text (on line 83) to clarify our point: “Adaptation in this case means a virus which can successfully infect human hosts. As CAPs are unexpected polymorphisms under neutral theory, their existence implies a non-neutral effect. This can come in the form of functional changes (Liu et al. 2016) or compensation for functional changes (Ose et al. 2022). Therefore, we suspect that these CAPs, being unexpected changes from coronaviruses across other host species with different binding substrates, may be partially responsible for the functional change of allowing human infection.” This hypothesis is supported by the overlap of CAPs we identified with the positions identified in other studies (e.g., 478, 486, 498, and 681). Binding to TMPRSS2 and other substrates is also covered by this analysis as it is a measure of overall evolutionary fitness, rather than binding to any specific substrate. Our paper does focus on discussing hACE2 binding and mentions furin cleavage, but indeed lacks discussion on the role of TMPRSS2. We have added the following text to line 157: “Another host cell protease, TMPRSS2, facilitates viral attachment to the surface of target cells upon binding either to sites Arg815/Ser816, or Arg685/Ser686 which overlaps with the furin cleavage site 676-689, further emphasizing the importance of this area (Hoffmann et al. 2020b; Fraser et al. 2022).”

      (2) Turning now to the computational methods utilized to study dynamics, I have serious reservations about the novelty of the results as well as the validity of the methodology. First of all, the authors mention the work of Teruel et al. (PLOS Comp Bio 2021) in an extremely superficial fashion and do not mention at all a second manuscript by Teruel et al. (Biorxiv 2021.12.14.472622 (2021)). However, the work by Teruel et al. identifies positions and specific mutations that affect the dynamics of S and the evolution of the SARS-CoV-2 virus in light of immune escape, ACE2 binding, and open and closed state dynamics. The specific differences in approach should be noted but the results specifically should be compared. This omission is evident throughout the manuscript. Several other groups have also published on the use of normal-mode analysis methods to understand the Spike protein, among them Verkhivker et al., Zhou et al., Majumder et al., etc.

      Thank you for your suggestions. Upon further examination of the listed papers, we have added citations to other groups employing similar methods. However, it's worth noting that the results of Teruel et al.'s studies are generally not directly comparable to our own. Particularly, they examine specific individual mutations and overall dynamical signatures associated with them, whereas our results are always considered in the context of epistasis and joint effects with CAPs, and all mutations belong to the common variants. Although important mutations may be highlighted in both cases, it is for very different reasons. Nevertheless, we provide a more detailed mention of the results of both studies. See lines 178, 255, and 393.

      (3) The last concern that I have is with respect to the methodology. The dynamic couplings and the derived index (DCI) are entirely based on the use of the elastic network model presented which is strictly sequence-agnostic. Only C-alpha positions are taken into consideration and no information about the side-chain is considered in any manner. Of course, the specific sequence of a protein will affect the unique placement of C-alpha atoms (i.e., mutations affect structure), therefore even ANM or ENM can to some extent predict the effect of mutations in as much as these have an effect on the structure, either experimentally determined or correctly and even incorrectly modelled. However, such an approach needs to be discussed in far deeper detail when it comes to positions on the surface of a protein such that the reader can gauge if the observed effects are the result of modelling errors.

      We would like to clarify that most of our results do not involve simulations of different variants, but rather how characteristic mutation sites for those variants contribute to overall dynamics. For the full spike, we operate on only two simulations: open and closed. When we do analyze different variants, starting on line 438, the observed difference does not come from the structure, but from the covariance matrix obtained from molecular dynamics (MD) simulations, which are sensitive to single amino acid changes.

      Reviewer #3 (Recommendations For The Authors):

      (1) On line 99 there is a misspelling, 'withing'.

      It has been fixed. Thanks for spotting that.

      (2) Some graphical suggestions to make the figures easier to read:

      In Figure 1C, a labeled circle around the important sites, the receptor binding domain, and the Furin cleavage site, would help the reader orient themselves. Moreover, it would make clear which CAPs are NOT in the noteworthy sites described in the text.

      Good idea. We have added transparent spheres and labels to show hACE2 binding sites and Furin cleavage sites.

      In Figure 2C the colors are a bit low contrast; moreover, there are multiple text sizes on the same figure which should perhaps be avoided to ensure legibility.

      We have made yellow brighter and standardized font sizes.

      Figure 3 is a bit dry, perhaps indicating in which bins the 'interesting' sites could be informative.

      Thank you for the suggestion, but the overall goal of Figure 3 is to illustrate that the mutational landscape is governed by the equilibrium dynamics in which flexible sites undergo more mutations during the evolution of the CoV2 spike protein. Therefore, adding additional positional information may complicate our message.

      Figure 4, the previous suggestions about readability apply.

      We ensured same-sized text and higher-contrast colors.

      Figure 5B, the residue labels are too small.

      We increased the font size of the residue labels.

      In Figure 8 maybe adding Delta to let the reader orient themselves would be helpful to the discussion.

      Unfortunately, there is no single work that has experimentally quantified binding affinities towards hACE2 for all the variants. When we conducted the same analysis for the Delta variant in Figure 8, the experimental values were obtained from a different source (doi: 10.1016/j.cell.2022.01.001) and the values were significantly different from the experimental work we used for Omicron (Yue et al. 2023). When we adjusted for the difference in the experimentally measured binding affinity of the original Wuhan strain between these two studies, we observed a similar correlation, as seen below. However, we think this might not be a proper representation. Therefore, we chose to keep the original figure.

      Author response image 2.

      The %DFI calculations for variants Delta, Omicron, XBB, and XBB 1.5. (A) The %DFI profiles of the variants are plotted in the same panel. The grey shaded areas and dashed lines indicate the ACE2 binding regions, whereas the red dashed lines show the antibody binding residues. (B) The sum of %DFI values of RBD-hACE2 interface residues. The trend of total %DFI with the log of Kd values overlaps with the one seen with the experiments. (C) The RBD antibody binding residues are used to calculate the sum of %DFI. The ranking captured with the total %DFI agrees with the susceptibility fold reduction values from the experiments.
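
      As a rough illustration of how the summed %DFI in panels B and C could be computed, the sketch below ranks each residue's DFI value as a percentile and sums the ranks over a chosen residue set. The percentile-rank definition (fraction of residues with DFI less than or equal to the residue's own value), the function names, and the toy values are assumptions made for this example; they are not taken from the manuscript.

```python
def percentile_rank(values):
    """Return, for each value, the fraction of all values that are <= it.

    Assumption: %DFI is treated here as this simple percentile rank (0-1).
    """
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]


def interface_score(dfi, interface_indices):
    """Sum the percentile-ranked DFI over a set of residue indices."""
    ranks = percentile_rank(dfi)
    return sum(ranks[i] for i in interface_indices)


# Toy DFI profile for four residues (hypothetical numbers).
toy_dfi = [0.1, 0.4, 0.2, 0.3]
# Hypothetical interface made of residues at positions 1 and 3 (0-based).
toy_interface = [1, 3]
score = interface_score(toy_dfi, toy_interface)
print(score)
```

      Comparing such a score across variants would only be meaningful if the same residue set and the same ranking convention are used for every variant.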

      (3) Replicas of the MD simulations would make the conclusions stronger in my opinion.

      We ran a 1 µs simulation and performed convergence analysis for the MD simulations following prior work (Sawle L, Ghosh K. 2016). More importantly, we also evaluated the statistical significance of the computed DFI values, as explained in detail below (please see the answer to question 3 of Reviewer #3 (Public Review)).

      Reviewer #3 (Public Review):

      (1) A longer discussion of how the 19 orthologous coronavirus sequences were chosen would be helpful, as the rest of the paper hinges on this initial choice.

      The following explanation has been added on line 114: EP scores of the amino acid variants of the S protein were obtained using a Maximum Likelihood phylogeny (Kumar et al. 2018) built from 19 orthologous coronavirus sequences. Sequences were selected by examining available non-human sequences with a sequence identity of 70% or above to the human SARS CoV-2’s S protein sequence. This cutoff allows for divergence over evolutionary history such that each amino acid position had ample time to experience purifying selection, whilst limiting ourselves to closely related coronaviruses. (Figure 1A).
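
      The 70% identity cutoff described above can be illustrated with a minimal sketch. It assumes the candidate sequences are already aligned to the reference and that identity is computed over alignment columns where neither sequence has a gap; the response does not state how identity was calculated, so that convention, the function names, and the toy sequences are all assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def select_orthologs(reference, candidates, cutoff=70.0):
    """Keep candidates whose identity to the reference meets the cutoff."""
    return {
        name: seq
        for name, seq in candidates.items()
        if percent_identity(reference, seq) >= cutoff
    }


# Toy aligned fragments (hypothetical, 10 columns each).
reference = "MFVFLVLLPL"
candidates = {
    "close_relative": "MFVFLVLLPV",    # 9 of 10 columns match
    "distant_relative": "MAAAAALLPL",  # 5 of 10 columns match
}
kept = select_orthologs(reference, candidates)
print(sorted(kept))
```

      A stricter or looser cutoff changes the size of the sequence set, which is why the choice of threshold matters for the downstream EP scores.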

      (2) The 'reasonable similarity' with previously published data is not well defined, nor was there any comment about some of the residues analyzed (namely 417-484).

      We have revised this part of the manuscript and added the changes to the revised version.

      We removed the line about reasonable similarity as it was vague, added a line about residues 417-484, and revised the text accordingly, starting on line 354.

      (3) There seem to be no replicas of the MD simulations, nor a discussion of the convergence of these simulations. A more detailed description of the equilibration and production schemes used in MD would be helpful. Moreover, there is no discussion of how the equilibration procedure is evaluated, in particular for non-experts this would be helpful in judging the reliability of the procedure.

      We opted for a single, extended equilibrium simulation to comprehensively explore the long-term behavior of the system. Given the specific nature of our investigation and resource constraints, a well-converged, prolonged simulation was deemed a practical and scientifically valid approach, providing a thorough understanding of the system's dynamics. (doi: 10.33011/livecoms.1.1.5957, https://doi.org/10.1146/annurev-biophys-042910-155255 )

      We updated our methods section starting on line 605 with extended information about the MD simulations and the convergence criteria for the equilibrium simulations. We also added a section that explains our analysis to check the statistical significance of the obtained DFI values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study addresses the long-term effect of warming and altered precipitation on microbial growth, as a proxy for understanding the impact of global warming. While the methods are compelling and the evidence supporting the claims is solid, additional analysis of the data would strengthen the study, which should be of broad interest to microbial ecologists and microbiologists.

      We sincerely appreciate your assessment and thoughtful comments, which are valuable and very helpful for improving our manuscript. We have carefully considered all comments and made extensive, thorough corrections and additional analyses of the data, which we hope will meet with approval.

      Reviewer #1 (Public Review):

      Warming and precipitation regime change significantly influences both above-ground and below-ground processes across Earth's ecosystems. Soil microbial communities, which underpin the biogeochemical processes that often shape ecosystem function, are no exception to this, and although research shows they can adapt to this warming, population dynamics and ecophysiological responses to these disturbances are not currently known. The Qinghai-Tibet Plateau, the Third Pole of the Earth, is considered among the most sensitive ecosystems to climate change. The manuscript described an integrated, trait-based understanding of these dynamics with the qSIP data. The experimental design and methods appear to be of sufficient quality. The data and analyses are of great value to the larger microbial ecological community and may help advance our understanding of how microbial systems will respond to global change. There are very few studies in which the growth rates of bacterial populations from multifactorial manipulation experiments on the Qinghai-Tibet Plateau have been investigated via qSIP, and the large quantity of data that comprises the study described in this manuscript, will substantially advance our knowledge of bacterial responses to warming and precipitation manipulations.

      We appreciate the encouragement and positive comments.

      Specific comments:

      (1) Please add some names of microbial groups with most common for the growth rates.

      We have added the sentence “The members in Solirubrobacter and Pseudonocardia genera had high growth rates under changed climate regimes” in the Abstract (Line 57-58).

      (2) L47-48, consider changing "microbial growth and death" to "microbial eco-physiological processes (e.g., growth and death)", and changing "such eco-physiological traits" to "such processes".

      Done (Line 47 and 48).

      (3) L50-51, the author estimated bacterial growth in alpine meadow soils of the Tibetan Plateau after warming and altered precipitation manipulation in situ. Actually, the soil samples were collected and incubated in the laboratory rather than in the field like the previous experiment conducted by Purcell et al. (2021, Global Change Biology). "In situ" would lead me to believe that the qSIP incubation was conducted in the field, so I think the use of the word in situ is inappropriate here. [https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15911]

      Agreed. We have deleted “in situ”.

      (4) L52, what does "interactive global change factors" mean?

      We have revised this sentence to “the growth of major taxa was suppressed by the single and combined effects of temperature and precipitation” (Line 52-53).

      (5) L61, in my opinion, "Microbial diversity" belongs to the category of species composition, rather than ecosystem functional services. Please revise it.

      Agree. We have deleted it.

      (6) L69, consider changing "further" to "thus".

      Done (Line 70).

      (7) L82, delete "The evidence is overwhelming that".

      Done.

      (8) L85-90, these two sentences have similar meanings, please express them concisely.

      We have deleted the sentence “Altered precipitation, particularly drought or heavy precipitation events, also tends to negatively influence soil processes and biodiversity”.

      (9) L91, the effect of drought on soil microorganisms is lacking here.

      We have added the sentence “Reduced precipitation affects soil processes notably by directly stressing soil organisms, and also altering the supply of substrates to microbes via dissolution, diffusion, and transport” in the Introduction (Line 87-89).

      (10) L102, "Growth" should be highlighted here, as changes in relative abundance can also be classified as population dynamics. The use of the term "population dynamics" will eliminate the highlight of this study in calculating the growth rate of microbial species in in-situ soil based on qSIP. Consider changing "population dynamics" to "population-growth responses" or something like that.

      Done (Line 98).

      (11) L105, please note that this citation focuses on plant physiological characteristics.

      We have revised the reference (Line 102).

      (12) L115, "soil temperature, water availability" should be considered as a direct impact of climate change, rather than an indirect impact on microorganisms.

      We have deleted them.

      (13) L134-135, please clarify the interaction types between which climate factors.

      We have deleted this sentence.

      (14) L135-138, suggest modifying or deleting this sentence. The results in this study are already eco-physiological data and do not need to be further "understood and predicted".

      We have deleted this sentence.

      (15) L150, "The experimental design has been described in previously". I think this refers to another study and not the actual incubations in this study. Also in L198, suggest a change to "Incubation conditions were similar to those previously described". So, it's clear it's not the same experiment.

      We have revised these sentences to “has been described previously in (Ma et al., 2017)” (Line 136) and “according to a previous publication” (Line 194).

      Reference:

      Ma, Z., Liu, H., Mi, Z., Zhang, Z., Wang, Y., Xu, W. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      (16) L188, change "pre-wet soil samples" to "pre-wet samples" and change "soil samples for 48h incubation" to "incubation samples". What does "pre-wet" mean? Does it represent soil pre-cultivation?

      Done. The pre-wet samples, i.e., the soil samples before incubation (T = 0 d), were used to estimate the initial microbial composition. "pre-wet" does not mean soil pre-cultivation. We have added the description “A portion of the air-dried soil samples was taken as the pre-wet treatment (i.e., before incubation without H2O addition)” in MATERIALS AND METHODS (Line 174-175).

      (17) Unify the time unit of incubation (hour or day). Consider changing "48 h" to "2 d" in Materials and Methods.

      Done.

      (18) L247, what version of RDP Classifier was used?

      We used RDP v16 database for taxonomic annotation. We have added this information in the revision (Line 246).

      (19) L270, "average molecular weights".

      Done (Line 268).

      (20) L272-275, based on the preceding description, it appears that the culture period was limited to 48 hours. Please confirm it.

      We apologize for this mistake. We have revised it (Line 273).

      (21) L297, switch the order of the first two sentences of this paragraph.

      Done (Line 297).

      (22) L331, change "smaller-than-additive" to "smaller than their expected additive effect".

      Done (Line 331).

      (23) L374 and 381, I struggle with why "larger combined effects" than single factor effects represent a higher degree of antagonism, and I think it should be "smaller combined effects".

      Agree. We have revised it according to this suggestion (Line 369 and 374).
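
      To make the terminology in comments (22) and (23) concrete, the sketch below labels an interaction by comparing the magnitude of the combined effect with the sum of the two single-factor effects. It assumes all effects are expressed as changes in growth relative to the control and share the same sign; the additive expectation, the tolerance value, and the function name are assumptions for illustration and do not reproduce the statistical framework used in the manuscript.

```python
def classify_interaction(effect_a, effect_b, effect_combined, tolerance=0.05):
    """Label a two-factor interaction as additive, antagonistic, or synergistic.

    Assumptions: effects are changes relative to the control, they share the
    same sign, and the additive expectation is simply effect_a + effect_b.
    """
    expected = effect_a + effect_b
    difference = abs(effect_combined) - abs(expected)
    if abs(difference) <= tolerance:
        return "additive"
    # A combined effect smaller than the additive expectation is antagonistic.
    return "antagonistic" if difference < 0 else "synergistic"


# Hypothetical growth changes under warming, drought, and both together.
print(classify_interaction(-0.30, -0.20, -0.25))  # smaller than expected
print(classify_interaction(-0.30, -0.20, -0.50))  # matches the expectation
print(classify_interaction(-0.30, -0.20, -0.80))  # larger than expected
```

      In practice the tolerance would be replaced by a confidence interval on the effect sizes, so that the label reflects statistical support and not a fixed threshold.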

      (24) L375, remove "than that of drought and warming".

      Done.

      (25) L405, simplify the expression, change "between different warming and rainfall regimes" to "between climate regimes"

      We have deleted this sentence.

      (26) L406-408, species are already on the phylogenetic tree and they can not "clustered at the phylogenetic branches", but the functional traits of microbes can. Please revise it.

      We have revised this sentence to “Overall, the most incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic clustering (i.e., species clustered at the phylogenetic branches; NTI > 0, P < 0.05)” (Line 402-404).

      (27) L409, the same as above, and consider removing "The incorporators subjected to".

      We have revised this sentence to “The incorporators whose growth subjected to the additive interaction of warming × drought also showed significant phylogenetic clustering (P < 0.05)” (Line 404-406).

      (28) L412, consider changing "incorporators subjected to the synergistic interaction" to "the synergistic growth responses under multifactorial changes".

      We have revised the sentence to “incorporators whose growth is influenced by the synergistic interaction showed phylogenetically random distribution under both climate scenarios (P > 0.05)” (Line 407-409).

      (29) L505-506, please add a reference for this sentence.

      Done (Line 488).

      (30) L511-514, It should be noted that the production of MBC does not necessarily imply a net change in the C pool size. The accelerated growth rates may result in expedited turnover of MBC, rather than an increase in carbon sequestration.

      Thanks. We have deleted this sentence.

      (31) Language precision. In the discussion section there must be some additional caveats introduced to some of the claims the authors are making. For instance, L518, the author should clarify that "in this study, the bacterial growth in alpine grassland may be influenced by antagonistic interactions between multiple climatic factors after a decadal-long experiment". Because other studies may exhibit different results due to the focus on different ecosystem functions as well as environmental conditions. As such, softening of the language is recommended- lines are noted below- and these will not adjust the outcomes of this study, but support more precise interpretation.

      We have revised the sentence to “In this study, a decade-long experiment revealed that bacterial growth in alpine meadows is primarily influenced by the antagonistic interaction between T × P” (Line 497-499).

      (32) PICRUSt analysis is a good way to connect species and their functions, especially PICRUSt2, which updated the reference database and optimized the algorithm to improve its prediction accuracy (Douglas et al., 2020, Nature Biotechnology). However, the link between microbial taxonomy and microbial metabolism is still not straightforward, especially in diverse microbial communities like soils. The authors should introduce caveats within the discussion to show that they know the limitations of their methods. For context, as a reader who studies metabolism in soils, I found myself somewhat disappointed when PICRUSt data was introduced and not properly caveated. In particular, it might be helpful to introduce these caveats briefly in the last paragraph of the results. These caveats are necessary to avoid overstating the authors' findings, and to make sure the reader knows the authors understand the very clear limitations of these methods. [https://www.nature.com/articles/s41587-020-0548-6]

      Thanks. We have introduced caveats in DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542).

      Reference:

      Douglas, G., Maffei, V., Zaneveld, J., Yurgel, S., Brown, J., Taylor, C. et al. (2020). PICRUSt2 for prediction of metagenome functions. Nature Biotechnology, 38, 1-5.

      (33) Although the author has explained the potential causes for the negative effects of different climate change factors (i.e., warming, drought, and wet) on microbial growth, there seems to be a lack of a summary assertion and an extension on how climate change affects microbial growth and related ecosystem functions. It is recommended to make a general summary of the results in the last part of Discussion.

      We have added a general summary in the last paragraph of DISCUSSION, that is “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction. This suggests the development of multifactor manipulation experiments in precise prediction of future ecosystem services and feedbacks under climate change scenarios” (Line 552-558).

      (34) L546, please add the taxonomic information for "OTU 14".

      Done (Line 533).

      (35) L800, change "The phylogenetic tree" to "A phylogenetic tree".

      Done (Line 762).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to describe the effect of different temperature and precipitation regimes on microbial growth responses in an alpine grassland ecosystem using quantitative 18O stable isotope probing. It was found that all climate manipulations had negative effects on microbial growth, and that single-factor manipulations exerted larger negative effects as compared to combined-factor manipulations. The degree of antagonism between factors was analyzed in detail, as well as the differential effect of these divergent antagonistic responses on microbial taxa that incorporated the isotope. Finally, a hypothetical functional profiling was performed based on taxonomic affiliations. This work gives additional evidence that altered warming and precipitation regimes negatively impact microbial growth.

      Strengths:

      A long term experiment with a thorough experimental design in apparently field conditions is a plus for this work, making the results potentially generalisable to the alpine grassland ecosystem. Also, the implementation of a qSIP approach to determine microbial growth ensures that only active members of the community are assessed. Finally, particular attention was given to the interaction between factors and a robust approach was implemented to quantify the weight of the combined-factor manipulations on microbial growth.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      The methodology does not mention whether the samples taken for the incubations were rhizosphere soil, bulk soil or a mix of both types of soil. If the samples were taken from rhizosphere soil, I wonder how the plants were affected by the infrared heaters and if the resulting shadow (also in the controls with dummy heaters) had an effect on the plants and the root exudates of the parcels as compared to plants outside the blocks? If the samples were bulk soil, are the results generalisable for a grassland ecosystem? In my opinion, more information is needed on the origin of the soil samples and how these were taken.

      The samples taken for the incubations can be considered as a mixture of rhizosphere and bulk soils. During soil sampling, we did not use conventional rhizosphere soil collection methods. However, there is a certain proportion of fragmented roots in the soil samples we collected, indicating that soil properties are influenced by plants. We have added this description in MATERIALS AND METHODS (Line 158).

      To minimize the impact of physical shading on the plants, each sampling point was as far away from infrared heaters as possible. We have added this information of soil collection in MATERIALS AND METHODS, that is “In each plot, three soil cores of the topsoil (0-5 cm in depth) were randomly collected and combined as a composite sample, which can be considered as a mixture of rhizosphere and bulk soils. Each sampling point was as far away from infrared heaters as possible to minimize the impact of physical shading on the plants. The fresh soil samples were shipped to the laboratory and sieved (2-mm) to remove root fragments and stones.” (Line 157-162).

      Previous studies based on our field experiment assessed the effects of warming and altered precipitation on soil microbial communities (Zhang et al., 2016), the temporal stability of plant community biomass (Ma et al., 2017), shifting plant species composition and grassland primary production (Liu et al., 2018). These studies provide guidance for the experiment design and execution.

      Reference:

      Zhang, KP., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ma, ZY., Liu, HY., Mi, ZR. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      Liu, HY., Mi, ZR., Lin, L. et al. (2018). Shifting plant species composition in response to climate change stabilizes grassland primary production. Proceedings of the National Academy of Sciences, 115, 4051-4056.

      The qSIP calculations reported in the methodology for this work are rather superficial and the reader must be experienced in this technique to understand how the incorporators were identified and their growth quantified. For instance, the GC content of taxa was calculated for reads clustered in OTUs, and it is not discussed in the text the validity of such approach working at genus level.

      We have added the description of qSIP calculations in Supplementary Materials.

      The approach of GC content calculation can be used at genus level (Koch et al., 2018). The GC content of each bacterial taxon (Gi) was calculated using the mean density for the unlabeled (WLIGHTi) treatments (Hungate et al. 2015), rather than OTU sequence information. We have revised the sentence in MATERIALS AND METHODS, that is “the number of 16S rRNA gene copies per OTU taxon (e.g., genus or OTU) in each density fraction was calculated by multiplying the relative abundance (acquisition by sequencing) by the total number of 16S rRNA gene copies (acquisition by qPCR)” (Line 255-258).

      Reference:

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.
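
      The copy-number step quoted above, together with the weighted mean density it feeds into, can be sketched as follows. The first function implements the stated product of relative abundance and total 16S rRNA gene copies per density fraction; the second computes a copy-weighted mean density across fractions in the spirit of Hungate et al. (2015). The function names and the toy numbers are assumptions for illustration and are not taken from the authors' pipeline.

```python
def copies_per_fraction(relative_abundance, total_copies):
    """Taxon copies in each fraction = relative abundance x total 16S copies."""
    if len(relative_abundance) != len(total_copies):
        raise ValueError("one value per density fraction is required")
    return [r * t for r, t in zip(relative_abundance, total_copies)]


def weighted_mean_density(densities, taxon_copies):
    """Mean buoyant density of a taxon, weighted by its copies per fraction."""
    total = sum(taxon_copies)
    if total == 0:
        raise ValueError("taxon was not detected in any fraction")
    return sum(d * c for d, c in zip(densities, taxon_copies)) / total


# Hypothetical data for one taxon across three density fractions.
densities = [1.68, 1.70, 1.72]          # g/mL
relative_abundance = [0.10, 0.20, 0.10]  # from sequencing
total_copies = [1000, 2000, 1000]        # from qPCR

taxon_copies = copies_per_fraction(relative_abundance, total_copies)
density = weighted_mean_density(densities, taxon_copies)
print(taxon_copies, round(density, 4))
```

      The shift in this weighted mean density between labeled and unlabeled treatments is what qSIP uses to infer isotope incorporation for each taxon.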

      The selection of V4-V5 region over V3-V4 region to quantify the number of copies of the 16S rRNA gene should be substantiated in the text. Classic works determined one decade ago that primer pairs that amplify V3-V4 are most suitable to assess soil bacterial communities. Hungate et al. (2015), worked with the V3-V4 region when establishing the qSIP method. Maybe the number of unassigned OTUs is related with the selection of this region.

      Both primer sets (V3-V4 and V4-V5 regions) are widely used across various sample sets and are highly similar in representing the total microbial community composition (Fadeev et al., 2021; Zhang et al., 2018).

      A previous study based on our Field Research Station of Alpine Grassland Ecosystem used V4-V5 primer pairs to investigate the effect of warming and altered precipitation on the overall bacterial community composition (Zhang et al., 2016).

      Another reason for choosing the V4-V5 primer set in this study was to integrate and compare the data with that of two previous qSIP studies (Ruan et al., 2023; Guo et al., submitted), both of which focused on the growth responses of active species to global change and used V4-V5 primer pairs.

      We have added an explanation about primer selection as “The V4-V5 primer pairs were chosen to facilitate integration and comparison with data from previous studies (Ruan et al., 2023; Zhang et al., 2016)” (Line 213-215).

      Reference:

      Fadeev, E., Cardozo-Mino, M.G., Rapp, J.Z. et al. (2021). Comparison of Two 16S rRNA Primers (V3–V4 and V4–V5) for Studies of Arctic Microbial Communities. Frontiers in Microbiology, 12

      Zhang, J.Y., Ding, X., Guan, R. et al. (2018). Evaluation of different 16S rRNA gene V regions for exploring bacterial diversity in a eutrophic freshwater lake. Science of The Total Environment, 618, 1254-1267.

      Zhang, K.P., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ruan, Y., Kuzyakov, Y., Liu, X. et al. (2023). Elevated temperature and CO2 strongly affect the growth strategies of soil bacteria. Nature Communications, 14, 1-12.

      Guo, J.J., Kuzyakov, Y., Li, L. et al. (2023). Bacterial growth acclimation to long-term nitrogen input in soil. The ISME Journal, Submitted.

      Report of preprocessing and processing of the sequences does not comply with state-of-the-art standards. More info on how the sequences were handled is needed, taking into account that a significant part of the manuscript relies on taxonomic classification of such sequences. Also, an OTU approach for an almost species-dependent analysis (GC contents) should be replaced or complemented with an ASV or subOTUs approach, using denoisers such as DADA2 or deblur. Usage of functional prediction tools underestimates gene frequencies, including those related with biogeochemical significance for soil-carbon and nitrogen cycling.

      (1) We have added information about sequence processing: “The raw sequences were quality-filtered using the USEARCH v.11.0 (Edgar, 2010). In brief, the paired-end sequences were merged and quality filtered with “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences < 370 bp and those with total expected errors > 0.5 were removed. Next, “fastx_uniques” command was implemented to remove redundant sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence.” (Line 238-245).

      (2) We have added a zero-radius OTU (ZOTU) analysis using the unoise3 command in USEARCH (https://drive5.com/usearch/manual/pipe_otus.html), as shown in Fig. S1-S2. The results showed that the overall growth responses of soil bacteria to warming and precipitation changes were similar in the OTU and ZOTU analyses, i.e., warming and altered precipitation tended to negatively affect the growth of grassland bacteria, and antagonistic T × P interactions were prevalent. This similarity holds at the overall community level, the phylum level, the genus level and the species (i.e., OTU or ZOTU) level (Fig. S1 and S2).
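
      The length and expected-error thresholds in the quoted passage can be made concrete with a short sketch. This is not the USEARCH implementation; it assumes Phred+33 quality encoding and uses the standard definition of expected errors as the sum of per-base error probabilities, 10^(-Q/10). The function names are hypothetical.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities for one read.

    Assumes ASCII-encoded Phred scores (offset 33 by default).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_quality_filter(sequence, quality_string, min_len=370, max_ee=0.5):
    """Keep a merged read only if it is long enough and accurate enough.

    Mirrors the thresholds quoted above: reads shorter than 370 bp, or
    with more than 0.5 total expected errors, are discarded.
    """
    if len(sequence) < min_len:
        return False
    return expected_errors(quality_string) <= max_ee
```

      Under these assumptions, a 400 bp read with Q40 at every base carries about 0.04 expected errors and is kept, whereas the same read at Q20 carries about 4 expected errors and is removed.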

      Author response image 1.

      The growth responses of grassland bacteria to warming and altered precipitation based on ZOTU analysis. The results of growth rates at the community level (A), the phylum level (B), and the ZOTU level (C and D) were similar to those based on OTU analysis. C the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. D the proportions of species growth influenced by different interaction types of T × P. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Author response image 2.

      The growth responses of grassland bacteria at the genus level to warming and altered precipitation based on OTU analysis (A and C) and ZOTU analysis (B and D). A and B the single and combined factor effects of climate factors on growth in genera, by comparing with those in T0nP. C and D the proportions of genera whose growth was influenced by different interaction types of T × P.

      (3) Agreed. We have added a caveat about the limitations of functional prediction tools at the end of the DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542). The caveat ensures that the reader knows the limitations of these methods and that we do not overstate our findings.

      Reference:

      Douglas, G.M., Maffei, V.J., Zaneveld, J.R. et al. (2020) PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 38, 685–688.

      Reviewer #2 (Recommendations For The Authors):

      General suggestions:

      Regarding the qSIP method, would be of help to see the differences in density vs number of 16S rRNA gene abundance for the most responsive bacterial groups in the different treatments, taking into account that with only 7 fractions the entire change in bacterial growth was resolved.

      We have selected three representative bacterial taxa (OTU1 belonging to Bradyrhizobium, OTU14 belonging to Solirubrobacter, OTU15 belonging to Pseudoxanthomonas), which have high growth rates in the climate change treatments. The results showed that the peaks in the 18O treatment are shifted towards heavier fractions (greater weighted average buoyant density) compared to the 16O treatment, indicating that these species assimilate the 18O isotope into their DNA during growth.
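
      The density shift described here can be sketched as a difference in weighted average buoyant density between labeled and unlabeled treatments, with each fraction's density weighted by the taxon's 16S rRNA gene copies in that fraction. The function names and the values in the usage note are hypothetical illustrations, not data from this study.

```python
def weighted_average_density(densities, copies):
    """Weighted average buoyant density of one taxon across fractions.

    densities: buoyant density of each fraction (g/ml)
    copies:    the taxon's 16S rRNA gene copies in the same fractions
    """
    total = sum(copies)
    if total == 0:
        raise ValueError("taxon has no gene copies in any fraction")
    return sum(d * c for d, c in zip(densities, copies)) / total


def density_shift(densities, copies_unlabeled, copies_labeled):
    """Shift in weighted average density (labeled minus unlabeled)."""
    return (weighted_average_density(densities, copies_labeled)
            - weighted_average_density(densities, copies_unlabeled))
```

      With fractions at 1.70, 1.71 and 1.72 g/ml, copies of 1, 2, 1 in the 16O treatment and 0, 1, 3 in the 18O treatment, the weighted averages are 1.71 and 1.7175 g/ml, a positive shift consistent with isotope incorporation.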

      Author response image 3.

      The distribution of 16S rRNA gene abundance of three representative bacterial taxa (OTU1- Bradyrhizobium, OTU14-Solirubrobacter, and OTU15-Pseudoxanthomonas) in different buoyant density fractions. Values represent mean and the error bars represent standard deviation.

      Seven fractionated DNA samples were selected for sequencing because together they contained more than 99% of the gene copies of each sample (please see the figure below). The DNA concentrations of the other fractions were too low to construct sequencing libraries.

      Author response image 4.

      Relative abundance of 16S rRNA gene copies in each fraction. The fractions with density between 1.703 and 1.727 g ml-1 were selected because they contained more than 99% gene copy numbers of each sample. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation.

      With such dataset additional multivariate analysis would be of help to better interpret the ecological framework.

      Thanks for the suggestion. Interpreting the ecological framework is meaningful for understanding microbial responses to environmental changes.

      The main objective of this study is to investigate the growth response of soil microbes in alpine grasslands to temperature and precipitation changes, and the interaction between these climate factors. Our results, as well as the results of the complementary analyses (based on subOTU analyses, shown below), indicate that warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and that antagonistic interactions of T × P are prevalent.

      We have emphasized our research objectives and main conclusions in the revised manuscript: “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau” (Line 112-114);

      “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction” (Line 552-556).

      Extension of interaction analysis and its conclusions should be shortened, summarizing the most relevant findings. In my opinion, it becomes a bit redundant.

      We have shortened the discussion of the interaction analysis by deleting less relevant content.

      Below are some, but not all, examples of text that has been deleted or revised in the Discussion:

      (1) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive”;

      (2) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)”;

      (3) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” (Line 499-503);

      (4) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” (Line 464-466).

      I strongly suggest a functional analysis based on shotgun sequencing or RNAseq approaches. With this approach this work would be able to answer who is growing under altered T and Precipitation regimes and what are those that are growing doing.

      Thanks for the suggestion. Metagenomic sequencing is a popular approach to evaluate the potential functions of microbial communities in the environment. However, two main reasons limit the application of metagenomic or metatranscriptomic sequencing in this study: 1) most of the fractionated samples in the SIP experiment have low DNA concentrations and do not meet the requirements of library construction for sequencing; 2) metagenomics and metatranscriptomics usually have relatively low sensitivity to rare species, reducing the diversity of detected active species.

      This study focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      Minor suggestions:

      L121. _As

      We have deleted this sentence and relocated the hypotheses to the last paragraph of the INTRODUCTION (according to the suggestion of Reviewer #3).

      Line150. Described previously in.

      Done (Line 136).

      Line500. Check whether it is better to use the word acclimatization (Coordinated response to several simultaneous stressors) in exchange of acclimation

      We have revised it according to this suggestion (Line 481).

      Fig.4C Drought

      Done (Line 761).

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Ruan et al. studied the long-term impact of warming and altered precipitations on the composition and growth of the soil microbial community. The researchers adopted an experimental approach to assess the impact of climate change on microbial diversity and functionality. This study was carried out within a controlled environment, wherein two primary factors were assessed: temperature (in two distinct levels) and humidity (across three different levels). These factors were manipulated in a full factorial design, resulting in a total of six treatments. This experimental setup was maintained for ten years. To analyze the active microbial community, the researchers employed a technique involving the incorporation of radiolabeled water into biomolecules (particularly DNA) through quantitative stable isotope probing. This allowed for the tracking of the active fraction of microbes, accomplished via isopycnic centrifugation, followed by Illumina sequencing of the denser fraction. This study was followed by a series of statistical analysis to identify the impact of these two variables on the whole community and specific taxonomic groups. The full factorial design arrangement enabled the researchers to discern both individual contributions as well as potential interactions among the variables

      Strengths:

      This work presents a timely study that assesses in a controlled fashion the potential impact of global warming and altered precipitations on microbial populations. The experimental setup, experimental approach and data analysis seem to be overall solid. I consider the paper of high interest for the whole community as it provides a baseline to the assessment of global warming on microbial diversity.

      Thanks for the encouragement and positive comments.

      Weaknesses:

      While taxonomic information is interesting, it would have been highly valuable to include transcriptomics data as well. This would allow us to understand what active pathways become enriched under warming and altered precipitations. Non-metabolic OTUs hold significance as well. The authors could have potentially described these non-incorporators and derived hypotheses from the gathered information. The work would have benefited from using more biological replicates of each treatment.

      Thanks for the valuable suggestions.

      (1) Metatranscriptomics can assess the functional profiles of the community, but it has relatively low sensitivity to rare species, which makes it difficult to link functional pathways to the numerous active taxa identified by qSIP. Additionally, due to the low DNA concentration, it is difficult to construct sequencing libraries from most fractionated samples, whereas amplicon-based sequencing remained feasible. This study therefore focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      (2) 18O-qSIP can identify the growing microbial species (i.e., 18O incorporators) in the environment rather than metabolically active taxa. These non-incorporators in our study were likely to be metabolically active, i.e., maintaining life activities without reproduction, or recently deceased (Blazewicz et al., 2013). Therefore, it is hard to distinguish whether these non-incorporators possess metabolic activity.

      (3) Agreed. The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering its high cost, we selected three replicates for analysis. We have explained this issue in MATERIALS AND METHODS, that is “Considering the cost of qSIP experiment (i.e., the use of isotopes and the sequencing of a large number of DNA samples), we randomly selected three out of the six plots, serving as three replicates for each treatment” (Line 154-157).

      Reference:

      Nuccio, E.E., Starr, E., Karaoz, U. et al. (2020) Niche differentiation is spatially and temporally regulated in the rhizosphere. ISME J 14, 999–1014.

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      The manuscript should be written in a clearer way. The language should be more direct, so the message is conveyed faster and clearer. Some sentences, for instance, could be shortened or re-organized. Below, you will find some examples.

      We have rewritten the sentences to make the manuscript clearer. Below are some, but not all, examples that have been revised:

      (1) Deleted “(reduced precipitation, hereafter ‘drought’, or enhanced precipitation, hereafter ‘wet’)” in INTRODUCTION;

      (2) Deleted “Controlled experiments simulating climate change have investigated changes in microbial community composition as measured by shifts in the relative abundances (Evans & Wallenstein, 2014; Barnard et al., 2015). However, changes in relative abundances may be poor indicators of population responses to environmental change in some cases (Blazewicz et al., 2020). Another challenge is the presence of a large number of inactive microbial cells in the soil, which hinders the direct, quantitative measure of the ecological drivers in population dynamics (Fierer, 2017; Lennon & Jones, 2011).” in DISCUSSION;

      (3) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive” in DISCUSSION;

      (4) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)” in DISCUSSION;

      (5) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” in DISCUSSION (Line 499-503);

      (6) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” in DISCUSSION (Line 464-466).

      I'm curious about why, even though there were six replicates of the experiment, only three samples were collected for analysis. Metagenomic analyses tend to display high variability.

      The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering the high cost, we selected three replicates for analysis.

      In Fig. 3A, the absolute growth rates (16S copies/d*g) are shown. How do you know that the efficiency of DNA extraction was similar across all treatments and therefore the absolute numbers are comparable?

      To avoid differences in extraction efficiency caused by experimental procedures, all DNA samples were extracted by the same person (the first author) within 2-3 hours, using a uniform procedure for cell lysis and DNA extraction: cells were mechanically disrupted by bead beating with multi-size beads at 6 m s-1 for 40 s, and DNA was then extracted with the FastDNA™ SPIN Kit for Soil (MP Biomedicals, Cleveland, OH, USA).

      We have measured the concentration of extracted DNA and found no significant difference between treatments (Author response table 1).

      Author response table 1.

      Soil DNA concentration in climate change treatments after qSIP incubation (measured by Qubit® DNA HS Assay Kits).

      Values represent mean and standard deviation. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. The results of ANOVA indicated no significant difference of extracted DNA concentration between treatments (p > 0.05).

      We have introduced the caveat in the DISCUSSION, that is “Note that the experimental parameters such as DNA extraction and PCR amplification efficiencies also have significant effects on the accuracy of growth assessment. This alerts the need to standardize experimental practices to ensure more realistic and reliable results” (Line 544-547).

      Line 96-99 and 121-124: "Hypotheses are typically placed at the end of the final paragraph in the Introduction section. It is advisable to relocate them there and provide a clearer description of the paper's main goal."

      We have relocated the hypotheses to the end of the INTRODUCTION and stated the main goal of this study there, that is “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau, by using the 18O-quantitative stable isotope probing (18O-qSIP)” (Line 112-115).

      Line 399: Although you describe the classification among antagonistic interactions in the Methods section, I think you should describe this in further detail here. Can you clarify how you carried out this categorization and how these results were interpreted considering the phylogenetic classification.

      We have added the description of antagonistic interactions, that is “The interaction type of T × P on the growth of ~70% incorporators was antagonistic (i.e., the combined effect size is smaller than the additive expectation) (Fig. 4C)” (Line 388-390).

      The interaction types between factors can be classified into three categories: additive, synergistic and antagonistic. In additive interactions, the combined effect size of the factors is equal to the sum of their single effects (i.e., the additive expectation, Fig. 1B). In synergistic interactions, the combined effect size is larger than the additive expectation. In contrast, in antagonistic interactions the combined effect size is smaller than the additive expectation. In this study, the antagonistic interactions were further divided into three sub-categories: weak antagonistic interaction, strong antagonistic interaction, and neutralizing effect (Fig. 1B). In a weak antagonistic interaction, the combined effect size is smaller than the additive expectation but larger than any of the single-factor effects. In a strong antagonistic interaction, the combined effect size is smaller than any of the single-factor effects but larger than 0. In a neutralizing effect, the combined effect size is equal to 0, implying that the effects of the different factors cancel each other out.

      Methodologically, the single and combined effects of two climate factors and their interaction effects were calculated by the natural logarithm of response ratio (lnRR) and Hedges’ d, respectively (Yue et al., 2017).
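
      The categories described above can be summarized in a short sketch. It is an illustration only, not the authors' procedure: it assumes that the two single-factor effects and the combined effect are expressed as magnitudes in the same direction, and it uses a simple numerical tolerance where the study relies on Hedges' d. The function names, the tolerance, and the fallback label for cases the text does not define are all hypothetical.

```python
import math


def ln_response_ratio(treatment_mean, control_mean):
    """Natural logarithm of the response ratio (lnRR)."""
    return math.log(treatment_mean / control_mean)


def classify_interaction(effect_a, effect_b, effect_combined, tol=1e-6):
    """Classify a two-factor interaction from effect-size magnitudes.

    Assumes all three effects point in the same direction and are given
    as non-negative magnitudes; `tol` stands in for a statistical test.
    """
    additive_expectation = effect_a + effect_b
    if abs(effect_combined - additive_expectation) <= tol:
        return "additive"
    if effect_combined > additive_expectation:
        return "synergistic"
    # Everything below the additive expectation is antagonistic.
    if abs(effect_combined) <= tol:
        return "neutralizing"
    if effect_combined > max(effect_a, effect_b):
        return "weak antagonistic"
    if 0 < effect_combined < min(effect_a, effect_b):
        return "strong antagonistic"
    # Cases not covered by the definitions in the text
    # (e.g. between the two single effects, or a reversed sign).
    return "antagonistic (unspecified)"
```

      For single effects of 1 and 2, a combined effect of 2.5 would be labeled weak antagonistic and a combined effect of 0.5 strong antagonistic under these assumptions.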

      We have added an interpretation of the phylogenetic distribution patterns of incorporators, that is “The degree of phylogenetic relatedness can indicate the processes that influenced community assembly, like the extent a community is shaped by environmental filtering (clustered by phylogeny) or competitive interactions (life strategy is phylogenetically random distribution) (Evans & Wallenstein, 2014; Webb et al., 2002). The results showed that the incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic relatedness, indicating the occurrence of taxa more likely shaped by environment filtering (i.e., selection pressure caused by changes in temperature and moisture conditions). In contrast, the growing taxa affected by synergistic interactions of T × P showed random phylogenetic distributions (Table S1), which may be explained by competition between taxa with similar eco-physiological traits or changes in genotypes (possibly through horizontal gene transfer) (Evans & Wallenstein, 2014). We also found that the extent of phylogenetic relatedness to which taxa groups of T × P interaction types varied by climate scenarios, suggesting that different climate history processes influenced the ways bacteria survive temperature and moisture stress” (Line 515-529).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Yue, K., Fornara, D.A., Yang, W., Peng, Y., Peng, C., Liu, Z. et al. (2017). Influence of multiple global change drivers on terrestrial carbon storage: additive effects are common. Ecology Letters, 20, 663-672.

      Line 407-8: What do you mean with "...clustered at the phylogenetic branches" and Line 410: "cluster near the tips of the phylogenetic tree". Can you please clarify?

      Sorry for the unclear statement. We have added the explanation of NTI, that is “Nearest taxon index (NTI) was used to determine whether the species in a particular growth response are more phylogenetically related to one another than to other species (i.e., close or clustering on phylogenetic tree). NTI is an indicator of the extent of terminal clustering, or clustering near the tips of the tree (Evans & Wallenstein, 2014; Webb et al., 2002)” (Line 397-401).
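
      As an illustration of how such an index is commonly computed, the sketch below standardizes the mean nearest taxon distance (MNTD) of a group against a null distribution obtained by drawing random groups of the same size from the taxon pool, and reverses the sign so that positive values indicate clustering near the tips. The null model, the number of randomizations, and all names are assumptions made here; the response does not state the settings used in the study.

```python
import random
import statistics


def mntd(dist, taxa):
    """Mean nearest taxon distance within `taxa` (needs at least 2 taxa).

    dist[a][b] is the pairwise phylogenetic distance between a and b.
    """
    return statistics.mean(
        min(dist[a][b] for b in taxa if b != a) for a in taxa
    )


def nti(dist, group, pool, n_null=999, seed=0):
    """Nearest taxon index: -(observed MNTD - null mean) / null SD.

    The null distribution comes from random draws of len(group) taxa
    from `pool` (a list), an assumed null model for this sketch.
    """
    rng = random.Random(seed)
    observed = mntd(dist, group)
    null = [mntd(dist, rng.sample(pool, len(group))) for _ in range(n_null)]
    null_sd = statistics.stdev(null)
    if null_sd == 0:
        raise ValueError("null distribution has zero variance")
    return -(observed - statistics.mean(null)) / null_sd
```

      A group whose members all sit in one tight clade has a smaller MNTD than most random draws from the pool, so its NTI is positive, which is the pattern described above as terminal clustering.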

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Could you provide some info about the biochemistry of the incorporation of heavy water into DNA molecules? What specific enzymes are typically involved?

      Due to the low DNA concentration in most fractionated samples (less than 10 ng/μL, measured by Qubit DNA HS Assay Kits), only amplicon-based sequencing analyses were feasible. This study therefore focused only on active microbial taxa and their growth in response to multifactorial climate change.

      What might be the impact of soil desiccation on bacterial survival and subsequent water uptake?

      Slow dehydration and air drying of soil is a very common phenomenon in nature (Koch et al., 2018). During this process, microorganisms reduce their metabolism and shift towards a potentially active state (Blagodatskaya and Kuzyakov, 2013). A previous study suggested that a potentially active microbial population permanently exists in soil, in a state between the active and dormant physiological states. Even under long-term starvation, the potentially active microorganisms maintain ‘physiological alertness’ so as to be ready for occasional substrate input (Blagodatskaya and Kuzyakov, 2013). These microorganisms, which are important participants in biogeochemical cycles, are the focus of this study.

      Replacing the environmental water in the soil with 18O-labelled water is a typical practice for qSIP studies (Hungate et al. 2015; Koch et al., 2018). This process may disturb the microbial community. In this study, the soil samples were placed in a thermostatic incubator (14℃ and 16℃), rather than air-dried at 25℃ (as in most studies). The incubation temperature is relatively low (compared to 25℃) and there is no violent air convection in the incubator, resulting in slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h. The soil drying in this study thus simulated the natural phenomenon of slow water loss from soil.

      We have added the description in MATERIALS AND METHODS, that is “There is no violent air convection in the incubator and the incubation temperature is relatively low (compared to 25℃ used in previous studies), resulting slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h” (Line 171-174).

      Reference:

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The analysis of the 18O incorporators is interesting as it defines which microbes are metabolically active and hence growing under the different conditions tested. Would it not be worthwhile to analyze the non-incorporators? Is it possible to identify a pattern to generate a hypothesis of why they are metabolically inactive based on this information? In the Methods section, the authors state that they identified a total of 6,938 OTUs, of which only 1,373 were found to be incorporators.

      Microbes exist in a range of metabolic states: growing, active (non-growth), dormant and recently deceased (Blazewicz et al., 2013), and there are still no clear thresholds for distinguishing them. 18O-DNA qSIP can identify the growing microbial species (i.e., 18O incorporators) rather than all metabolically active taxa, because some cells are measurably metabolizing (catabolic and/or anabolic processes) without reproducing. Therefore, the non-incorporators in our study may be metabolically active, or not (recently deceased microorganisms). This study focuses on the growing microorganisms identified by 18O-qSIP.

      In this study, ~20% microbial taxa (1,373/6,938) were identified as 18O incorporators. Microorganisms in soils suffer from resource and energy constraints frequently (Blagodatskaya and Kuzyakov, 2013). The energy requirements of species in the growing state are much higher (~30 fold) than those in the non-growing state, so the percentage of growing bacterial taxa in soil tends to be low.

      Reference:

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Blagodatskaya, E. & Kuzyakov, Y. (2013). Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Minor comments:

      Fig. 3A and 3B. Please show the results of the multiple comparisons.

      Done.

      Author response image 5.

      Bacterial growth responses to climate change and the interaction types between warming and altered precipitation. The growth rates (A), and responses (LnRR) of soil bacteria to warming and altered precipitation (B) at the whole community level. The growth rates (C), and responses of the dominant bacterial phyla (D) showed trends similar to those of the whole community. Values represent means and the error bars represent standard deviations. Different letters indicate significant differences between climate treatments.

      Fig. 4. This figure should be self-explanatory. This diagram is challenging to understand.

      We have revised Fig. 4 to improve clarity.

      Author response image 6.

      The growth responses and phylogenetic relationship of incorporators subjected to different interaction types under two climate scenarios. A phylogenetic tree of all incorporators observed in the grassland soils (A). The inner heatmap represents the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. The outer heatmap represents the interaction types between warming and altered precipitation under two climate change scenarios. The proportions of positive or negative responses in species growth to single and combined manipulation of climate factors by summarizing the data from the inner heatmap (B). The proportions of species growth influenced by different interaction types of T × P by summarizing the data from the outer heatmap (C).

      Fig. 4. It says "Dorought" instead of "drought"

      Done (Line 760).

      Line 109: "relieves" instead of "relieved"

      Done (Line 102).

      Line 129: Should be: "We classified the interaction types as additive, synergistic, antagonistic, null and neutralizing."

      Done (Line 117).

      Line 233: How were the 16S rRNA sequences from each density fraction analyzed?

      (1) Raw sequencing data processing:

      The raw 16S rRNA gene sequences of each density fraction were quality-filtered using USEARCH v.11.0 (Edgar, 2010). The paired-end sequences were merged and quality-filtered with the “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences shorter than 370 bp or with total expected errors > 0.5 were removed. Next, the “fastx_uniques” command was used to identify the unique sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with the “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence. The taxonomic affiliation of the representative sequence was determined using the RDP classifier (Wang et al., 2007).
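
      The length and expected-error criteria above can be illustrated with a minimal sketch. It is not USEARCH's code; it assumes Phred+33 quality encoding and that the 0.5 threshold applies to the total expected errors of each merged read. The function names and the example reads are hypothetical.

```python
def expected_errors(quality: str, phred_offset: int = 33) -> float:
    """Total expected errors of a read: the sum of per-base error probabilities.

    Each Phred score Q implies an error probability of 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality)


def passes_filter(sequence: str, quality: str,
                  min_len: int = 370, max_ee: float = 0.5) -> bool:
    """Keep a merged read only if it is long enough and accurate enough."""
    return len(sequence) >= min_len and expected_errors(quality) <= max_ee


# Hypothetical reads (not from the study).
good = ("A" * 400, "I" * 400)   # Q40 at every base: 400 * 1e-4 = 0.04 expected errors
noisy = ("A" * 400, "5" * 400)  # Q20 at every base: 400 * 1e-2 = 4.0 expected errors
short = ("A" * 300, "I" * 300)  # accurate but shorter than 370 bp

for name, (seq, qual) in {"good": good, "noisy": noisy, "short": short}.items():
    print(name, passes_filter(seq, qual))
```

      Only the first read passes: the second exceeds the expected-error threshold and the third fails the length cut-off.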

      (2) qSIP calculation:

      Sequencing data reflect the relative abundance of taxa in the community. We multiplied each OTU’s relative abundance (obtained by sequencing) by the number of 16S rRNA gene copies (obtained by qPCR) to obtain the number of gene copies per OTU in each fraction. Then, the proportion of gene copies of a specific OTU in each fraction relative to the total number of gene copies in one sample was calculated and used as a weight for calculating the weighted average buoyant density (the critical parameter for assessing microbial growth).
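
      The weighting step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the weights are assumed to be normalized by the OTU's total gene copies across all fractions of one sample (so they sum to 1), and the function name and all numbers are hypothetical.

```python
def weighted_average_density(densities, rel_abundances, total_copies):
    """Weighted average buoyant density of one OTU in one sample.

    densities      -- buoyant density of each fraction (g/mL)
    rel_abundances -- the OTU's relative abundance in each fraction (sequencing)
    total_copies   -- total 16S rRNA gene copies in each fraction (qPCR)
    """
    # Gene copies of this OTU in each fraction.
    otu_copies = [r * c for r, c in zip(rel_abundances, total_copies)]
    total = sum(otu_copies)
    if total == 0:
        raise ValueError("OTU not detected in any fraction")
    # Assumption: weights are the OTU's share of its own copies per fraction.
    weights = [c / total for c in otu_copies]
    return sum(w * d for w, d in zip(weights, densities))


# Hypothetical three-fraction example.
densities = [1.68, 1.70, 1.72]
rel_abundances = [0.1, 0.2, 0.1]
total_copies = [1000, 2000, 1000]

wad = weighted_average_density(densities, rel_abundances, total_copies)
print(round(wad, 4))
```

      In this example the OTU has 100, 400 and 100 copies in the three fractions, so the middle fraction dominates the weighted average.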

      Line 366: "Three single-factor ... between warming and altered precipitation" -> "The individual impact of warming, drought, and wet conditions resulted in the most substantial negative effects on bacterial growth compared with the effects of warming x drought and warming x wet. A result that illustrates the negative interactions between warming and modified precipitations patterns."

      Done (Line 365-368).

      Line 376: "Similar with the result of whole growth of bacteria community, the growth responses of the major bacterial phyla were also negatively influenced by single climate factors". This sentence is hard to read. Maybe something like this: "Growth of the major bacterial phyla was also negatively influenced by the individual climate factors".

      Done (Line 371-372).

      Line 383: "In particular, the effects of wet and warming neutralized each other, resulting the net effects became zero on the growth rates of the phyla Actinobacteria and Bacteroidetes" -> "In Actinobacteria and Bacteroidetes, the effect of wet and warming neutralized each other, as the combined effect of these two factors had no effect on growth".

      Done (Line 377-379).

      Line 390: "The individual warming treatment (T+nP) reduced the growth rates of 75% incorporators..." -> "Warming (T+nP) reduced the growth of 75% of the taxonomic groups, which was followed by drought and wet."

      Done (Line 384-385).

      Line 392: "The combined manipulations of warming and altered precipitation lowered the percentages of incorporators with negative responses compared with single factor manipulation, especially warming and enhanced precipitation manipulation" -> "Warming x drought and warming x wet had a smaller impact on the growth of incorporators, compared with single effects."

      Done (Line 385-387).

      Line 468. This sentence "To the best ..." is not necessary.

      We have deleted this sentence.

      Line 476. Is it really "synthesis" the word you want to use?

      We have deleted this sentence.

      Line 477. Maybe it should be written like this: "Consistent with our findings, a recent experimental study demonstrated that 15 years of warming reduced the growth rate of soil bacteria in a montane meadow in northern Arizona."

      Done (Line 459-461).

      Line 490 and 502. Consider using "however" only once in a paragraph.

      We have deleted the second “however” (Line 483).

      Line 555-559. Based on genomic data you cannot predict the functional role of microbes in the environment. These sentences are speculative. Please, consider using less strong affirmations and focus more on the pathways that are enriched in the incorporators.

      Agreed. We have deleted this content.

    1. Bots, on the other hand, will do actions through social media accounts and can appear to be like any other user. The bot might be the only thing posting to the account, or human users might sometimes use a bot to post for them.

      On social media, many accounts do appear to be run by real people, but in reality, they may be run by bots. As far as I know, these bots can automatically post relevant topics and content, and may also be used by some users to help them post or edit content. Most of these bots are designed to help people manage their accounts or simplify their tasks, but some are used to spread negative information and cultivate negative opinions. Therefore, I think we need to improve our ability to distinguish real users from bots so that we can better ensure the fairness and authenticity of social media.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      The authors sincerely appreciate the editors’ and the reviewers’ dedication in providing constructive and insightful comments aimed at enhancing the quality of the manuscript. In response to the valuable feedback received, we have implemented significant revisions to the manuscript, including the addition of key experiments, reorganization of the figures as well as providing detailed point-to-point responses to address the reviewers’ concerns. With these changes, we are confident that we have effectively addressed the comments raised by all three reviewers and have strengthened the overall quality of the manuscript.

      Below are the major improvements we have made in the revised manuscript:

      1. Figure 4: new figure with polysome profiling assay to strengthen the link between translational regulation and mitochondrial defects.
      2. Figure 7: added confocal images showing the transfer of mitochondria into recipient cells.
      3. Figure S2: added RER data further supporting a shift of metabolism to favor fatty acid oxidation as shown by proteomics data.
      4. Figure S4: added WB data showing that protein degradation was not affected, strengthening a protein synthesis defect due to Fam210a KO.
      5. Figure S5B, S6C: added quantification to the staining and blots.

      1. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the manuscript entitled "FAM210A mediates an inter-organelle crosstalk essential for protein synthesis and muscle growth in mouse", Chen et al. found that knocking out FAM210A specifically in muscle using Myl1Cre resulted in abnormal mitochondria, hyperacetylation of cytosolic proteins, and translation defects. The manuscript uncovered new functions of FAM210A in regulating metabolism and translation. I have the following concerns about the manuscript.

      Comments

      One of the major phenotypes of FAM210A knockout is the decrease of muscle mass 6 weeks after birth. Is this phenotype caused by the accumulation of progressive loss of muscle mass from birth? Are the body weight and muscle mass reduced in FAM210A knockout newborn mice? Is the muscle mass growth curve the same in FAM210A and WT mice from birth to 6 weeks after birth? These results will reveal more of the mechanism of FAM210A-mediated muscle mass control.

      Answer: Indeed, the phenotype of the Fam210aMKO was caused by the progressive loss of muscle mass. The body weight of the mice was not different before 3 weeks of age (Figure 2B). We reasoned that myonuclei accretion occurred before Myl1Cre-induced knockout of Fam210a, accounting for the relatively normal muscle development and nuclei accretion prior to 21 days after birth (refer to Response Figure 2). However, because of the small muscle mass, it is hard to accurately measure muscle mass in very young mice. Regardless, we believe that body weight and muscle weight closely track each other and exhibit similar slopes in WT and KO mice (Response Figure 1).

      Beyond 21 days, muscle growth is mainly attributed to hypertrophy of myofibers, a process that relies on protein synthesis. Yet the Fam210aMKO myofibers have defects in protein synthesis, explaining why the muscles cannot gain weight after 3 weeks and start to lose weight. We have shown that at 4 weeks the TA muscle weight was 13 mg in Fam210aMKO compared to 25 mg in the WT control. At 6 weeks, the TA weight in the Fam210aMKO mouse was 10 mg compared to 28 mg in the WT control. Furthermore, the TA weight of the Fam210aMKO mouse was 8.7 mg compared to 36 mg in the WT control. These results provide compelling evidence that the Fam210aMKO muscles are progressively wasted.

      Response Figure 1. Changes of body weights and TA muscle weights during postnatal growth. The muscle weights increased (in wildtype mice) or decreased (in KO mice) with body weights, following similar trends.

      Does the muscle mass continue to decrease after 8 weeks?

      Answer: Based on the trend (see Response Figure 1), we believe the answer is “yes”. However, we were not allowed to monitor the Fam210aMKO mice after 8 weeks of age, as they were severely lethargic and could barely move, reaching the humane endpoint determined by the IACUC guidelines.

      FAM210A knockout mice displayed high lethal rate. Is there any potential mechanism for the high lethality?

      Answer: We performed extensive necropsy and could not identify a direct cause. A potential cause of the lethality is difficulty breathing, as the diaphragm muscle was very thin in the Fam210aMKO mouse compared to the WT control. In addition, the diminished muscle contraction force (Figure 3) might have prevented normal activities (including eating), leading to death from exhaustion.

      In Figure 2, the muscle mass decreased significantly, while the fat mass only decreased slightly in FAM210A knockout mice. However, the ratio of the lean mass and fat mass to body mass did not change in FAM210A knockout mice compared to WT mice. How do the authors reconcile this?

      Answer: Just to clarify, Figure 2D-E shows that fat mass was significantly reduced at 4 weeks of age but not at 6 weeks. We interpret the significant reduction of the mass but not the ratio (to body weight) as the result of the concomitant reduction of body weight in the Fam210aMKO mice.

      Are there changes of the number of nuclei per myotube? Is the muscle atrophy in FAM210A knockout mice caused by the defects of fusion, or the degradation of protein, or both?

      Answer: We thank the reviewer for this question. To answer it, we isolated myofibers from 4-week-old WT and Fam210aMKO mice and quantified the number of myonuclei. We did not observe a significant reduction in the number of myonuclei per myofiber in the Fam210aMKO mouse, suggesting that myoblast fusion into myofibers was not affected in the Fam210aMKO model (Response Figure 2).

      Response Figure 2. DAPI staining and quantification in the single myofiber isolated from WT and Fam210aMKO mice.

      The number of myonuclei in the WT and Fam210aMKO was not different, suggesting normal fusion of satellite cells in Fam210aMKO mice.

      We also performed western blots to examine the expression of atrophy-related proteins in WT and Fam210aMKO mice at different ages. Interestingly, we did not observe a significant induction of these proteins (Atrogin-1, MuRF1) in the Fam210aMKO muscle. Therefore, we conclude that the muscle atrophy was due to protein translation defects in the Fam210aMKO, independent of myoblast fusion and protein degradation (Figure S4C).

      Are the growth curves of muscle mass growth in EDL and SOL the same in FAM210A knockout mice?

      Answer: We thank the reviewer for the question. In the Myl1Cre-mediated Fam210a KO model, Fam210a was deleted in both fast (EDL) and slow (SOL) muscles (see response to Reviewer 3, second point). We think that the “growth curves” of the EDL and SOL muscles should be the same (stagnant and even reduced) upon Fam210a KO as the mouse grows from 4 to 8 weeks of age.

      The oxygen consumption and carbon dioxide production are higher in FAM210A knockout mice, suggesting a high metabolism rate. In contrast, the heat production of FAM210A knockout mice is lower, suggesting a low metabolism rate. Any explanation?

      Answer: The VCO2 and VO2 values were normalized to body weight, and the KO values appeared high because the KO body weights were much lower at the time of the test. Heat production (unit: kcal/hr), in contrast, was calculated without body weight as a factor. The seemingly contradictory result that a weak KO mouse can have higher VCO2 and VO2 has also been observed in other mouse models (for example, PMID: 22307625).
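
      The effect of body-weight normalization described above can be shown with a small arithmetic sketch. All numbers below are hypothetical and chosen only for illustration; they are not measurements from the study.

```python
def per_gram(absolute_value: float, body_weight_g: float) -> float:
    """Normalize a whole-animal measurement to body weight."""
    return absolute_value / body_weight_g


# Hypothetical animals: the KO mouse consumes less oxygen in absolute terms
# but weighs half as much as the WT mouse.
wt_vo2_ml_per_h, wt_weight_g = 60.0, 20.0
ko_vo2_ml_per_h, ko_weight_g = 40.0, 10.0

wt_norm = per_gram(wt_vo2_ml_per_h, wt_weight_g)  # 3.0 mL/h per g
ko_norm = per_gram(ko_vo2_ml_per_h, ko_weight_g)  # 4.0 mL/h per g

print(f"absolute VO2: WT={wt_vo2_ml_per_h}, KO={ko_vo2_ml_per_h}")
print(f"per gram:     WT={wt_norm}, KO={ko_norm}")
```

      A measure that is not divided by body weight (such as heat production in kcal/hr) can therefore be lower in the KO while the weight-normalized gas exchange values are higher.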

      Given the high glucose consumption in FAM210A, why is the clearance rate of blood glucose low?

      Answer: We believe there is a misunderstanding here. A smaller AUC (as seen in the KO) suggests faster blood glucose clearance. The circulating glucose level after fasting is lower in the KO mice, which suggests that the Fam210aMKO mice were consuming more glucose than the WT mice. In the GTT, the Fam210aMKO mice showed a lower AUC after the injection of glucose, implying that the Fam210aMKO mice cleared the injected glucose at a faster rate, probably due to a pseudo-fasting state which would promote the uptake of circulating glucose when available.
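
      As a sketch of how a glucose tolerance curve is summarized by its AUC, the example below applies the trapezoidal rule to two hypothetical curves. The time points and glucose values are invented for illustration, and the trapezoidal rule is an assumption about the method, which the response does not specify.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    points = list(zip(times, values))
    return sum((t1 - t0) * (v0 + v1) / 2
               for (t0, v0), (t1, v1) in zip(points, points[1:]))


# Hypothetical GTT curves: minutes after injection, blood glucose in mg/dL.
times_min = [0, 15, 30, 60, 120]
wt_glucose = [100, 300, 250, 180, 110]
ko_glucose = [80, 220, 170, 110, 80]

auc_wt = trapezoid_auc(times_min, wt_glucose)
auc_ko = trapezoid_auc(times_min, ko_glucose)
print(auc_wt, auc_ko)
```

      The curve that starts lower and returns to baseline sooner yields the smaller area, which is why a lower AUC is read as faster clearance.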

      Are there any changes of the abilities for the FAM210A knockout mice in running endurance?

      Answer: Indeed, the Fam210aMKO mice ran a shorter distance, for a shorter time, and at a lower speed when tested on a treadmill endurance running program (Figure 3).

      In page 5, the last sentence of the 2nd paragraph, the authors concluded "There results suggest that Fam210aMKO induces a metabolic switch to a more oxidative state." It is better to describe it as muscle metabolic since the whole-body metabolism has not been carefully examined.

      Answer: We thank the reviewer for pointing this out. We will change the wording to better reflect the metabolic changes observed in the Fam210aMKO mouse.

      In Fig. 6, what is the link between increased transcription level of Fgf21 and the elevated level of aberrant acetylation of proteins?

      Answer: We thank the reviewer for this interesting question! However, we did not pursue a direct causal relationship between Fgf21 level and aberrant protein acetylation. In our model, we propose that mitochondrial defects in the Fam210aMKO model can trigger the integrated stress response, which leads to a higher Fgf21 transcript level in the muscle. This coincides with the increase in acetylation in the muscle due to the excessive production of acetyl-CoA. A potential relationship between Fgf21 and protein acetylation warrants examination in a future study.

      After careful consideration of the mechanism proposed in the study, we decided to remove the qPCR data showing the modest increase of Fgf21 mRNA level. The removal of this data will not change the conclusions we draw nor lessen the significance of the mitochondria transfer experiment.

      Is there any link between the increased acetylation level of ribosomal proteins and the translation defects?

      Answer: Indeed, there are ample studies showing that ribosomal proteins can be acetylated, and that the acetylation of ribosomal proteins can affect protein synthesis, for example PMID: 35604121 and PMID: 37742082. In this paper, we showed by polysome profiling that the muscle has defects in polysome formation (at 4 and 6 weeks), when protein acetylation was significantly increased in the Fam210aMKO mice (Figure 4D-4G).

      How do the abnormal mitochondria lead to increased protein acetylation? And how do these defects further cause translation problem?

      Answer: As elaborated in the discussion, we propose that upon Fam210a KO in mature myofibers, the TCA cycle in the mitochondria is disrupted, blocking utilization of acetyl-CoA and resulting in the accumulation of acetyl-CoA in the muscle. The excess acetyl-CoA leads to increased protein acetylation in the cytosol. We identified that ribosomal proteins are hyperacetylated in the muscle. We also observed that polysome formation in the muscle was impaired, which further reduces translation efficiency.

      Consistently, when we treated C2C12 cells in culture with sodium acetate to mimic the increase in protein acetylation, we found that excessive levels of acetyl-CoA can block the differentiation of C2C12 cells (Response Figure 3).

      Response Figure 3. The effect of sodium acetate on the differentiation of C2C12 myoblasts.

      The differentiation of C2C12 myoblasts into myotubes was probed by the protein abundance of Myog and MF20, which decreased as sodium acetate was added in increasing amounts.

      The defects in translation will cause general problems besides mitochondria defects. Are there any phenotypes related to the overall translation inhibition observed? If not, why?

      Answer: Just to clarify, our model suggests that mitochondrial defects in the Fam210a KO cause cytosolic translation defects, not the other way around. We showed by the SUnSET experiment that global translation was indeed reduced in the Fam210aMKO muscle at 4 weeks. We also observed that the level of p-S6, an indicator of global protein translation, was decreased. It is also true that the global translational arrest can exacerbate the mitochondrial defects, because fewer mitochondrial proteins can be synthesized. This feed-forward loop can explain the aggravating phenotype in the Fam210aMKO mouse as the mouse gets older.

      Are the abnormal mitochondria, increased protein acetylation, and translation inhibition observed in 2-6 weeks old mice? When were these defects first found? Are they correlated with muscle atrophy?

      Answer: At 2 weeks of age, neither protein synthesis nor degradation differed between WT and Fam210aMKO mice (Figure S4C). The mitochondrial abnormality was first observed at 4 weeks of age, concomitant with decreased protein translation (decreased p-S6), decreased polysome formation, and protein hyperacetylation. The increase in acetylation was apparent at 6 weeks, together with the decreased p-S6 level, impaired polysome assembly and mitochondrial defects. Decreased protein translation has been shown to cause muscle atrophy (PMID: 19046572).

      Reviewer #1 (Significance (Required)):

      This manuscript described many interesting phenotypes of Fam210a knockout mice. However, the links between these phenotypes are obscure. The logic of the manuscript will be greatly improved if the authors could provide explanations to logically link the phenotypes.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, Chen et al., investigate the functions of FAM210A in skeletal muscle physiology and metabolism. FAM210A is a mitochondria-localized protein in which mutations have been associated with sarcopenia and osteoporosis. Using publicly available gene expression datasets from human skeletal muscle biopsies the authors first demonstrate that the expression of FAM210 is reduced in muscle atrophy-associated diseases and increased in muscle hypertrophy conditions. Based on this, they show that a muscle specific Fam210a deletion leads to muscle atrophy/weakness, systemic metabolic defects, and premature lethality in mouse. Further examination of the knockout myofibers reveals impaired mitochondrial respiration and translation program. Additionally, the authors demonstrate that the flow of TCA cycle is disrupted in the FAM210A-deleted myofibers, which causes abnormal accumulation of acetyl-coA and hyperacetylation of a subset of proteins. The authors claim that Fam210a deletion in skeletal muscle induces the hyper-acetylation of several small ribosomal proteins that leads to ribosomal disassembly and translational deficiency. However, this conclusion is not supported by adequate experimentation and rigorous analysis of ribosomal proteins acetylation and ribosome assembly.

      Major comments:

      -In general, figure legends are lacking information regarding number of biological replicates used and details about statistical analysis. What does three * vs. one * mean in terms of p-value? Exact p-values should be indicated.

      Answer: We thank the reviewer for pointing this out. We have added the information to the revised figure legends.

      -The mechanistic studies linking muscle phenotypes with ribosomal protein hyperacetylation and mRNA translation defects are underdeveloped and not rigorously carried out.

      Answer: We agree with the reviewer and have added new data in the revised manuscript to strengthen this link. For example, we have now provided direct evidence on the defective polysome assembly in the Fam210a KO muscles (Figure 4D-4G), which should profoundly impact mRNA translation. In addition, other groups have also shown that ribosomal protein acetylation can impact mRNA translation and polysome formation (PMID: 35604121).

      We also explored the effect of acetylation on differentiation (a process accompanied by extensive protein synthesis) related to our mouse model. We used sodium acetate to elevate acetylation during C2C12 differentiation. We found that increased acetylation indeed impaired the differentiation as can be seen by the reduced expression of MF20 (myosin protein) by WB and IF. The differentiation marker Myogenin was also reduced (Response Figure 3, 4).

      Response Figure 4. Immunofluorescence staining of Myog and MF20 in the differentiated C2C12 myotubes treated with different amounts of sodium acetate.

      The number of MF20 (green) positive myotubes and Myog (red) positive nuclei was significantly reduced in the cells treated with 15mM and 30mM sodium acetate.

      -Fig S1: The validation WB of FAM210A KO is not the most convincing. Why are the FAM210A levels so low in TA compared to other tissues?

      Answer: This is due to insufficient protein loading, as is evident from the Tubulin marker. We have replaced the WB with more convincing blots as requested (Figure S1C).

      -Fig 2G: The authors state "Hematoxylin and eosin (H&E) staining did not reveal any obvious myofiber pathology in the Fam210a KO mice up to 8 weeks". However there seems to be a progressive increase in nuclei up to 8-weeks in the KO. What is the significance of this?

      Answer: Thank you for pointing this out. We have now changed the wording and quantified the number of myonuclei per myofiber. The increase of myonuclei in the H&E images is likely due to the smaller myofiber size in the Fam210aMKO mouse compared to the WT (Response Figure 5).

      Response Figure 5. Quantification of the myonuclei number in the H&E images.

      -IP-MS analysis for FAM210A interacting proteins requires validation with IP and reverse IP + WB experiment.

      Answer: We did perform the co-IP with SUCLG2 and FAM210A antibodies to try to confirm the interaction. To be more specific, we transduced C2C12 myoblasts with a Fam210a overexpression virus and differentiated the cells for 3 days. The myotubes were used to test the interaction by pulling down Fam210a with a myc antibody (FAM210A has a myc tag) and blotting with a SUCLG2 antibody. Unfortunately, the results were not promising (Response Figure 6). We reasoned that the interaction might be indirect or too transient to be reliably detected.

      Response Figure 6. co-IP of SUCLG2 and FAM210A.

      • Figure 4A requires quantification of the SDH signals from multiple samples.

      Answer: We thank the reviewer for this suggestion. We have added the quantification of the staining (Figure S5B).

      • Figure 6F: To clearly demonstrate an increase in protein acetylation in the FAM210 MKO, the authors must provide quantification data generated with more than N=1. Please add the molecular weight markings on the side of the blots.

      Answer: We thank the reviewer for this suggestion. We have provided the quantification of the Acetylated-lysine blots and added the molecular weight markers (Figure 6F, Figure S6C).

      • Figure 6H and S5: The mitochondria transfer experiment appears to be quite efficient compared to previously published studies. It would be important to control that the signal observed in the recipient cells is not due to the leakage of the MitoTracker dye from the donor mitochondria.

      Answer: This is an interesting point, although MitoTracker dye is not supposed to leak as it covalently binds to mitochondrial proteins. Even if the dye did leak and mark the endogenous mitochondria, this would not affect our goal of demonstrating that transfer of Fam210aMKO mitochondria into healthy cells can induce protein hyperacetylation. Additional evidence argues against leakage of MitoTracker dye that subsequently marks other mitochondria in the recipient cells: 1) mtDNA and MitoTracker signal both increase linearly with increasing amounts of mitochondria transferred (Figure S7A); 2) we have now also included confocal images showing the presence of both MitoTracker-labeled and non-labeled mitochondria in the recipient cells (Figure 7C). We reason that if MitoTracker leaked within a cell, it would have labeled all mitochondria in that cell.

      • Figure 6J: The increase in Fgf21 is modest. Although the difference is statistically significant, is it biologically important?

      Answer: We thank the reviewer for this question; indeed, the increase is modest. We think the increase was modest compared to the drastic increase seen in vivo because, when we transplanted the WT and Fam210aMKO mitochondria into the recipient cells, the original mitochondria in the recipients were not depleted, which could explain the milder effect. However, we were able to show that the recipient cells readily increase the acetylation of proteins after receiving the Fam210aMKO mitochondria, recapitulating the phenotype we saw in the Fam210aMKO muscle.

      After careful consideration of the mechanism proposed in the study, we decided to remove the qPCR data showing the modest increase of Fgf21 mRNA level. The removal of this data will not change the conclusions we draw nor lessen the significance of the mitochondria transfer experiment.

      • Figure 6C: How significant is the difference in acetylation of RPL30 in WT vs. KO. RPS13 was not found in the WT MS? Was this normalized to Input?

      Answer: The MS was performed with the same loading. The mass spectrometry results for protein identification after AcK-IP were from pooled samples from 3 independent replicates (as the KO muscles are very scarce). Therefore, no significance test was performed.

      • Figure 7D: What are the MW of the bands shown on this blot? This experiment is by no means sufficient to demonstrate and confirm that ribosomal proteins are acetylated. An increase in RPL30 and RPS13 acetylation must be directly assessed.

      Answer: We thank the reviewer for suggesting more direct assays to examine RPL30 and RPS13 acetylation. We have shown that the ribosome fractions were indeed hyperacetylated in the Fam210aMKO mouse compared to the WT control (Figure S6D). We agree that this result cannot lead to the conclusion that RPL30 and RPS13 are specifically hyperacetylated. Indeed, we tried to use acetylated-lysine antibody pull-down followed by RPS13/RPL30 blotting to show the increase in acetylated RPS13/RPL30 protein. However, we could not show a robust increase in acetylation, potentially due to the low number of acetylation sites on the RPS13 and RPL30 proteins. We have therefore reworded the conclusion in the revised manuscript to better reflect the results.

      • Fig7E: This experiment is not properly executed and in its current state does not rigorously support that "hyper-acetylation of several small ribosomal proteins leads to ribosomal disassembly". A) UV profiles of the fractionation must be provided to assess the quality of the profile. B) Provide MW markers. Which band is RPL30? The Input and free fraction bands are not at the same size. RPL30 should at least be visible on the 60S and polysomes from the WT. C) These results do not match the acetylation MS data, which seem to show that the increase in acetylation is much greater for RPS13. However, RPS13 presence on polysomes (assuming they are polysomes) is not affected in the KO. D) This type of experiment must be done for three independent biological replicates, blots from single lanes must be quantified and normalized to total signal (from all the lanes) for the same antibody.

      Answer: We appreciate the great advice on improving the experiment. As suggested, we have now repeated the experiment with proper controls (UV profile and better WB), with the help of Dr. Kotaro Fujii (included as co-author in the revised manuscript). The results showed that in the 4-week sample, there was a decrease in the 80S monosome and polysome in the Fam210aMKO mice compared to the WT. The change was more drastic at 6 weeks (Figure 4D-4G). As before, due to the scarce amount of muscle in the KO mice, we needed to pool samples from the 6-week-old mice for the experiment, and we hope the reviewer can understand the situation. With the clear peaks shown in the UV profile as well as the WB results, we provide more convincing evidence that polysome assembly was indeed impaired in the Fam210aMKO (Figure 4D-4G).

      • Fig 7F: Global translation rates are assessed by puro incorporation at week 4, a time point when differences in protein acetylation were not observed. This does not support the hypothesis that increased acetylation of ribosomal proteins causes defect in protein translation. (Referencing the authors statement p.7 lines 321-24.).

      Answer: We thank the reviewer for this question. When we quantified the protein acetylation increase in the muscle at 4 weeks, we showed that there was a significant increase. But, as the reviewer noted, the ribosomal fractions were not significantly acetylated by WB at 4 weeks. We reasoned that, at early stages (4 weeks), ISR signaling can lead to translational arrest which, along with the polysome formation defects, leads to decreased protein translation. These points are included in the discussion.

      • Other studies have implicated Fam210A in the regulation of mitochondrial protein synthesis through an interaction with EF-Tu. The authors also identified EF-Tu as an interactor in their LC-MS analysis (FigS4). A role for this interaction accounting for mitochondrial and translation defects seems to be underestimated and unexplored here.

      Answer: We agree with this point and believe the cytoplasmic translation defects are in addition to the mitochondrial translational defects. We have shown that FAM210A KO leads to a decrease of MTCO1, which is encoded by the mitochondrial genome. In addition, we showed by mitochondrial proteomics that TUFM was reduced in the KO, which also contributed to translational arrest in the mitochondria (Figure 5J). To answer whether mitochondrial encoded proteins are decreased upon Fam210a KO, we blotted the protein lysates at different stages with antibodies for a few mitochondrial encoded proteins and showed that they decreased with age (Response Figure 7).

      Response Figure 7. WB analysis and quantification of mitochondrial encoded proteins in WT and Fam210aMKO muscle at different ages.

      The mitochondrial proteins were indeed decreased in Fam210aMKO starting from 6-weeks of age compared to the WT.

      Minor comments:

      -What is known about FAM210A, other studies assessing its role, and the rationale for studying its function should be better introduced.

      Answer: We thank the reviewer for the suggestion to have more information of FAM210A functions/mechanisms in the introduction. We have added more background to the introduction.

      -In the discussion the authors states: "Moreover, when the proportion of ribosomal protein phosphorylation buildup in the Fam210aMKO, the assembly of the translational machinery is impaired therefore further dampen the cellular translation". Do they mean acetylation and not phosphorylation?

      Answer: We are sorry about the typo and have changed it. We thank the reviewer for catching this.

      • Please use the term "mRNA translation" or "protein synthesis" instead of "protein translation" in the text.

      Answer: We thank the reviewer for the suggestion to properly refer to these processes. We have changed the terms in the manuscript.

      -The methods section for RT-qPCR: It should be M-MLV RT and not M-MLC. If the qPCR data was normalized with 18S, please provide the sequence of the primers in the table. Information on how primer efficiency was tested must be included in the method section.

      Answer: We thank the reviewer for pointing this out. We have changed the text. We have also provided the 18S primer sequence and added text describing how primer efficiency was tested.

      Reviewer #2 (Significance (Required)):

      General assessment: Previous genome-wide association studies have found that mutations in FAM210A were associated with sarcopenia and osteoporosis. Because FAM210A is not expressed in the bone and highly expressed in skeletal muscle, it suggests that FAM210A likely plays an important role in muscle, which could also affect bone regulation. The authors here provide further evidence of an important role for FAM210A in diseases affecting muscle function by demonstrating that the expression of FAM210A decreases with age and in patients affected by Pompe disease, Duchenne muscular dystrophy and hereditary recessive myopathy. FAM210A is a mitochondria-localized protein and given the crucial role of mitochondria in supporting muscle metabolism, elucidating the molecular function of FAM210A may provide important insights into disease biology that could lead to the development of therapeutic approaches. Thus, a significant protein and regulatory pathway are explored in this study that can potentially impact human health. In this manuscript, the authors provide compelling evidence of the importance of Fam210a in muscle homeostasis with their newly generated mouse model. The experiments looking at muscle physiology, function and metabolism are well-executed and for the most part rigorous, which are the strengths of this manuscript. However, the conclusion that Fam210a deletion in skeletal muscle induces the hyper-acetylation of several small ribosomal proteins, which leads to ribosomal disassembly and translational deficiency, is not supported by the data presented here. As noted in the comments above, these experiments need major improvement. Additionally, there are other concerns about general scientific rigor and conclusions inconsistent with the data presented, as also noted in the comments section.

      Advance: Although a previous study explored the role of FAM210A using a skeletal muscle-specific KO induced at postnatal 28 days under a HSA promoter, the model used by the authors here provides a cleaner approach and more insights into the molecular functions of FAM210A in muscle physiology. The finding that Fam210a MKO disrupts the flow of the TCA cycle, which leads to an abnormal accumulation of acetyl-CoA, is interesting and provides a new conceptual advance on the roles of FAM210A in mitochondrial function in muscle. Acetyl-CoA production is an important source of acetyl groups that can be transferred to proteins and regulate gene expression programs. Thus, this is an important finding. However, the molecular mechanism by which FAM210A regulates this process through an interaction with SUCLG2 is not provided and the nature of this interaction is superficially explored.

      Audience: Findings from this manuscript are likely to interest both basic research and translational/clinical audiences as it explores the physiological and molecular function of a disease-linked protein. The findings are also likely to impact the fields of metabolism, mitochondria function and regulation of gene expression by protein acetylation (if concerns raised regarding these experiments are addressed).

      The fields of expertise of this reviewer are protein and RNA modifications, ribosome biogenesis and mRNA translation.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors state that in their manuscript "the role of mitochondria in regulating cytosolic protein translation in skeletal muscle cells (myofibers)" has been explored (Line 19-20). As experimental model, they used mice expressing Cre recombinase under the control of the myosin light chain 1 promoter. The first conclusion was that "FAM210A is positively associated with muscle mass in mice and humans". The authors say that the presented data "reveal a novel crosstalk between the mitochondrion and ribosome mediated by FAM210A".

      I recognize the potential of this work since the role of FAM210a has been more deeply investigated in skeletal muscle. In fact, the study by Tanaka et al, 2018 presented only a preliminary characterization of the role of FAM210a in muscle. However, I think that this work is not complete and each aspect that has been investigated is not well connected with each other. In particular, it is not clear whether the disrupted ribosomal assembly by hyperacetylation causes muscle atrophy or it is altered under catabolic states during atrophy (primary cause or consequence of?).

      Answer: We thank the reviewer for recognizing the importance of the study that characterizes the effect of FAM210A in muscle mass maintenance. In this study, we have shown that polysome formation was impaired at 4 weeks and therefore the translational efficiency was reduced in the muscle. This translational decrease coincides with the acetylation increase. Moreover, we showed by a mitochondrial transfer experiment that the mitochondria from the Fam210aMKO mice can carry the phenotype and lead to an acetylation increase in the recipient cells. Muscle protein synthesis defects are known to lead to muscle dystrophy, and we have shown that in the Fam210aMKO model, protein synthesis was indeed defective while there was not an induction of atrophy. Therefore, we conclude that in the KO model, the protein synthesis defects lead to muscle atrophy.

      The other major point is represented by the fact that the Myl1-CRE expressing model provides selectivity in fast muscle fibers (see for example Barton PJR, Harris AJ, Buckingham M. Myosin light chain gene expression in developing and denervated fetal muscle in the mouse. Development. 1989;107: 819-824). Then the authors knocked out FAM210a only in fast fibers and they never take in consideration this key point! This is crucial since fast and slow muscles have different content of mitochondria with different size, shape, and metabolism! The muscle fibers can be classified based on the mitochondrial metabolism (see for example Chemello et al., 2019; PMID: 30917329).

      Regarding this point, they simply wrote at Line 75-76 "using a skeletal muscle specific Myl1 (myosin, light polypeptide 1) driven Cre recombinase specifically expressed in post-differentiation myocytes and multinucleated myofibers,...". It would be more correct to write multinucleated type 2 myofibers showing the reduction of FAM210a in different fiber types.

      I think that the authors must resolve these aspects and then organize the findings accordingly. The data are in general interesting for a broad type of audience.

      Answer:

      We appreciate the reviewer’s comment on the Myl1 knock-in Cre (Myl1Cre) model, which prompted us to more explicitly clarify some of the confusion around this model. We fully respect the validity of the 1989 study by Dr. Buckingham and other studies showing fast muscle specific expression of Myl1. However, we and others have shown that Myl1 marks not only the fast but also the slow myofibers (elaborated below). The discrepancy can be explained by the fact that using the Myl1Cre as a lineage marker is different from directly examining Myl1 expression at static timepoints by in situ hybridization (ISH). This is because Cre recombinase can accumulate and diffuse to all the myonuclei in a multinucleated myofiber, subsequently leading to deletion of LoxP-flanked DNA in all nuclei. Also, in the Cre/LoxP system, only a small amount of Cre recombinase is needed to induce the recombination of the target loxP sites and lead to gene KO. Another example of the discrepancy between the static mRNA pattern and the dynamic gene expression during development is Hox gene expression. When the corresponding author (SK) of this manuscript was trained with Dr. Joshua R Sanes, he developed 3 Cre lines driven by three different Hox genes that have been shown by ISH to be expressed in a specific rostral to caudal domain in the spinal cord during development. However, each of these Cre models ended up marking all the spinal cord without any domain specificity. In the case of the Myl1Cre mouse model, we have previously published a paper on the lineage-tracing results using the Myl1Cre and showed that Myl1Cre marked all fast AND slow myofibers in mice (Wang et al, 2015, PMID: 25794679). In another lineage tracing study using a nuclear GFP reporter, we reported that Myl1Cre marks 96% of nuclei in myofibers regardless of fiber type (Bi et al., 2016, PMID: 27644105); the remaining 4% of non-marked nuclei potentially represent satellite cells. Other groups have also used the Myl1Cre model to induce KO in both fast and slow muscles (Pereira et al, 2020, PMID: 31916679). Therefore, we believe that the Myl1Cre mouse model allows us to efficiently knock out the Fam210a gene in both slow and fast muscle.

      To directly confirm that Fam210a was efficiently knocked out in both slow and fast muscles using the Myl1Cre mouse model, we isolated different muscle groups (soleus and diaphragm, which contain a large fraction of slow myofibers; TA and EDL, which contain predominantly fast myofibers) and checked the expression level and the KO efficiency of Fam210a by WB. We have shown that even in slow muscles like diaphragm and SOL, the KO was very efficient, as there were no visible FAM210A bands in the WB (Figure S1C).

      In more detail:

      The data must be analyzed and discussed based on the fact that FAM210a has been deleted specifically in fast fibers. First the authors must show the protein levels of FAM210a in both fast, slow and mixed fast-slow muscles. Then for example in Figure S1C EDL, GAS and SOL muscles must be included.

      Answer: This is related to the misunderstanding of the Myl1Cre model. We understand the reviewer’s concern and therefore isolated proteins from different muscles in WT and Fam210aMKO mice at 4-weeks and checked the expression level of FAM210A. We have shown that regardless of fast or slow muscles, FAM210A was deleted.

      The blot in general must be repeated since it has poor quality (continuum of FAM210a band in the samples).

      Answer: We thank the reviewer for this suggestion, which increased the data quality. We have replaced the original blot with the following blots showing that FAM210A was not deleted in other non-muscle tissues (Figure S1C).

      Please provide staining of TA, GAS and SOL muscles to show how Myl1CRE-directed deletion of FAM210a affect the different myofibers.

      Answer: This point is also related to the assumption that Myl1Cre only induces deletion in fast myofibers. We have done staining in both EDL and SOL muscle to show the relative changes in myofiber composition. We found that the myofibers in EDL and SOL muscle have shifted to a more oxidative type upon Fam210a KO (Figure S3).

      In Figure 2F, where decreased TA muscle weight was shown in the Fam210aMKO mice, the authors must also include the other muscles (EDL, GAS and SOL).

      Answer: We thank the reviewer for helping us be more rigorous in the phenotype examination. We understand that the reviewer initially raised this question because of the concern about the Myl1Cre model. Now that we have shown that Myl1Cre marks both the fast and slow muscles, we believe this question is no longer a concern. In addition, to indirectly answer the question, we would like the reviewer to note the size difference of the EDL as well as the SOL muscle in Figure S3 in the manuscript. As can be seen from the images, the size of the SOL muscles in the KO was significantly reduced compared to the WT, speaking in favor of the KO effect on slow muscles.

      In general, since the HSA-CRE model is generally used for gene manipulation in skeletal muscles the authors must characterize their model considering that the myosin light chain 1 promoter Myl1-Cre is mainly active in postmitotic type II myofibers. The last model can also give advantage for mosaic gene manipulation in muscles with mixed fiber types.

      Answer: We thank the reviewer for bringing this point up. We hope by the multiple lines of evidence that we provided in the previous questions, we can convince the reviewer that the KO model using the Myl1Cre does not lead to a mosaic gene manipulation in the muscle. On the contrary, the KO model is a homogeneous KO in both fast and slow muscles.

      Line 118-119 Fam210a level is positively corelated with muscle mass, as it is reduced in muscle atrophy conditions and increased in muscle hypertrophy conditions. Fig 1: I don't like since there are many different models in which the muscle mass reduction is associated with different mechanisms. Then independently of mechanisms associated with changes in muscle mass Fam210a is always linked to? Which common mechanism can explain this?

      Answer: We understand that the reviewer would like to pursue a conserved mechanism governing muscle mass maintenance, however, we by no means wanted to make a direct causal relationship between FAM210A level and different muscle disease/atrophy conditions. Indeed, the atrophic conditions presented have different mechanisms leading to muscle mass reduction, yet we wanted to present the possible connection that Fam210a level and muscle mass are co-regulated, and we later confirmed by KO mouse model that FAM210A KO indeed reduces muscle mass.

      Line 144-146 Hematoxylin and eosin (H&E) staining did not reveal any obvious myofiber pathology in the Fam210aMKO mice up to 8 weeks (Figure 2G). I totally disagree! It seems that there is more inflammation upon deletion of Fam210aMKO. Please check it.

      Answer: We thank the reviewer for pointing this out to help us more rigorously describe our results. We have changed the wording to better reflect the changes observed with H&E images.

      Fig3E-L there is a huge difference between EDL and SOL. The authors can't avoid to discuss their data considering the real expression of CRE upon Myl promoter: specific deletion in fast fibers. I think that the data in FIGS3 are very important and must be linked to data in Fig3. Organize in a different way all the presented data to really describe what is happening upon deletion of Fam210a.

      Again, the authors MUST organize better their data in the manuscript: to each paragraph must correspond data in the main figures. For example: at Line 189 Fam210aMKO mice exhibit systemic metabolic defects and at Line 208 Fam210aMKO increases oxidative myofibers and decreases glycolytic myofibers. These two paragraphs discuss data shown only in supplementary figures.

      Answer: We thank the reviewer for this suggestion. As shown in the previous responses, the Myl1Cre indeed induces efficient deletion of Fam210a in slow muscles. Therefore, we did not consider this to be a myofiber-specific deletion model. We consider these two results as the effect of a mitochondrial protein (FAM210A) on the myofiber metabolism (independent of myofiber type specific deletion), and that the deletion of Fam210a results in mitochondrial stress, which can lead to myofiber switch (Figure S3).

      Physical activity must be monitored. Show respiratory exchange ratio (RER = VCO2/VO2) and discuss the results.

      Answer: We thank the reviewer for this suggestion. With these results, we would like to demonstrate that muscle homeostasis is important for systemic metabolism: disruption of muscle mass maintenance in the Fam210aMKO mice leads to defects in whole-body metabolism. We have now included the RER results (Figure S2F, S2G). The results show that the Fam210aMKO mice had a significantly lower RER (VCO2/VO2) value at daytime, indicating that the mice rely more on fat as the fuel source. This is consistent with the proteomics results (Figure 5K) showing that the Fam210aMKO mice have an increased FAO pathway. Unfortunately, our metabolic chamber does not have the capacity to monitor activity. We instead include data on heat production (Figure S2E).
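
      For readers less familiar with the measure, the ratio RER = VCO2/VO2 discussed here can be computed as in the minimal sketch below. The function names and the gas-exchange readings are hypothetical and are not taken from the manuscript; the only assumption is that VCO2 and VO2 are paired readings in the same units. An RER near 0.7 is generally read as predominantly fat oxidation and an RER near 1.0 as predominantly carbohydrate oxidation.

```python
def respiratory_exchange_ratio(vco2, vo2):
    """Return RER = VCO2 / VO2 for one paired reading (same units for both)."""
    if vo2 <= 0:
        raise ValueError("VO2 must be positive")
    return vco2 / vo2


def mean_rer(readings):
    """Average the per-reading RER over a list of (VCO2, VO2) pairs."""
    if not readings:
        raise ValueError("at least one reading is required")
    ratios = [respiratory_exchange_ratio(vco2, vo2) for vco2, vo2 in readings]
    return sum(ratios) / len(ratios)


# Hypothetical daytime readings (VCO2, VO2) for a single animal.
example_readings = [(0.8, 1.0), (1.5, 2.0), (0.7, 1.0)]
example_mean = mean_rer(example_readings)
```

      Averaging the per-reading ratios, rather than dividing summed VCO2 by summed VO2, weights every time point equally; either convention is possible and the choice here is an assumption of the sketch.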

      "Fam210aMKO increases oxidative myofibers and decreases glycolytic myofibers". The data must be associated with the evaluation of the expression levels of FAM210 in different fiber types to really understand what is happening upon FAM210a loss.

      Answer: We understand the reviewer’s concern on the different expression level of Fam210a as well as the KO efficiency using the Myl1Cre model. We have shown that Fam210a is knocked out in fast and slow muscles, therefore, we did not consider the effects on fast and slow myofibers separately.

      As SDH activity in type 1 fibers is higher than in type 2, and since the authors are using a model in which Fam210a is deleted only in type 2 fibers, they should understand what is happening: fiber 1

      Answer: We agree with the reviewer that the SDH activity is different in different myofibers. We have shown by western blot that FAM210A was similarly KO in both fast and slow muscles. When we performed fiber type staining in EDL and SOL muscle, we saw that there was a shift towards the slower myofiber types both in the EDL and SOL muscle, due to mitochondrial defects.

      Associate a cox assay with the sdh assay

      Answer: We thank the reviewer for this suggestion. We have shown by SDH staining as well as seahorse experiments using isolated mitochondria that the complex II activity was impaired in the muscle. We understand the reviewer would like to see a COX assay to show the defects of the mitochondrial function. Though we were not able to perform the COX assay, we showed from other aspects that the mitochondrial function was impaired by running WB of the mitochondrial encoded proteins (ATP6, MTCO2, mtCYB) and showed these proteins were decreased with age. Along with the morphological changes of the mitochondria shown by electron microscopy (Figure 5 and Figure S5), we conclude that these changes must have impacted mitochondrial function.

      Figure 4b blot tubulin and FAM210a look strange. Look especially at first and second and fourth from the left side.

      Answer: We are sorry about the mistake in the images; we have replaced the Tubulin blot in the OXPHOS blots.

      Figure 4B OXPHOS protein levels look similar between wt and KO. Include the quantification with the significance (min 3-5 mice per genotype).

      Answer: We have quantified the change between WT and KO on different proteins (Response Figure 8).

      Response Figure 8. Quantification of the OxPHOS proteins in WT and Fam210aMKO muscle at different ages.

      Quantification of the blots showed that indeed the mitochondrial proteins were decreased in the Fam210aMKO. The change of mitochondrial encoded protein MTCO1 was earlier detected in the Fam210aMKO.

      Provide TEM analysis for SOL muscle. I would understand whether mitochondria are differently affected in fast and slow muscles.

      Answer: We understand the reviewer was originally concerned about the KO efficiency of Fam210a in fast and slow muscles, based on the assumption about the Myl1Cre model. We have shown that the FAM210A protein was similarly depleted in both fast and slow muscles by western blot. In this case, we would speculate that the mitochondrial change in fast and slow muscles would be similar because the mitochondrial changes were due to the inherent defects in the mitochondria.

      In all experiments must be clear which muscle type or types was/were used:

      Line 268: "isolated from WT and Fam210aMKO muscles at 6 weeks of age".

      Line 587 "Muscle lysate acetyl-CoA contents"

      For Seahorse Mitochondrial Respiration Analysis at Line 599 "isolated mitochondria from muscle"

      For TCA cycle metabolomics at Line 615 "muscle tissue was weighed and homogenized"

      For SCS activity assay at Line 632 "mitochondria from muscles were isolated"

      For LC-MS/MS at Line 647 "Mitochondria were purified from skeletal muscles and subjected to proteomics analysis".

      For Ribosome isolation at Line 676 "Skeletal muscle from mice"

      For Polysome profiling experiment at Line 696 "muscle tissues from mice were dissected"

      It is important to know which muscles were used since confounding effects of the specific deletion of FAM210a in type 2 fibers must be identified and discussed.

      Answer: We thank the reviewer for considering the different muscle groups in our mouse model. For experiments requiring a large amount of muscle tissue, such as ribosome isolation, mitochondrial isolation and polysome profiling, we used all the muscles from the mouse. For WB experiments, we used the TA muscle. We have included this information in the method section in the manuscript. Since we have shown that FAM210A was similarly depleted in different muscles (see previous responses), we think it is justified to pool muscles from the same mouse.

      Line 296-297 The authors wrote "Consistently, the mRNA levels of Atf4, Fgf21 and the associated transcripts were highly induced in the Fam210aMKO both in the 4-week and 6-week-old muscle samples". Is Fgf21 responsible for the reduction of body weight? (see for example PMID: 28552492, PMID: 28607005 and PMID: 33944779). Measure the circulating Fgf21 protein in KO and WT mice.

      Answer: We thank the reviewer for this great suggestion. Indeed, Fgf21 can potentially lead to body weight reduction, and this can explain the smaller body weight in our mouse model as well. However, we are more concerned about the muscle changes in our mouse model, therefore we did not further validate the changes of Fgf21 in the circulation.

      After careful considerations on the mechanism proposed in the study, we decided to remove qPCR data showing the modest increase of Fgf21 mRNA level. The removal of this data will not change the conclusions we draw nor lessen the significance of the mitochondria transfer experiment.

      Moreover the authors must check Opa1 total protein level and also the ratio between long and short isoforms. Is Fam210a interacting with Opa1?

      Answer: We thank the reviewer for this interesting question. Another publication from our lab has shown that Fam210a can modulate the cleavage of OPA1 in brown adipose tissue and influence the cold-induced thermogenesis (PMID: 37816711). Indeed, OPA1 deletion in muscle can lead to muscle atrophy and postnatal death at about day 10 (PMID: 28552492) through the induction of UPR (ISR) and the induction of Fgf21. We did not check the interaction between FAM210A and OPA1 in the muscle context, and FAM210A was not found to be interacting with OPA1 in brown adipose tissue (PMID: 37816711). However, the focus of this study was the acetylation change and the FAM210A effect on muscle mass maintenance. Therefore, we did not pursue the OPA1 related mechanism in skeletal muscle.

      The final part of the paper is really interesting but needs to be discussed knowing exactly the experimental model used. Then check in which fiber types FAM210a is lost.

      Answer: We thank the reviewer for the stringency on the model used. Indeed, the mitochondria can differ between muscle groups. However, the muscles isolated from WT and KO mice were properly controlled, which balances the effects of different mitochondria. We have consistently observed the increased acetylation when mutant mitochondria were transferred.

      Regarding the mitochondrial transplantation, I'm surprised to see that it happens in the direction of unhealthy mitochondria to healthy cells. Are you able to rescue the phenotype of Fam210a KO cells with healthy mitochondria?

      Answer: We thank the reviewer for bringing up this interesting and important question! Our mitochondrial transfer results support a “gain-of-function” model where excessive acetyl-CoA produced by the Fam210a-KO mitochondria induces hyperacetylation. Regarding the question of transferring healthy mitochondria to rescue the KO cells, we reason that even when we transfer the healthy mitochondria to the KO cells, the healthy mitochondria will not stop the mutant mitochondria from making excessive amounts of acetyl-CoA and thus protein acetylation. A clean transfer would require depletion of the mitochondria in the KO cells and concomitantly restoring the FAM210A level in the KO cells (as the lack of the Fam210a gene in the KO cells will eventually convert the transferred mitochondria into mutants with the normal turnover of FAM210A). This is technically highly challenging and nearly impossible to do. We hope that the reviewer can understand the difficulties.

      Reviewer #3 (Significance (Required)):

      In conclusion, the strength of the presented paper is the novelty: the authors explored the role of FAM210a in skeletal muscle. However, the major limitation is represented by the fact that they did not show in which fiber types Fam210a is knocked out. In fact, the CRE recombinase expressing model used is well-known to be specific for type 2 fibers. Then since mitochondria and metabolism are central in this manuscript and they are different in the fast and slow fiber types, the authors must dissect this point in detail.

      Moreover, there are many data but they are not linked to each other and discussed properly. The paper must be completely re-organized.

      This manuscript can be interesting for a broad type of audience.

      I'm an expert on mitochondria, metabolism and skeletal muscle.

  4. Mar 2024
    1. of Mars on the sky

      Really -- ALL planets that undergo retrograde as well as why the Sun and the Moon do not. Mars may be the most dramatic example and the timing is such that epicycles can be falsified using Mars most easily, but I think we should be clearer that it's all planetary motion.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank all the reviewers for their comments on our manuscript. We have attempted to address all the points raised by the reviewers and are happy to note that the manuscript is significantly strengthened with the additional experiments that we have performed and from significant restructuring of the manuscript.

      Reviewer #1

      Major Comments

      1. The choice of cells looks confusing. Drosophila are indeed widely used in research of neurodegeneration mechanisms, since they well reflect the behavioral characteristics of a wide range of brain diseases, but why authors used insect immune cells to study the effect of mHTT on cellular processes? Huntington's disease has a well-established site of origin, in the spiny neurons of the striatum, and they certainly have a different protein context than in insect cells. __Author's response: __We thank the reviewer for this comment. Patients with Huntington's disorder display a variety of symptoms affecting peripheral, non-neuronal cells, including alterations in the function of immune cells. Hemocytes isolated from Drosophila expressing pathogenic forms of Huntingtin also display altered immune responses. Through our manuscript we explore the effect of Huntingtin aggregates on cellular functions of hemocytes. Additionally, we have now included data showing that we are able to observe similar phenotypes in mammalian cells such as neuronal SHSY5Y and HEK293T (Supp. Fig. 3). This is indicative of similar effects being exerted by Huntingtin aggregates across cell types and organisms. Finally, we demonstrate that we are able to rescue neurodegeneration in the fly eye upon overexpression of either Hip1 or components of the Arp2/3 complex (Fig. 4F), further solidifying our results that Huntingtin aggregates alter CME in an actin-dependent manner and that this largely is responsible for the toxicity. This validates our observations that effects on CME appear to be independent of cell type and that non-neuronal cells such as hemocytes can also be used to study the effects of pathogenic aggregates.

      The interrelationship between mutant huntingtin and actin cytoskeleton and clathrin-mediated endocytosis that are convincingly demonstrated in other earlier studies in the m/s are described in rather morphological level and there is no description of molecular interactions of proteins belonging to three systems considered, Htt (control vs mutant), actin cytoskeleton and CME. Lack of these data renders the morphological observations unsupported

      __Author's response: __Previous data shown in Hosp et al, 2017 indicates that a large number of proteins involved in both actin remodelling and clathrin mediated endocytosis are sequestered within Huntingtin aggregates. While the mechanism of sequestration remains unknown, it has also been observed that loss of Huntingtin results in altered organization of the actin cytoskeleton. We have now added points discussing this in the results section.

      Three last figures of total eight demonstrate the effect of proteins, responsible for the initiation of certain neurodegenerative pathologies, on the activity of clathrin-mediated endocytosis, and on the properties of actin cytoskeletal system, however neither in the abstract nor in the introduction there is no any word about these proteins; in the discussion only a few words are devoted to one of these proteins TDP-43. When starting the article, did the authors plan to enter this data into the manuscript?

      __Author's response: __We have now amended this by revising the abstract and the text.

      It is important to work on the style of the manuscript, the article is difficult to read, it is a collection of data that does not seem related to each other.

      __Author's response: __We have reorganized the manuscript and have improved on the flow to make it easier for the reader. We apologize for the rather tedious and confusing flow in the previous draft.

      Reviewer #2

      This manuscript endeavours to explore the link between mutant Huntingtin, clathrin-mediated membrane transport and the actin cytoskeleton: both its dynamics and overall mechanics. As I read it, it carries the interesting idea that pathogenic protein aggregates alter actin cytoskeletal dynamics by sequestering Arp2/3 nucleator. This has two consequences in the authors' experiments: disruption of clathrin-coated vesicle movement and an increase in cellular stiffness. An interesting question is whether these two effects are related: Is the disruption of vesicular movement due to the change in cytoplasmic stiffness? Or could they be features that both reflect the underlying change in actin dynamics. This may be hard to tease apart and beyond the purview of this manuscript.

      I have some suggestions that could strengthen the MS.

      Major Comments

      1. Further characterizing Arp2/3 sequestration. The notion seems to be that actin nucleators would be sequestered (and inactivated) by mutant protein aggregates, as supported by co-localization studies. In addition, could the authors:
      2. a) Test if the dynamics of Arp2/3 are altered, comparing e.g. Arp3-GFP FRAP in the aggregates vs that elsewhere. Author's response: We indeed attempted the FRAP experiment. However due to some technical difficulties we were not convinced by the extent of FRAP in the transgenic fly line. It appeared as an artifact and we were not comfortable including the data in the manuscript. We have instead provided example files for the reviewer to examine.

      3. b) Test more directly if actin nucleation is altered in cells that have pathogenic mutant aggregates. This could be done by barbed-end labelling (e.g. measuring incorporation of labelled actin in live cells that are lightly permeabilized with saponin). __Author's response: __We have performed barbed-end labeling for HTT Q15 and HTT Q138 expressing cells. Images and quantification have now been added to the revised manuscript as Figures 2H and 2I. While this was a challenging experiment, it was deeply satisfying to observe such dramatic changes indicating a change in the state of the actin cytoskeleton.

      Does manipulating actin nucleation alter cellular mechanics as it does for clathrin-coated vesicle transport? For example, does inhibition of Arp2/3 (e.g. with CK666) increase cellular stiffness and would stiffness be ameliorated in mutant cells if Arp3 is overexpressed?

      __Author's response: __We have used LatA to look at whether alteration in the actin cytoskeleton affects cellular stiffness. We found that disruption of the actin cytoskeleton leads to a decrease in cellular stiffness in WT as well as in HTT Q138 expressing cells (shown in Figure 5 and discussed in the results section). We have also now performed AFM on CK666 treated cells and showed that treatment of CK666 leads to a decrease in cellular stiffness similar to LatA treated cells. This further strengthens our hypothesis that a 'Goldilocks' state of actin remodeling and consequently cellular stiffness is required for CME to proceed. We have not performed AFM on cells overexpressing Arp2/3 in HTT Q138 background. However, we believe that it will rescue cellular stiffness as overexpression of Arp2/3 rescues filopodia formation in HTT Q138 expressing cells (Figure 4E) as well as neurodegeneration. AFM data obtained from CK666 treated cells is now added in Supplementary figure 8.

      Although it may be difficult to determine if the defect in vesicle transport is due to the change in rheology, I wonder if the authors could reinforce their analysis by showing the overall relationship between the two features. It would be interesting if they could plot CCV velocity against elasticity for all the various conditions that they have tested. Would this cumulative analysis be informative?

      __Author's response: __This data is already present across the manuscript as part of different figures. We are not sure whether we can reuse the same data to put it as part of a different figure which plots the relationship between elasticity and CCS velocity. We would be grateful for advice on whether this is allowed and how to mention that the data is also part of different figures.

      Focus of the MS. I think that the MS is a little longer and more discursive than it needs to be. I rather struggled to find the focus of the story (which could well be me). There is a deal of repetition that could be profitably cut (the reader may actually find it easier to follow). As well, some anticipation and summaries could be shortened. The final paragraph of the introduction largely summarizes the paper; it could be shortened quite considerably, so that the reader can get directly into the Results themselves. Similarly, the final paragraph of the results is a summary which could work better elsewhere - perhaps, e.g. at the beginning of the discussion.

      __Author's response: __We have now trimmed and rearranged the text in the manuscript. We have reorganized the manuscript and have improved on the flow to make it easier for the reader. We apologize for the rather tedious and confusing flow in the previous draft. We are open to further suggestions to improve the writing style.

      Specific points

      i) Fig 3E. The changes in F-actin flow revealed by PIV are quite dramatic. How reproducible are these changes? (The data presented were from single cells?) __Author's response: __PIV analysis of F-actin flow (now Figures 2J and 7E in the MS) was performed on at least 5 cells of each type, and the results were consistent across all of them. The figure shown is representative of the data observed.

      1. ii) If TDP43, does it also affect Arp2/3? __Author's response: __We thank the reviewer for this comment. Unfortunately, we could not perform this experiment due to the unavailability of a fluorescently tagged TDP43 fly line, which would enable us to visualize whether Arp3 was sequestered within the aggregates.

      Minor points:

      1. a) Fig 5a, b - why change the order on the x-axis? __Author's response: __We have fixed this now. We have removed figure 5b, since, in the revised MS we are only talking about stiffness instead of viscoelastic properties of the cells.

      Reviewer #2

      Overall, I think that the significance of the MS lies in its evidence that sequestration of actin nucleators may be a key effect of mutant protein aggregation, with implications for cellular function. This would provide a useful conceptual framework to understand the cell biological consequences of creating pathogenic protein aggregates.

      __Reviewer #3 __

      Summary

      In the paper, the authors showed that huntingtin aggregates, which play a critical role in initiating neurodegenerative diseases, impair clathrin-mediated endocytosis (CME). Using live cell imaging and AFM, the authors demonstrated that CME is affected by the alteration in actin cytoskeletal organization and cellular viscosity. Further, the authors concluded that there was a strong link between dynamic actin organization and functional CME in the context of neurodegeneration. While the data is interesting and novel, the study in its current form needs major revision before it is accepted.

      Major comments:

      1) Figure 2: The authors should show the compromised actin cytoskeleton structure after Lat A and cytoD treatment to back up the findings.

      __Author's response: __We have included the representative micrographs of compromised cytoskeleton in terms of filopodia formation upon treatment of LatA and CytoD in Supplementary figure 3E.

      2) Figure 2g and 2h: Quantification data of filopodia must be supported with representative images.

      __Author's response: __Figure number has been changed to 2D and 2E. Representative image for the quantification of filopodia has been now included in supplementary figure 3D.

      3) RNAi studies must be performed using control siRNA to check off-target effects.

      __Author's response: __Luc VAL10 was used as a control for all the RNAi experiments. However, data for RNAi is not shown as the phenotype for Luc VAL10 was comparable to WT. We have included Luc VAL10 as a control for Profilin RNAi in the FRAP experiment (Supplementary figure 4C).

      4) The result section needs to be reorganized to maintain flow. In the current format, the results of a similar set of experiments are spread across different figures, making it a bit difficult to understand.

      __Author's response: __We apologize for the inconvenience. This issue has been addressed now.

      5) Figure 3d: The expression level and spatial distribution of HTTQ138 transfection were not convincing compared to the httQ15 expression level and the distribution.

      __Author's response: __Figure 3D (Figure 3A in this MS) shows the data obtained from hemocytes isolated from third instar larvae of the same age. These are not transfected cells and are obtained from Drosophila larvae using the same Gal4 driver, Cg-Gal4. Thus, the level of expression will be the same. However, the distribution may show a change due to the aggregating nature of HTT Q138, while HTT Q15 is non-aggregating and therefore remains diffuse.

      6) Suppl Fig 2a data must be supported using images showing myosin VI distribution in wild-type vs. HTTQ138 transfected cells.

      __Author's response: __This data (Supplementary figure 4D in present MS) has been obtained from genetic knockdown of myosin VI. The aim of the experiment was to show that we see similar effects on CCSs movement as we see upon disruption of the actin cytoskeleton.

      7) Suppl movie videos are not labeled correctly in the source. It is not possible to locate them and know which videos are referred to in the manuscript.

      __Author's response: __We apologize for the inconvenience. This issue has been fixed now.

      8) Page 8: How do HTT aggregates sequester the actin-binding proteins? An explanation should be provided in the result section.

      __Author's response: __Previous data shown in Hosp et al, 2017 indicates that a large number of proteins involved in both actin remodelling and clathrin mediated endocytosis are sequestered within Huntingtin aggregates. While the mechanism of sequestration remains unknown, the types of proteins involved in actin remodelling are diverse and do not represent specific types or classes. We have now added points discussing this in the results section.

      9) Page 10: The authors concluded that "increasing the availability of proteins involved in actin reorganization is capable of restoring CME even in the presence of pathogenic aggregates." Since several actin-associated proteins are involved in actin reorganization, which types/classes of proteins are involved in CME restoration? The authors should expand it in the discussion.

      __Author's response: __As we have only investigated the roles of Hip1 and the Arp2/3 complex, we are confident of only reporting their roles in the context of this manuscript. However, previous data shown in Hosp et al, 2017 indicates that a large number of proteins involved in both actin remodelling and clathrin mediated endocytosis are sequestered within Huntingtin aggregates. While the mechanism of sequestration remains unknown, the types of proteins involved in actin remodelling are diverse and do not represent specific types or classes. Therefore this indicates that modulation of actin, through the sequestration of proteins involved in this process is affected in the presence of Huntingtin aggregates. We have added points detailing this in the results and discussion sections.

      10) The schematic of the proposed model depicting critical steps by which pathogenic proteins inhibit CME is required. It will help readers to understand the molecular mechanism easily.

      Author's response: We have now included a model in the manuscript (Fig. 8B).

      Minor:

      1) Figure panel referencing in the text needs to be more consistent, for example, fig 3e is referred to before fig 3d., and fig 2 panels are referred to before fig. 3 panels.

      __Author's response: __We have reordered the figures and maintained a consistent order throughout.

      2) The authors should use similar phrasing throughout the manuscript to avoid confusion. For instance, either use 'HTTQ138' or 'htt Q138'.

      __Author's response: __We apologize for this. We have now maintained uniform nomenclature through the text.

      3) Page 10: AFM indentation experimental part and its discussion in the result section is unnecessary. Shift it to the 'Materials and Method' section.

      __Author's response: __We have now trimmed this portion and we are now only showing elasticity data and not viscoelasticity.

      4) This statement looks a bit exaggerated. There is not sufficient evidence to support the statement- "It can be said that the cells in general behave like a soft glass. The presence of aggregates lowers the effective temperature pushing it nearer to the glass transition, affecting transport."

      __Author's response: __We have now removed all figures resulting from an analysis that assumes glassy behaviour. Instead, we have now provided a more conventional and well-established analysis to obtain Young's modulus of cells exhibiting different transport properties.

      5) Page 12: What is the basis for selecting proteins Aβ-42, FUSR521C, αSynA30P, αSynA53T, and TDP-43 over other proteins? An explanatory sentence must be added to support the selection.

      __Author's response: __We have modified the text to clarify this point.

    1. The social media platform itself is run with computer programs, such as recommendation algorithms (chapter 12).

      As a student majoring in communication, I am very interested in recommendation algorithms. Because we will find that many times Tik Tok or other software will push us things we like to watch or what we have searched for will always push us relevant content. Sometimes after browsing some clothing, you will even find discounts on the same items in other software. I think this is a good marketing tool but it may have some drawbacks. I am looking forward to studying chapter 12.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers


      We are grateful to the three reviewers for their careful and constructive critiques of our preprint. We will address all of their comments and suggestions, which help to make our paper more precise and understandable. In our replies, we use 'Patterson, eLife (2021)' as shorthand for Patterson, Basu, Rees & Nurse, eLife 2021:10.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Novák and Tyson present a model-based analysis of published data that had claimed to demonstrate bistable activation of CDK at the G2/M transition in fission yeast. They point out that the published data does not distinguish between ultra-sensitive (switch-like, but reversible) and bistable (switch-like, but irreversible) activation. They back up their intuition with robust quantitative modeling. They then point out that, with a simple experimental modification, the published experiments could be repeated in a way that would test between the ultra-sensitive and bistable possibilities.

      This is an accurate and concise summary of our paper.

      Therefore, this is a rare paper that makes a specific modeling-based prediction and proposes a straightforward way to test it. As such, it will be of interest to a broad range of workers involved in the fields cell cycle and regulatory modeling.

      We agree that our work will be of interest to a broad range of scientists studying cell cycle regulation and mathematical modeling of bistable control systems.

      Nonetheless, attention to the following points would improve the manuscript. The authors should be more careful about how they describe protein abundance. They often refer to protein level. I believe in every case they mean protein concentration, but this is not explicitly stated; it could be interpreted as number of protein molecules per cell. The authors should either explicitly state that level means concentration or, more simply, use concentration instead of level.

      A valid criticism that has been addressed in the revised version.

      The authors should explain why they include stoichiometric inhibition of CDK by Wee1 in their model. Is it required to make the model work in the wild-type case, or only in the CDK-AF case? My intuition is it should only be required in the AF case, but I would like to know for sure. Also, they should state if there is any experimental data for such regulation.

      Bistability of the Tyr-phosphorylation switch requires 'sufficient' nonlinearity, which may come from the phosphorylation and dephosphorylation reactions that interconvert Cdk1, Wee1 and Cdc25. The easiest way to model these interconversion reactions is to use Hill- or Goldbeter-Koshland functions for the phosphorylation and dephosphorylation of Wee1 and Cdc25, but this approach is not appropriate for Gillespie SSA, which assumes elementary reactions. Both Wee1 and Cdc25 are phosphorylated on multiple sites, which we approximate by double phosphorylation; but this level of nonlinearity is not sufficient to make the switch bistable. In addition, stoichiometric inhibition is a well-known source of nonlinearity, and in the Wee1:Cdk1 enzyme:substrate complex, Cdk1 is inhibited because Wee1 binds to Cdk1 near its catalytic site. In our model, stoichiometric inhibition of Cdk1 by Wee1 is required for bistability even in the wild-type case because the regulations of Wee1 and Cdc25 by phosphorylation are not nonlinear enough. There is experimental evidence that stoichiometric inhibition of Cdk1 by Wee1 is significant: mik1Δ wee1ts double mutant cells at the restrictive temperature (Lundgren, Walworth et al. 1991) are less viable than AF-Cdk1 (Gould and Nurse 1989). Furthermore, Patterson (eLife, 2021) found weak 'bistability' when they used AF-Cdk1 to induce mitosis. This puzzling observation suggests a residual feedback mechanism in the absence of Tyr-phosphorylation. Our model accounts for this weak bistability by assuming that free CDK1 can phosphorylate and inactivate the Wee1 'enzyme' in the Wee1:Cdk1 complex, which makes CDK1 and Wee1 mutual antagonists. This reaction is based on formation of a trimer, Cdk1:Wee1:Cdk1, which is possible since CDK1 phosphorylation of Wee1 occurs in its N-terminal region, which lies outside the C-terminal catalytic domain of Wee1 (Tang, Coleman et al. 1993).
      These ideas have been incorporated into the text in the subsection describing the model (see lines 120-125).

      The authors should explicitly state, on line 131, that the fact that "the rate of synthesis of C-CDK molecules is directly proportional to cell volume" results in a size-dependent increase in the concentration of C-CDK.

      The accumulation of C-CDK molecules in fission yeast cells is complicated. In general, we may assume that larger cells have more ribosomes and make all proteins faster than do smaller cells. Absent other regulatory effects, the number of protein molecules is proportional to cell volume, and the concentration is constant. But, in Patterson's experiments, the number of C-CDK molecules is zero at the start of induction and rises steeply thereafter (see lines 147-148), and the rate of increase (#molec/time) is proportional to the size of the growing cell.

      The authors should explain, on line 100, why they are "quite sure the bistable switch is the correct interpretation".

      Line 105-106: "Although we suspect that the mitotic switch is bistable,.."

      On line 166, include the units of volume.

      Done

      On lines 152 and 237, "smaller protein-fusion levels "should be replaced with "lower protein-fusion concentrations".

      Done

      **Referee cross-commenting** *I concur with the other two reviews. *

      Reviewer #1 (Significance (Required)): *The paper is significant in that it points out an alternative interpretation for an important result in an important paper. Specifically, it points out that the published data is consistent with activation of CDK at the G2/M transition in fission yeast could be ultra-sensitive (switch-like, but reversible) instead of bistable (switch-like, but irreversible). The distinction is important because it has been claimed, by the authors of the submitted manuscript among others, that bistability is required for robust cell-cycle directionality. *

      We agree with this assessment.

      However, activation of CDK at the G2/M transition in other species has been shown to be bistable and the authors state that they are "quite sure the bistable switch is the correct interpretation". So, the paper is more likely an exercise in rigor than an opportunity to overturn a paradigm.

      We were the first authors to predict that the G2/M switch is bistable (J. Cell Sci., 1993) and among the first to prove it experimentally in frog egg extracts (PNAS, 2004). Our models (Novak and Tyson 1995, Novak, Pataki et al. 2001, Tyson, Csikasz-Nagy et al. 2002, Gerard, Tyson et al. 2015) of fission yeast cell-cycle control rely on bistability of the G2/M transition; so, understandably, we believe that the transition in fission yeast is a bistable switch. But the 'bistable paradigm' has never been directly demonstrated by experimental observations in fission yeast cells. The Patterson paper (eLife, 2021) claims to provide experimental proof, but we demonstrate in our paper that Patterson's experiments are not conclusive evidence of bistability. Furthermore, we suggest that a simple change to Patterson's protocol could provide convincing evidence that the G2/M switch is either monostable or bistable. We are not proposing that the switch is monostable; we would be quite surprised if the experiment, correctly done, were to indicate a reversible switch. Our point is simply that the published experiments are inconclusive. The point we are making is neither a mere 'exercise in rigor' nor a suggestion to 'overturn a paradigm.' Rather it is a precise theoretical analysis of a central question of cell cycle regulation that should be of interest to both experimentalists and mathematical modelers.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Summary: The manuscript asks whether the data reported in Patterson et al. (2021) is consistent with a bistable switch controlling the G2/M transition in fission yeast. Patterson et al. (2021) use an engineered system to decouple a non-degradable version of Cyclin-dependent kinase (CDK) from cell growth and concomitantly measure CDK activity (by the nuclear localization of a downstream target, Cut3p). They observe cells with indistinguishable CDK levels but two distinct CDK activities, which they posit shows bistable behavior. In this study, the authors ask if other models can also explain this data. The authors use both deterministic and Gillespie based stochastic simulations to generate relationships between CDK activities and protein levels for various cell sizes. They conclude that the experiments performed in Patterson et al. are insufficient to distinguish between a bistable switch and a reversible ultrasensitive switch. They propose additional experiments involving the use of a degradable CDK construct to also measure the inactivation kinetics.

      This is an accurate summary of our paper.

      They propose that a bistable switch will have different forward (OFF->ON) and backward (ON->OFF) switching rates. A reversible ultrasensitive switch will have indistinguishable switching rates.

      Our analysis of Patterson's (2021) experiments is based on the well-known fact that the threshold for turning a bistable switch on is significantly different from the threshold for turning it off (in Patterson's case, the 'threshold' is the level of fusion protein in the cell when CDK is activated), whereas for a reversible, ultrasensitive switch, the two thresholds are nearly indistinguishable. The 'rate' at which the switch is made is a different issue, which we do not address explicitly. In the experiments and in our model, the switching rates are fast, whether the switch is bistable or monostable.

      The results are interesting and worth publication in a computational biology specific journal, as they might only appeal to a limited audience.

      We think our results should also be brought to the attention of experimentalists studying cell cycle regulation, because Patterson's paper (eLife, 2021) presents a serious misunderstanding of the existence and implications of 'bistability' of the G2/M transition in fission yeast. Whereas Patterson's work is an elegant and creative application of genetics and molecular biology to an important problem, it is not backed up by quantitative mathematical modeling of the experimental results. In that sense, Patterson's work is incomplete, and its shortcomings need to be addressed in a highly respected journal, so that future cell-cycle experimentalists will not make the same, or similar, mistakes.

      Several ideas need to be clarified and additional information needs to be provided about the specific parameters used for the simulations: Major comments: #1 The parameters need to be made more accessible by means of a supplementary table and appropriate references need to be cited.

      Two new supplementary tables (S1 and S2) summarize the dynamic variables and parameter values.

      It is not clear why Michaelis Menten kinetics will not be applicable to this system. Has it been demonstrated that the Km s of the enzymes are much greater than the substrate concentrations for all the reactions? If yes, please cite.

      MM kinetics are not appropriate for such protein interaction networks because one protein may be both an enzyme and a substrate for a second protein (e.g., Wee1 and CDK, or Cdc25 and CDK). So, the condition for validity of MM kinetics (enzyme concentration ≪ substrate concentration) cannot be satisfied for both reactions. Indeed, enzyme concentration ≈ substrate concentration is probably true for most reactions in our network. Hence, it is advisable to stick with mass-action rate laws. Furthermore, MM kinetics are a poor choice for 'propensities' in Gillespie SSA calculations, as has been shown by many authors (Agarwal, Adams et al. 2012, Kim, Josic et al. 2014, Kim and Tyson 2020).
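
      To illustrate what "elementary reactions with mass-action propensities" means in a Gillespie SSA, here is a minimal sketch for a generic enzyme-substrate scheme, E + S ⇌ ES → E + P, with comparable numbers of enzyme and substrate molecules. It is not the model analysed in the manuscript: the function name, rate constants and molecule numbers are all hypothetical and chosen only for illustration.

```python
import math
import random


def gillespie_enzyme(e0, s0, k_on, k_off, k_cat, t_end, seed=0):
    """Simulate E + S <-> ES -> E + P with elementary mass-action propensities.

    All parameter values are illustrative assumptions. k_on has units of
    1/(molecule * time); k_off and k_cat have units of 1/time.
    Returns the final time and the molecule counts (E, S, ES, P).
    """
    rng = random.Random(seed)
    e, s, es, p = e0, s0, 0, 0
    t = 0.0
    while True:
        # Mass-action propensities of the three elementary reactions.
        a_bind = k_on * e * s
        a_unbind = k_off * es
        a_cat = k_cat * es
        a_total = a_bind + a_unbind + a_cat
        if a_total == 0.0:
            break  # nothing left to react
        # Exponentially distributed waiting time to the next reaction.
        tau = -math.log(1.0 - rng.random()) / a_total
        if t + tau > t_end:
            t = t_end
            break
        t += tau
        # Choose which reaction fires, in proportion to its propensity.
        r = rng.random() * a_total
        if r < a_bind:
            e, s, es = e - 1, s - 1, es + 1
        elif r < a_bind + a_unbind:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1
    return t, (e, s, es, p)


if __name__ == "__main__":
    # Enzyme and substrate start at the same copy number (assumed values).
    print(gillespie_enzyme(50, 50, k_on=0.01, k_off=0.1, k_cat=0.1, t_end=100.0, seed=1))
```

      Because every reaction is elementary, no quasi-steady-state assumption about the ES complex is needed, which is the situation in which Michaelis-Menten propensities become unreliable.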

      It will not be surprising if the simulation with Michaelis Menten would alter the dynamics shown in this study. A reversible switch with two different enzymes (catalyzing the ON->OFF and OFF->ON transitions) having different kinetics can give asymmetric switching rates. This would directly contradict what has been shown in Figure 7A-D.

      We don't follow the reviewer's logic here. The two transitions, off → on and on → off, are already driven by different molecular processes (dephosphorylation of inactive CDK-P by Cdc25 and phosphorylation of active CDK by Wee1, respectively). Positive feedback of CDK activity on Cdc25 and Wee1 (++ and −−, respectively) causes bistability and asymmetric switching thresholds. Switching rates, which are determined by the kinetic rate constants of the up and down processes, are of secondary importance to the primary question of whether the switch is monostable or bistable.
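
      The difference between asymmetric thresholds (bistable) and coincident thresholds (monostable) can be illustrated with a deliberately simple one-variable toy model, dx/dt = s + x²/(1 + x²) − d·x, in which x feeds back positively on its own production. This is only a sketch and not the Wee1/Cdc25 network of the manuscript; the equation, the parameter d, the input s and the cut-off used to call the switch 'on' are all assumptions made for illustration.

```python
def steady_state(x, s, d, t_relax=100.0, dt=0.1):
    """Relax dx/dt = s + x^2/(1 + x^2) - d*x from the starting value x (forward Euler)."""
    for _ in range(int(t_relax / dt)):
        x += dt * (s + x * x / (1.0 + x * x) - d * x)
    return x


def switching_thresholds(d, s_max=0.3, n=120, cut=0.6):
    """Sweep the input s slowly up and then down.

    Returns (s_on, s_off): the input at which x first rises above `cut` on the
    way up, and the input at which x first falls below `cut` on the way down.
    """
    grid = [s_max * i / n for i in range(n + 1)]
    x, s_on, s_off = 0.0, None, None
    for s in grid:  # upward sweep, each step starting from the previous state
        x = steady_state(x, s, d)
        if s_on is None and x >= cut:
            s_on = s
    for s in reversed(grid):  # downward sweep from the 'on' state
        x = steady_state(x, s, d)
        if s_off is None and x < cut:
            s_off = s
    return s_on, s_off


if __name__ == "__main__":
    # Weaker degradation (d = 0.55) lets the feedback produce two stable states;
    # stronger degradation (d = 0.8) leaves a single, reversible response curve.
    print("bistable   (on, off):", switching_thresholds(d=0.55))
    print("monostable (on, off):", switching_thresholds(d=0.8))
```

      In the bistable setting the 'on' and 'off' inputs are clearly separated, whereas in the monostable setting they differ only by the resolution of the sweep.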

      #2 Line 427: The authors use a half-time of 6 hours in their model as Patterson et al. used a non-degradable construct. It is not clear why dilution due to cell growth has not been considered. The net degradation rate of a protein is the sum of biochemical degradation rate and growth dilution rate. The growth dilution rate seems significant (140 mins doubling time or 0.3 h⁻¹ dilution rate) relative to assumed degradation rate (0.12 h⁻¹). Please clarify why was the effect of dilution neglected in the model or show by sensitivity analysis this does not change the predicted CDK activation thresholds.

      The reviewer highlights an important effect, but it is not relevant to our calculations. In the deterministic model used to calculate the bifurcation diagrams, both cell volume and the concentration of the non-degradable Cdc13:Cdk1 dimer are kept constant; therefore, there is no dilution effect. The stochastic model deals with changing numbers of molecules per cell; the dilution effect is taken into account by the appearance of cell volume, V(t), at appropriate places in the propensity functions. In other words: in the deterministic model, which is written for concentration changes, the dilution term, −(x/V)(dV/dt), is zero because V=constant; in the stochastic model, written in terms of numbers of molecules, dilution effects are implicit in the propensity functions.
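
      For readers who want the origin of the dilution term, it follows from the quotient rule. In this short sketch, N denotes the number of molecules of a species (a symbol introduced here for illustration), x = N/V its concentration, and V(t) the cell volume:

```latex
\[
\frac{dx}{dt}
  = \frac{d}{dt}\!\left(\frac{N}{V}\right)
  = \frac{1}{V}\frac{dN}{dt} - \frac{N}{V^{2}}\frac{dV}{dt}
  = \frac{1}{V}\frac{dN}{dt} - \frac{x}{V}\frac{dV}{dt}.
\]
```

      The first term collects the chemical reactions and the second is the dilution term. It vanishes when V is constant, and it never needs to be written explicitly when the model is formulated in terms of N, because only the first term is simulated and V(t) enters through the propensities.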

      *#3 Line 402 The authors state that the production rate of the Cdk protein is 'assumed' proportional to the cell volume. The word 'assumed' is incorrect here as a simple conversion of concentration-based differential equation (with constant production rate) to molecular numbers would show that production rate is proportional to the volume. This is not an assumption. *

      Correct; we modified the text (see lines 450-462). The role of cell volume in production rate is more relevant to the case of Cdc25, where we assume that its production rate, Δconcentration/Δt, is proportional to V, because the concentration of Cdc25 in the cell increases as the cell grows. We added two references (Keifenheim, Sun et al. 2017, Curran, Dey et al. 2022) to justify this assumption. In the stochastic code, the propensity for synthesis of Cdc25 molecules is proportional to V².
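
      The unit conversion behind both statements can be written out explicitly. In this sketch, c is a concentration, N = cV the corresponding number of molecules, and k_s and k_s' are hypothetical synthesis rate constants introduced only for illustration; the first pair of expressions is the ordinary case raised by the reviewer and the second pair is the Cdc25 case:

```latex
\[
\left.\frac{dc}{dt}\right|_{\text{synthesis}} = k_s
\;\Longrightarrow\;
\left.\frac{dN}{dt}\right|_{\text{synthesis}} = k_s V,
\qquad
\left.\frac{dc}{dt}\right|_{\text{synthesis}} = k_s' V
\;\Longrightarrow\;
\left.\frac{dN}{dt}\right|_{\text{synthesis}} = k_s' V^{2}.
\]
```

      In each case the number-based rate is the concentration-based rate multiplied by V, which is why a synthesis rate that is already proportional to V in concentration units gives a propensity proportional to the square of the volume.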

      #4 Line 423 Please cite the appropriate literature that shows that fission yeast growth during cell division is exponential. If the dynamics are more complicated, involving multiple phases of growth during cell division, please state so.

      We now acknowledge that volume growth in fission yeast, rather than exponential, is bilinear with a brief non-growing phase at mitosis (Mitchison 2003). However, we suggest that our simplifying assumption of exponential growth is appropriate for the purposes of these calculations. See lines 473-476: "In our stochastic simulations, we assume that cell volume is increasing exponentially, V(t) = V0·e^(μt). Although fission yeast cells actually grow in a piecewise linear fashion (Mitchison 2003), the simpler exponential growth law (with doubling time ≈ 140 min) is perfectly adequate for our purposes in this paper."
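
      As a small numerical companion to the exponential growth law, the sketch below converts the 140 min doubling time into a specific growth rate and compares it with the rate implied by a 6 h half-time, the two figures quoted in comment #2. The function names and the initial volume of 1 (arbitrary units) are assumptions made for illustration.

```python
import math

DOUBLING_TIME_MIN = 140.0                       # doubling time quoted in the text
MU_PER_MIN = math.log(2) / DOUBLING_TIME_MIN    # specific growth rate, per minute


def cell_volume(t_min, v0=1.0):
    """Exponential growth law V(t) = V0 * exp(mu * t); v0 is in arbitrary units."""
    return v0 * math.exp(MU_PER_MIN * t_min)


def rate_from_half_time(half_time_h):
    """First-order rate constant (per hour) corresponding to a given half-time."""
    return math.log(2) / half_time_h


if __name__ == "__main__":
    print("growth (dilution) rate per hour:", round(MU_PER_MIN * 60.0, 3))
    print("rate for a 6 h half-time per hour:", round(rate_from_half_time(6.0), 3))
    print("volume after one doubling time:", cell_volume(DOUBLING_TIME_MIN))
```

      The growth rate comes out close to the 0.3 h⁻¹ cited by the reviewer, and the 6 h half-time corresponds to roughly 0.12 h⁻¹.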

      *#5 Line 250 The authors convert the bistable version of the CDK switch to reversible sigmoidal by assuming that Wee1 and Cdc25 phosphorylation is proportional to the CDK level rather than activity, which seems biochemically unrealistic. This invokes an altered circuit architecture where inactive CDK has enough catalytic activity to phosphorylate the two modifying enzymes (Wee1/Cdc25) but not enough to drive mitosis. This might be possible if the Km of CDK for Wee1/Cdc25 is lower relative to other downstream substrates that drive mitosis. The authors can reframe this section of the paper to state this possibility, which might be interesting to experimentalists. *

      The reviewer is correct that the molecular biology underlying our 'reversible sigmoidal' model is biochemically unrealistic. But, in our opinion, this is the simplest way to convert our bistable model into a monostable, ultrasensitive switch while maintaining the basic network structure in Fig. 1. Our purpose is to show that a monostable model, only slightly changed from the bistable model, can account for Patterson's experimental data equally well. If Nurse's group modifies the experimental protocol as we suggest and their new results indicate that the G2/M transition in fission yeast is bistable, then our reversible sigmoidal model, having served its purpose, can be forgotten. If they show that the transition is not bistable, then both experimentalists and theoreticians will have to think about biochemically realistic mechanisms that can account for the new data...and everything else we already know about the G2/M transition in fission yeast.

      #6 It is difficult to phenomenologically understand a bistable switch just based on differences in activation and inactivation thresholds. For example, a reversible ultrasensitive switch also shows a difference in activation and inactivation thresholds (Figure 7D). How much of a difference should be expected of a bistable switch versus reversible switch?

      We show how much of a difference can be expected by contrasting Fig. 7 to Fig. 8. For the largest cells (panel D of both figures), the difference is small and probably undetectable experimentally. For medium-sized cells (panel C), the difference is larger but probably difficult to distinguish experimentally. Only the smallest cells (panel B) provide an opportunity for clearly distinguishing experimentally between monostable and bistable switching.

      *Moreover, as the authors clearly understand (line 275), time-delays in activation and inactivation reactions can inflate these differences. In the future, if the authors can convert the equations to potential energy space as done in Acar et al. 2005 (Nature 435:228) in Figure 3c-d, it will be useful. Also, predicting the distribution of switching rates from the Gillespie simulation might be informative and can be directly compared to experimental measurements in the future (if the Cut3p levels in nucleus and cytosol equilibrates fast enough or other CDK biosensors are developed). *

      The famous paper by Acar et al. (2005) is indeed an elegant experimental and theoretical study of bistability ('cellular memory') in the galactose-signalling network of budding yeast. We have included a comparison of Patterson et al. with Acar et al. in our Conclusions section (lines 353-368):

      "It is instructive, at this point, to compare the work of Patterson et al. (2021) to a study by Acar et al. (Acar, Becskei et al. 2005) of the galactose-signaling network of budding yeast. Combining elegant experiments with sophisticated modeling, Acar et al. provided convincing proof of bistability ('cellular memory') in this nutritional control system. They measured PGAL1-YFP expression (the response) as a function of galactose concentration in the growth medium (the signal), analogous to Patterson's measurements of CDK activity as a function of C-CDK concentration in fission yeast cells. In Acar's experiments, the endogenous GAL80 gene was replaced by PTET-GAL80 in order to maintain Gal80 protein concentration at a constant value determined by doxycycline concentration in the growth medium. The fixed Gal80p concentration in Acar's cells is analogous to cell volume in Patterson's experiments. In Fig.3b of Acar's paper, the team plotted the regions of monostable-off, monostable-on and bistable signaling in dependence on their two control parameters, external galactose concentration and intracellular Gal80p concentration, analogous to our Fig.4. Because Acar's experiments explored both the off → on and on → off transitions, they could show that their observed thresholds (the red circles) correspond closely to both saddle-node bifurcation curves predicted by their model. On the other hand, Patterson's experiments (as analyzed in our Fig.4) probe only the off → on transition."

      The purpose of our paper is to show that Patterson-type experiments can and should be done so as to probe both thresholds, as was done by van Oudenaarden's team. They went further to characterize their bistable switch in terms of 'the concept of energy landscapes'. We think it is premature to pursue this idea in the context of the G2/M transition in fission yeast until there is firm, quantitative data characterizing the nature of the 'presumptive' bistable switch in fission yeast.

      Minor comments: #1 Line 2: Please replace "In most situations" to "In favorable conditions"

      Done.

      **Referee cross-commenting** I agree with Reviewer 1 that this falls more under pointing out an alternative interpretation of a single experiment than challenging widely supported orthodoxy about how the eukaryotic cell cycle leaves mitosis.

      As we said earlier, our 1993 paper in J Cell Sci is the source of this orthodox view, and it is widely supported at present because there is convincing experimental evidence for bistability in frog egg extracts, budding yeast cells and mammalian cells. Patterson's paper is not sound evidence for bistability of the G2/M transition in fission yeast cells. It is important for experimentalists to know why the experiments fail to confirm bistability, and important for someone to do the experiment correctly in order to confirm (or, what would be really interesting, to refute) the expectation of bistability at the G2/M transition in fission yeast cells.

      Reviewer #2 (Significance (Required)): Suitable for specialist comp bio journal eg PLoS Comp Bio

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The paper by Novak and Tyson revisits a recent paper from Nurse group on the bistability of mitotic switch in fission yeast using mathematical modelling. The authors extend their older models of mitotic entry check point and implement both deterministic and stochastic version of new model. They show this model does indeed possess bistability and show that combined with stochastic fluctuations the model can show bimodality for the cyclin-CDK activity at a particular cell size consistent with the recent experimental data. However, the authors also show alternative model that has mono-stable ultrasensitivity can also explain the data and suggest experiments that can prove the existence of hysteresis and therefore bistability.

      Right on.

      While the biological implication of the study is well explained, the authors can improve the presentation of their model and the underlying assumptions. I have the following comments and suggestions for improvement of the paper.

        • The cartoon of the mathematical model is confusing at places, for example the wee1-CDK complex according to the equations either dissociates back to wee1 and CDK or gives rise to pCDK and wee1, the arrow below is confusing as it implies it can also give rise to wee1p, the CDK phosphorylation of wee1 is already included in the diagram. Also, the PP2A is put on the arrow for all reactions but for wee1p2 to wee1p its action shown with a dashed line. Also, I wondered if wee1p and wee1p2 can also bind CDK and sequester or phosphorylate CDK?*

      We are sorry for the confusion and have improved Fig. 1.

      • The rates and variables in the ODEs are not fully described. Also sometimes unclear what is parameter and what is a variable, I had to look at the code.*

      We now include tables of variables and parameter values, with explanatory notes.

      • The model has quite a few parameters, but these are not at all discussed in the paper. How did the authors come up with these particular set of parameters, has there been some systematic fitting, or tuning by hand to produce a good fit to the data? I could only see the value of the parameters in the code, but perhaps a table with the parameters of the model, what they mean and their value (and perhaps how the values is obtained) is missing.*

      The parameters were tuned by hand to fit Patterson's data, based, of course, on our extensive experience fitting mathematical models to myriad data sets on the cell division cycles of fission yeast, budding yeast, and frog egg extracts. We now provide a table of parameter values.

      • The authors are using the Gillespie algorithm with time varying parameters (as some rates depend on volume and volume is not constant). Algorithm needs to be modified slightly to handle this (see for example Shahrezaei et al Molecular Systems Biology 2008). *

      A valid criticism, but the rate of cell volume increase is very slow compared to the propensities of the biochemical reactions. We write (lines 492-498):

      "In each step of the SSA, the volume of the cell is increasing according to an exponential function, and, consequently, the propensities of the volume-dependent steps are, in principle, changing with time; and this time-dependence could be taken into account explicitly in implementing Gillespie's SSA (Shahrezaei, Ollivier et al. 2008). However, the step-size between SSA updates is less than 1 s compared to the mass-doubling time (140 min) of cell growth. So, it is warranted to neglect the change in V(t) between steps of the SSA, as in our code."

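      The approximation described above can be sketched for a toy birth-death process. This is a minimal illustration and not the authors' code: the species, the rate constants `k_syn` and `k_deg`, and all function names are hypothetical, and only the 140 min doubling time is taken from the text. The volume is evaluated once at the start of each step and held fixed while the waiting time is drawn.

```python
import math
import random

DOUBLING_TIME_MIN = 140.0                  # doubling time from the text
MU = math.log(2.0) / DOUBLING_TIME_MIN     # growth rate, per minute


def volume(t_min, v0=1.0):
    """Exponentially growing cell volume, V(t) = v0 * exp(MU * t)."""
    return v0 * math.exp(MU * t_min)


def relative_volume_change(dt_min):
    """Fractional increase of V over an interval of dt_min minutes."""
    return math.expm1(MU * dt_min)


def ssa_frozen_volume(t_end, n0=0, k_syn=5.0, k_deg=0.1, v0=1.0, seed=0):
    """Gillespie SSA for synthesis (propensity k_syn * V) and degradation
    (propensity k_deg * n), treating V as constant within each step."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        v = volume(t, v0)          # volume frozen for this step
        a_syn = k_syn * v          # volume-dependent propensity
        a_deg = k_deg * n          # zero when n == 0, so n never goes negative
        a_total = a_syn + a_deg
        dt = rng.expovariate(a_total)   # exponential waiting time
        if t + dt > t_end:
            break
        t += dt
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts
```

      For a 1 s interval, `relative_volume_change(1 / 60)` evaluates to less than 1e-4, which illustrates why V(t) can be treated as constant between updates when steps are this short.
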
      • The authors correctly point out, ignoring mRNA has resulted in underestimation of noise, however another point is that mRNA life times are short and that also affects the timescale of fluctuations and this may be relevant to the switching rates between the bistable states. *

      A valid point, but to include mRNAs would double the size of the model. Furthermore, we have little or no data about mRNA fluctuations in fission yeast cells, so it would be impossible to estimate the values of all the new parameters introduced into the model. Finally, the switching rates between bistable states (or across the ultrasensitive boundary) are not the primary focus of Patterson's experiments or our theoretical investigations. So, we propose to delay this improvement to the model until the relevant experimental data is available.

      • In the introduction add, "In this study" to "Intrigued by these results, we investigated their experimental observations with a model of bistability in the activation of cyclin-CDK in fission yeast." *

      Done

      Reviewer #3 (Significance (Required)): Overall, this is an interesting study that revisits an old question and some recent experimental data. The use of stochastic modelling in explaining variability and co-existence of cell populations in the context of cell cycle and comparison to experimental data is novel and of interest to the communities of cell cycle researchers, systems biologists and mathematical biologists.

      We agree. Thanks for the endorsement

      References

      Acar, M., A. Becskei and A. van Oudenaarden (2005). "Enhancement of cellular memory by reducing stochastic transitions." Nature 435(7039): 228-232.

      Agarwal, A., R. Adams, G. C. Castellani and H. Z. Shouval (2012). "On the precision of quasi steady state assumptions in stochastic dynamics." J Chem Phys 137(4): 044105.

      Curran, S., G. Dey, P. Rees and P. Nurse (2022). "A quantitative and spatial analysis of cell cycle regulators during the fission yeast cycle." Proc Natl Acad Sci U S A 119(36): e2206172119.

      Gerard, C., J. J. Tyson, D. Coudreuse and B. Novak (2015). "Cell cycle control by a minimal Cdk network." PLoS Comput Biol 11(2): e1004056.

      Gould, K. L. and P. Nurse (1989). "Tyrosine phosphorylation of the fission yeast cdc2+ protein kinase regulates entry into mitosis." Nature 342(6245): 39-45.

      Keifenheim, D., X. M. Sun, E. D'Souza, M. J. Ohira, M. Magner, M. B. Mayhew, S. Marguerat and N. Rhind (2017). "Size-Dependent Expression of the Mitotic Activator Cdc25 Suggests a Mechanism of Size Control in Fission Yeast." Curr Biol 27(10): 1491-1497 e1494.

      Kim, J. K., K. Josic and M. R. Bennett (2014). "The validity of quasi-steady-state approximations in discrete stochastic simulations." Biophys J 107(3): 783-793.

      Kim, J. K. and J. J. Tyson (2020). "Misuse of the Michaelis-Menten rate law for protein interaction networks and its remedy." PLoS Comput Biol 16(10): e1008258.

      Lundgren, K., N. Walworth, R. Booher, M. Dembski, M. Kirschner and D. Beach (1991). "mik1 and wee1 cooperate in the inhibitory tyrosine phosphorylation of cdc2." Cell 64(6): 1111-1122.

      Mitchison, J. M. (2003). "Growth during the cell cycle." Int Rev Cytol 226: 165-258.

      Novak, B., Z. Pataki, A. Ciliberto and J. J. Tyson (2001). "Mathematical model of the cell division cycle of fission yeast." Chaos 11(1): 277-286.

      Novak, B. and J. J. Tyson (1995). "Quantitative Analysis of a Molecular Model of Mitotic Control in Fission Yeast." J Theor Biol 173: 283-305.

      Patterson, J. O., S. Basu, P. Rees and P. Nurse (2021). "CDK control pathways integrate cell size and ploidy information to control cell division." Elife 10.

      Shahrezaei, V., J. F. Ollivier and P. S. Swain (2008). "Colored extrinsic fluctuations and stochastic gene expression." Mol Syst Biol 4: 196.

      Tang, Z., T. R. Coleman and W. G. Dunphy (1993). "Two distinct mechanisms for negative regulation of the Wee1 protein kinase." EMBO J 12(9): 3427-3436.

      Tyson, J. J., A. Csikasz-Nagy and B. Novak (2002). "The dynamics of cell cycle regulation." Bioessays 24(12): 1095-1109.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Dear Editor,

      We have addressed the points and concerns raised by the reviewers and wish to thank them for their effort and time. We agree with all the comments and suggestions, which resulted in a significant improvement of the manuscript. Below, we provide a point-by-point response to all comments.

      Sincerely,

      Anders Hofer, corresponding author


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In their paper Ranjbarian and colleagues provide a tour de force at characterizing dAK from G. intestinalis using both enzymology and structural biology. G. intestinalis does not have RNR and therefore this organism relies on dAK which catalyzes formation of dAMP and ADP from deoxyadenosine and ATP (among other substrate pairs). The authors performed a terrific job at testing this reaction in depth using a recombinant dAK and a battery of various co-substrates (both natural as well as synthetic ones, Table 1). Extensive structural information on dAK was obtained using a combination of X-ray crystallography and cryo-EM. Overall, this work will be paramount aid in better understanding of the reaction mechanism, especially in the context of molecules which can be used as inhibitors of such a crucial enzyme (metabolic vulnerability for this parasite).

      This manuscript, in its current form does not require additional experiments but I would like to have a few aspects corrected/clarified, before it can be accepted for publication:

      Line 30: "whereas the affinities for deoxyguanosine, deoxyinosine and deoxycytidine were 400-2000 times lower." Better not to use term "affinity" when KM or kcat/KM are implied (unless ITC was used to measure true Kds).

      -This is a good point, and we are now using KM values in all instances where actual numbers are implied and have only kept the word affinity in cases where it is discussed in more general terms.

      Line 31: "Deoxyadenosine analogues halogenated at the 2- and/or 2´-positions were also potent substrates, with comparable EC50 values as the main drug used today, metronidazole, but with the advantage of being usable on metronidazole-resistant parasites." Not sure this sentence is clear as written.

      -We have now rewritten the sentence as follows: "Deoxyadenosine analogues halogenated at the 2- and/or 2´-positions were also potent substrates with comparable EC50 values on cultured G. intestinalis cells as metronidazole, the first line treatment today, with the additional advantage of being effective against metronidazole-resistant parasites."

      Line 55: "..G. intestinalis (synonymous to G. lamblia and G. duodenalis)..". Very nice that authors provide this information as it is usually a point of confusion i.e. multiple names for the same organism.

      -Thanks a lot, we are happy that you liked it.

      Line 61 and above as well: "Treatment regimes are mainly based on metronidazole and to a lesser extent other 5-nitroimidazoles...". MT is introduced a bit sporadically, and not completely clear which enzyme it inhibits and its mode of action. Common knowledge is that MT is known for its action in aerobic parasites/bacteria and known as Flagyl, where its mode of action was linked to "activation" due to microaerophilic conditions. Maybe MT can be introduced after text starting from Line 71?

      -The description of metronidazole is adjusted as follows: "Metronidazole (Flagyl) is the most commonly used drug to treat giardiasis and selectively kills the parasite and other anaerobic organisms by forming free radicals under oxygen-limited conditions, but it has side effects such as nausea, abdominal pain, diarrhea, and in some cases neurotoxicity reactions."

      Line 70: what is "cyst-wall"?

      -It is a cell wall consisting of three major cyst wall proteins and N-acetylgalactosamine. We have adjusted the sentence as follows to make the term clearer: "The trophozoites can also secrete material to form a cyst wall and go through two rounds of DNA replication to form cysts, which contain four nuclei and a 16N genome per cell (4N in each nucleus)."

      Line 90: "The reaction is catalyzed by deoxyribonucleoside kinases (dNKs), which are.." I really do not like when in order to find a reaction which is catalyzed by an enzyme in a particular study one needs to dive into the literature, sometimes it requires a lot of time as in most of recent papers on the subject reactions catalyzed are not listed. Please add a Figure or a panel with reactions catalyzed by both dNKs families.

      -It is a good idea and we have now added a figure (Fig. 1), which compares the deoxyribonucleotide metabolism of G. intestinalis with mammalian cells. The different deoxyribonucleoside kinases in the parasite and mammalian cells are included in the figure.

      Line 96: "..was found to have a ~10-fold higher affinity to thymidine.." as I mentioned above I really do not like the usage of "affinity", when actually low KM is implied.

      -It is corrected now (see above).

      Line 113: "This does not match the current knowledge that there are three dNKs in total whereof one completely specific for thymidine. The lack of knowledge about these essential enzymes in the parasite has hampered the understanding of Giardia deoxyribonucleoside metabolism and hence its exploitation as a target for antiparasitic drugs." Very good rationale, as I mentioned above, I think a Figure needs to be introduced that depicts different enzymes involved in deoxyribonucleoside metabolism (both TK1 and non- TK1 members) in Giardia with clearly labeled all known paralogs and corresponding enzymatic reactions.

      -Thanks a lot for the suggestion. Information about the different dNKs in G. intestinalis with mammalian cells for comparison is included in the new figure (Fig. 1).

      Line 132: Odd designation of supplementary figures, usually it is "Fig. S1" etc. The legend for Fig. S1 is not adequate, please add description of species and name of enzymes for all sequences shown. Also each sequences in alignment should start with number (a.a. number) as it is not clear if a full sequence is shown or not. Overall comment about the multiple sequence alignment (relevant to Fig. S1): with such a small number of sequences it is very hard to make any substantial predictions about conserved regions etc.

      -Thanks for the suggestions. We have now included more sequences, sequence numbering, and description of species as well as enzyme names. In addition, the alignment now uses the same G. intestinalis dAK sequence as the experiments (same strain and accession number), and we have realigned the sequences using Clustal W instead of Clustal Omega (which gives better alignment of the termini). The designation of supplementary figures is according to the style of PLoS journals.

      Fig. 1 and elsewhere: I will prefer that all bar graphs show individual values + the error bar (if possible);

      -We have now added individual values to the bar graphs.

      I do not have any issues with X-ray data and cryo-EM studies (refinement statistics, particles classification etc).

      **Referees cross-commenting**

      I also agree with all the comments provided by Reviewer 2 and very pleased to see that we were very similar in our evaluations.

      Reviewer #1 (Significance (Required)):

      In their paper Ranjbarian and colleagues provide a tour de force at characterizing dAK from G. intestinalis using both enzymology and structural biology. G. intestinalis does not have RNR and therefore this organism relies on dAK which catalyzes formation of dAMP and ADP from deoxyadenosine and ATP (among other substrate pairs). The authors performed a terrific job at testing this reaction in depth using a recombinant dAK and a battery of various co-substrates (both natural as well as synthetic ones, Table 1). Extensive structural information on dAK was obtained using a combination of X-ray crystallography and cryo-EM. Overall, this work will be paramount aid in better understanding of the reaction mechanism, especially in the context of molecules which can be used as inhibitors of such a crucial enzyme (metabolic vulnerability for this parasite).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Ranjbarian et al. investigated a non-TK1-Like deoxyribonucleoside kinase (dNK) found in the protozoan parasite Giardia intestinalis. They used enzyme kinetic assays on heterologously expressed Gi dNK in E. coli to determine which deoxyribonucleotides were most likely physiological substrates for the enzyme. Their characterization revealed that this Gi dNK has a strong affinity to deoxyadenosine. They further investigated the affinity and activity of the dNK on deoxyadenosine analogues, some of which have known pharmaceutical utility. Finally, using a combination of crystallography, cryo-EM, chromatography, and mass photometry, they reveal that unlike other dNKs, Gi dNK forms a tetramer. They characterize important regions required for tetramerization and postulate that this tetramerization evolved to provide Gi dNK with a heightened affinity for deoxyadenosine.

      Major comments and questions:

      • The claims in this manuscript are well-supported, and I found no major issues with experimental methods.

      • The authors provide a structure of tetrameric dNK and suggest that this tetramer leads to the increased affinity to substrate compared to non-giardia dNKs. They also show through mutations that removing the novel dimerization regions decreases substrate affinity by 100-fold. However, I was left unclear about why the tetramer would lead to such high affinity for substrate compared to two dimers. This is especially notable, since the authors state that there are no signs of cooperativity, which is a common way that oligomerization may lead to heightened affinity. If the authors have no current evidence explaining this, they can consider adding a short amount of discussion speculating on the mechanism and future directions of study. -Thanks for this suggestion. We have now added a section in the last paragraph of the discussion where we speculate on the subject.

      Minor comments and questions:

      • The authors state that dATP acts as a mixed inhibitor and not a simple competitive inhibitor, and that previous studies have shown that this is because the dNTP competes in two locations (line 163). Is it also possible that competitive inhibition + allosteric regulation could be causing this behavior instead? -It is true that this can be theoretically explained in many ways. In fact, many allosteric regulators affect both the Vmax and Km values. However, in all studied dNKs, the dNTP acts as a dual competitor and no proper allosteric regulation with a separate allosteric site has ever been observed so far. We have rephrased this part as follows to make it clear: "Mixed inhibition is often the result of allosteric regulation but studies of other dNKs have shown that this is not the case [17]. Instead, the far-end dNTP product gives a dual inhibition where the deoxyribonucleoside moiety competes with the substrate and the phosphate groups mimic those of ATP but coming from the opposite direction."

      • In the introduction (line 93), non-TK1-like dNKs are described as "not structurally related to TK1-like". This left me unclear, are they still interrelated among themselves? -We have added the following sentence for clarification: "The non-TK1-like dNKs are further subdivided into a monophyletic group of canonical non-TK1-like dNKs and a second group with thymidine kinases from Herpesviridae, which are structurally related to the canonical group but share very little amino acid sequence homology."

      • I was left confused by lines 106-116 in the introduction, where the specificities of dNKs in giardia are discussed. This is touched upon again in the discussion, but it was not clear here that there are several deoxyribonucleotides unaccounted for. -We think this should be clear now with the added Fig. 1 where the dNKs are shown.

      • When describing enzyme assays (Line 145), the authors say there is no salt dependence, but there looks to be MgCl2 always included in the assays (presumably for the ATP). -This is a good point and something we overlooked when writing the sentence (Mg2+ is required). We have now corrected the sentence as follows: "Based on initial enzyme activity studies, it was confirmed that the assay did not have any specific requirements regarding K+, Na+, NH4+, acetate or reducing agents, and that it was linear with respect to time (S2 Fig)."

      • I was confused by the y-axis of Fig 2. How is enzyme activity lower when dAdo is added? I think I read "enzyme activity" as total substrate depleted, when it is actually referring exclusively to the given non-dAdo substrate in each column. -This is a very good point that we seem to have overlooked. We have now adjusted the y-axis title to "Indicated enzyme activity" and added the following sentence to the figure legend: "The recorded enzyme activities are for the substrates indicated on the x-axis (excluding the activity with the competing substrate)."

      • Lines 239 - 255 and Figure 3 were a little unclear to me. Specifically, I was having trouble following in the text which dimer is in the ASU, which is symmetry related, and matching those terms with which are canonical and non-canonical. -We agree with the reviewer and thank them for their comment. In order to improve the presentation of these results, we chose to extensively rearrange the figures and accompanying text. We now present the initial X-ray data together with the cryo-EM data in a new Figure 4 that focuses on the overall architecture of the tetramer. We realize that some of the nomenclature previously used in that figure was, as the reviewer pointed out, confusing and superfluous, and we have now simplified and unified it. The structural details of how the extended N- and C-termini interact with the neighboring subunit have been moved to the new figure 6 in order to present them just before the functional analyses of the consequences of truncating the termini. As a consequence of these changes to the figure layout, we made substantial changes in the organization of the text surrounding these figures, which also led to a clearer presentation. Since the changes to the figures and text are quite substantial, we would like to point out that they are only changes to the presentation, not to the data shown.

      • The authors suggest that in the experiment shown in Figure S9 (Line 285), low activity may be caused by minor impurities. I'm not sure why impurities would lower activity significantly. Could there be other differences in experimental conditions that are at play instead? -The sentence refers to a side activity (dATP dephosphorylation) which is not the normal reaction of deoxyadenosine kinase. We have rephrased the sentence to make it clearer: "The dATP-dephosphorylating activity was several orders of magnitude lower than the regular dAK activity (to phosphorylate deoxyadenosine) and was possibly catalyzed by other enzymes present as minor impurities in the protein preparation."

      • (Optional) From looking at the crystallography stats, I think the authors can potentially push the resolution more. At higher resolutions, Rmerge may become high, but depending on the data collection strategy, Eiger detectors can lead to high Rmerge just out of sheer data redundancy. Cc 1/2 can be a more useful metric in these contexts. -This is a good point and well spotted by the reviewer. Indeed, a CC1/2 of 0.802 suggests that the resolution can be pushed further. However, due to contaminating spots at higher resolutions the statistics significantly worsen when trying to push the resolution beyond 2.1 Å, which is why we did not process the data to a higher resolution.

      • For Figure S8, the Polder map feature in Phenix is another option for showing ligand occupancy in an unbiased way. Did the authors try this? -We want to thank the reviewer for suggesting this. We have calculated a polder map using the Polder map feature in Phenix and both the resulting map and correlation coefficients support the presence of a dADP in the active site of monomer I. We added a section to the relevant paragraph to include these new findings: "To increase our confidence that dADP was correctly placed within active site I, we calculated a polder map for dADP to test whether the β-phosphate density is correctly attributed or if it rather belongs to the bulk solvent. The resulting polder map and statistics support the placement of dADP in active site I with correlation coefficients of CC1,2=0.7627, CC1,3=0.9424, and CC2,3=0.7423 suggesting that the density does belong to dADP as CC1,3 > CC1,2 and CC1,3 > CC2,3 (S8 Fig.)."

      • It's disappointing that the tetramers show so much preferred orientation in the cryo-EM. With that said, while the nominal resolution is 4.8 Å, I think that with the streakiness the EM structure looks to have worse resolution than that. -We agree that the streakiness of the map is substantial. This is simply a result of the severe anisotropy of the map, which means that the resolution is probably worse than 4.8 Å in the "bad directions" of the map. The supplementary material (S9 Fig) clearly shows the preferred orientations leading to this problem. In the course of this study, we tried several methods to lessen the preferred orientation problem such as using graphene oxide-coated grids and collecting tilted data. However, when we got the crystal structure we saw no point in continuing these efforts. To address the comment of the reviewer, we extended the description of the EM map in the main text to say:

      "Due to strong preferred orientations, it was not possible to get an isotropic, high-resolution 3D structure of dAK using cryo-EM. The resulting 3D map had a nominal resolution of 4.8 Å, but a clearly anisotropic appearance probably reflecting lower resolution in the poorly resolved direction (S9 Fig)."
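
      The decision rule quoted in the polder-map response above (the density is attributed to the ligand when CC1,3 exceeds both CC1,2 and CC2,3) can be written as a short check. This is a minimal sketch in Python and is not part of Phenix; the function name `polder_supports_ligand` is hypothetical, and the numbers are the correlation coefficients reported in the response.

```python
def polder_supports_ligand(cc12: float, cc13: float, cc23: float) -> bool:
    """Return True when CC(1,3) is larger than both CC(1,2) and CC(2,3).

    This mirrors the criterion stated in the response; the function name
    is a hypothetical helper, not a Phenix API.
    """
    return cc13 > cc12 and cc13 > cc23


# Correlation coefficients reported for dADP in active site I.
dadp = {"cc12": 0.7627, "cc13": 0.9424, "cc23": 0.7423}
print(polder_supports_ligand(**dadp))  # True
```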

      **Referees cross-commenting**

      Overall, I agree with Reviewer #1's evaluation, and don't have any further suggestions or thoughts at this time.

      Reviewer #2 (Significance (Required)):

      Medical relevance: G. intestinalis is a parasite that causes 190 million cases of giardiasis per year. While treatable, there is evidence that giardia are developing a resistance to the main treatment at the moment, metronidazole. Thus, the authors provide a compelling case for the medical relevance of their investigation of Gi dNK for further pharmaceutical development. They provide further evidence for this by showing that several deoxyadenosine analogs bind the dNK and inhibit giardia growth. This work represents a very useful first step into a potential avenue for medical development. It's important to note that clinical studies are not within the purview of this research. However, in the discussion, the authors provide several comments on the promise of this avenue for future research.

      Conceptual, technical, and mechanistic relevance: Through biochemical and structural study, the authors provide a compelling framework to understand an enzyme that is very important to the unique lifestyle of giardia parasites. From an evolutionary standpoint, the authors provide insight into how giardia can survive even without major components of de novo DNA synthesis. The authors principally use well-established tools and techniques of the enzymology field, but do so to characterize a unique and previously uncharacterized enzyme system. This enzyme proves to be notable not just for its medical significance, but because it is unique among its family (non-TK1-like deoxynucleotide kinases) in its strong affinity for substrate and tetrameric quaternary structure. One relatively novel technique used in the study is mass photometry, which is a relatively new and exciting way to characterize native proteins at very low concentrations. Using this technique helps the authors overcome a common criticism of structural studies in which the high concentrations or crowding conditions of techniques like crystallography and cryo-EM may be inducing non-physiological oligomers.

      In summary, this work represents a meaningful addition to the protein structure-function literature. While it will principally be of interest to basic/fundamental researchers who study the mechanistic detail of protein function and evolution, it also provides a foundation for future translational work and antiparasitic drug design.

      Reviewer's background: I received my PhD in chemistry studying the structure and function of another enzyme key to DNA metabolism (except in giardia), ribonucleotide reductase. My background is in structural biology and biochemistry. I do not have sufficient expertise to comment on studies performed on G. intestinalis growth and susceptibility to deoxyadenosine analogs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Roget et al. build on their previous work developing a simple theoretical model to examine whether ageing can be under natural selection, challenging the mainstream view that ageing is merely a byproduct of other biological and evolutionary processes. The authors propose an agent-based model to evaluate the adaptive dynamics of a haploid asexual population with two independent traits: fertility timespan and mortality onset. Through computational simulations, their model demonstrates that ageing can give populations an evolutionary advantage. Notably, this observation arises from the model without invoking any explicit energy tradeoffs, commonly used to explain this relationship.

      The model’s results are based on both numerical simulations and formal mathematical analysis.

      Additionally, the theoretical model developed here indicates that mortality onset is generally selected to start before the loss of fertility, irrespective of the initial values in the population. The selected relationship between the fertility timespan and mortality onset depends on the strength of fertility and mortality effects, with larger effects resulting in the loss of fertility and mortality onset being closer together. By allowing for a trans-generational effect on ageing in the model, the authors show that this can be advantageous as well, lowering the risk of collapse in the population despite an apparent fitness disadvantage in individuals. Upon closer examination, the authors reveal that this unexpected outcome is a consequence of the trans-generational effect on ageing increasing the evolvability of the population (i.e., allowing a more effective exploration of the parameter landscape), reaching the optimum state faster.

      The simplicity of the proposed theoretical model represents both the major strength and weakness of this work. On one hand, with an original and rigorous methodology, the logic of their conclusions can be easily grasped and generalised, yielding surprising results. Using just a handful of parameters and relying on direct competition simulations, the model qualitatively recapitulates the negative correlation between lifespan and fertility without requiring energy tradeoffs. This alone makes this work an important milestone for the rapidly growing field of adaptive dynamics, opening many new avenues of research, both theoretically and empirically.

      We thank the reviewers and editor for highlighting the importance of the work presented here.

      On the other hand, the simplicity of the model also makes its relationship with living organisms difficult to gauge, leaving open questions about how much the model represents the reality of actual evolution in a natural context.

      We presented, in both the results and the discussion, how the mathematical trade-offs between fertility and survival time give rise to (xb, xd) configurations representative of existing aging modes.

      In particular, a more explicit discussion of how the specifics of the model can impact the results and their interpretation is needed. For example, the lack of mechanistic details on the trans-generational effect on ageing makes the results difficult to interpret.

      We discussed the role of the transgenerational Lansing effect in terms of its function: no particular mechanism is needed beyond a transgenerational negative effect. We reinforced this in the discussion by adding the following sentence: “Regarding the nature of the transgenerational effect, our model is agnostic and the mere transmission of any negative effect would be sufficient to exert the function.”

      Even if analytical results are obtained, most of the observations appear derived from simulations as they are currently presented. Also, the choice of parameters for the simulations shown in the paper and how they relate to our biological knowledge are not fully addressed by the authors.

      The long time limit of the system with and without the Lansing effect is based on analytical results later confirmed using numerical simulations. The choice of parameters is explained in the introduction as being the minimum ones for defining a living organism. As for the parameters’ values, our numerical analysis gives a solution for any ib, id, xb and xd on R+, making the choice of initial value a mere random decision.

      Finally, the conclusions of evolvability are insufficiently supported, as the authors do not show if the wider genotypic variability in populations with the ageing trans-generational effect is, in fact, selected.

      We do not show nor claim that evolvability per se is selected for but that the apparent advantage given by this transgenerational effect seems to be mediated by an increased genotypic/phenotypic variability conferred to the lineage that we interpreted as evolvability.
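
      The model discussed in this exchange can be illustrated with a minimal sketch. It assumes square (step) rate functions, fertility from birth, and zero mortality before x_d, as described in the reviews; it omits competition, mutation, and the simulation loop. The class and function names are hypothetical, the treatment of the boundary ages (`<=` versus `<`) is an assumption, and the trait values in the usage lines are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trait:
    """Hypothetical container for the two evolving traits and their intensities."""

    x_b: float        # fertility span: reproduction is possible up to this age
    x_d: float        # mortality onset: death becomes possible after this age
    i_b: float = 1.0  # birth intensity (the reviews note I_b = 1 in the simulations)
    i_d: float = 1.0  # death intensity (the reviews note I_d = 1 in the simulations)

    def birth_rate(self, age: float) -> float:
        """Square function: constant i_b while fertile, zero afterwards."""
        return self.i_b if age <= self.x_b else 0.0

    def death_rate(self, age: float) -> float:
        """Square function: zero before the mortality onset, i_d afterwards."""
        return self.i_d if age > self.x_d else 0.0

    def total_fertility(self) -> float:
        """x_b * i_b: offspring expected if the individual survives its whole fertility span."""
        return self.x_b * self.i_b


def parent_is_physiologically_old(parent_age: float, parent: Trait) -> bool:
    """Condition a > x_d used in the reviews to single out offspring of old parents."""
    return parent_age > parent.x_d


# Arbitrary illustrative values, not taken from the manuscript.
t = Trait(x_b=1.5, x_d=1.2)
print(t.birth_rate(1.0), t.death_rate(1.0))   # 1.0 0.0
print(t.total_fertility())                    # 1.5
print(parent_is_physiologically_old(1.3, t))  # True
```

      With I_b = 1, `total_fertility` reduces to x_b, which is the simplification the second recommendation below points out.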

      Recommendations for the authors

      (1) The authors could use the lineage tracing results for the evolvability aspect. Specifically, within subpopulations featuring the Lansing effect, it would be valuable to explore whether individuals with parental age greater than the mortality onset (a > x_d) demonstrate higher fitness compared to individuals with a < x_d. Additionally, an examination of how this variation evolves over time could provide further insights into the dynamics of the proposed model.

      We thank the reviewer for this suggestion. This is an ongoing work in the group, especially in the context of varying environmental conditions.

      (2) In all simulations, I_b = I_d = 1, resulting in total fertility (x_b * I_b) equating to x_b, while x_d is proportional to life expectancy. Considering an exploration of the implications of this parameter setting, the authors could frame x_d as a 'lifespan cost', potentially allowing for the model to be conceptualised in terms of energetic tradeoffs. This might offer additional perspectives on the dynamics of the model and its alignment with biological principles.

      We discuss how the apparent trade-offs given by the model depending on ib and id values can be related to the interpretation of such trade-offs that has been accepted for most of the past century. Our claim here in the discussion is that one does not need such energetic trade-off for the fertility/longevity trade-offs to appear. Such energetic trade-off is not a “biological principle” but merely an accepted interpretation of a fertility/longevity trade-off that is not even a general mechanism.

      (3) Considering the necessity of variation in x_d for the observed patterns, an exploration could be undertaken by the authors to examine a model where x_d is simply variable without inheritance. This could involve centring x_d at some value d with some variance σ_d for all individuals. In such a scenario, it may be observed whether the same convergence of x_b - x_d occurs without requiring x_d to be selected. Furthermore, similar consequences of the Lansing effect could potentially be identified.

      This was done early on during our work and did not show any major changes in the model’s behaviour beyond the time of convergence. We did not include it in the final manuscript because of the low added value to an already long and complex manuscript.

      (4) While it may not be necessary to alter the model itself, it is suggested that the authors consider acknowledging the potential consequences of certain modelling decisions that might be perceived as biologically unrealistic. Notable examples include assumptions such as fertility from birth and zero mortality prior to x_d. These assumptions, such as infertility from birth, could be viewed as distinctive features, and it might be worth mentioning that parental care of offspring could have co-evolved with such features. This is particularly relevant considering the energy tradeoff hypothesis that has been postulated.

      Although inspired by results obtained in Drosophila, mice, nematodes and zebrafish, the model is so far haploid and asexual, thus involving individuals likely more similar to unicellulars. In these conditions, infertility from birth did not seem relevant to us. However, the model and codes are accessible online and we hope that others will use them to address such questions. It is interesting, though, to notice that ageing appears here without such a constraint.

      Additionally, the consideration that all organisms face a non-trivial mortality rate at every age, not solely from physiological causes, reflects the reality within which selection operates.

      We thought this was the best way to reflect an environment with a limited carrying capacity. A more complex model is under construction to take into account the fact that older individuals might be more sensitive to it than younger ones.

      (5) While acknowledging the technical rigour applied by the authors, it is suggested that further attention be given to conducting a comprehensive 'reality check' associated with the chosen parameters, particularly regarding the biological relevance of the results. For instance, the authors argue that offspring of old organisms do not, on average, live similarly to their parents. However, it is noted that studies in the haploid asexual organism yeast, akin to what the authors model (albeit not necessarily yeast), revealed that the average lifespan of yeast progeny born from young or old mothers is very similar.

      We do not claim that the progeny of old parents live less long than that of younger parents on average; we say that it happens in the progeny of physiologically old parents, representing at most 10% of the population in our numerical simulations.

      The authors cite experimental evolution in Drosophila progeny conceived later in the life of the parent, indicating that the onset of mortality in these progeny occurs late, sometimes even after the end of the fertility period (Burke et al., 2016; Rose et al., 2002). While the authors report their own previous studies with divergent results, independent experiments have suggested an increase of x_d following an artificial increase of x_b (Luckinbill and Clare, 1985; Sgro et al., 2000). A more in-depth consideration of these contrasting observations and their potential implications for the current model could enhance the overall robustness of the study.

      The increase of x_d following an artificial increase of x_b is predicted by our model as discussed. The divergence of observations between studies is alas hard to assess.

      (6) To enhance readability and maintain consistency, it is suggested that the authors homogenise the description of key parameters, specifically x_b and x_d, throughout the text. This could contribute to improved clarity and rigour. One recommendation is to refer to x_b consistently as the 'fertility span' and x_d as the 'mortality onset' for the sake of uniformity in terminology.

      We have modified the text accordingly.

      (7) At various points in the text, the assertion is made that observations have indicated a tradeoff between fertility and longevity. It is recommended that the authors provide references or data to substantiate this claim. This addition would contribute to the empirical grounding of the mentioned tradeoff and strengthen the overall support for the assertions made in the study.

      We added the following references to the discussion: Lemaitre et al., 2015; Kirkwood, 2005; and Rodrigues and Flatt, 2016.

      (8) The statement claiming that the model is 'able to describe all types of ageing observed in the wild' should be moderated. As the authors themselves acknowledge, the model is referred to as a 'toy model,' and it is made clear that it cannot capture, nor is intended to capture, the entire diversity observed in life. Adjusting this statement to reflect the limited scope and purpose of the model would enhance precision and accuracy in the presentation of its capabilities.

      Although a toy model, its possible configurations encompass all the possible configurations described so far across the diversity of ageing throughout the tree of life from negligible senescence with no loss of fertility (x_b and x_d >> 0) to menopause-like configurations (x_b >> x_d) through fast mortality increase post reproduction (x_b = x_d). Replacing our current square functions would allow age-dependent decrease or increase of fertility and/or risks of mortality onsets.

      (9) To bolster the biological relevance of the study, it is strongly recommended that the authors cross-check the results of their simulations with previously published experimental findings. This approach would serve to strengthen the alignment between the model outcomes and observed biological phenomena. Additionally, placing greater emphasis on the biological relevance aspects throughout the text would contribute to a more robust and comprehensive exploration of the study's implications.

      In the present manuscript we have tried to cite a certain number of results from artificial selection experiments on life history traits in order to strengthen the interpretations of our model’s behaviour. There are numerous other studies, going in the same direction or not, but we do not think that it would be relevant to add an exhaustive list of them. Nevertheless, we added Stearns et al., 2000 that adds extrinsic high mortality to the evolution of life history traits.

      (10) For enhanced clarity, it is suggested that the x-axis in Figure 1 be labelled as 'age.' Considering this adjustment could contribute to clearer visual communication of the data.

      We agree with the reviewer and modified the figure accordingly.

      (11) The addition of graphical legends is recommended for Figures 3-5, as well as the supplementary figures. Including these legends would provide essential context and improve the interpretability of the figures for readers.

      We agree with the reviewer and modified the figure accordingly.

      (12) For improved distinction of the ranges indicated by quantiles in Figure 3, it is suggested that the authors consider enhancing visual clarity. One approach could involve making the middle quantile thicker or using a different line type. Additionally, it is recommended to explore the calculation of the highest density 90% intervals rather than the 1-9 deciles. This adjustment could contribute to a clearer representation of the data distribution in the figure.

      We named the different deciles directly on the figure to improve readability.

      (13) It is observed that the mathematical proofs in Annex 1 are not displaying properly in the PDF. Additionally, there seem to be missing and broken references for the Annex. This issue may be related to LaTeX formatting. The authors could consider revisiting the formatting of Annex 1 to ensure the correct display of mathematical proofs and address the referencing concerns, possibly by checking and rectifying any LaTeX-related issues.

      The LaTeX file of the supplementary material was not correctly compiled. It is now corrected.

      (14) There is inconsistency in the text regarding the reference to the Annex, with both 'Annex' and 'Annexe' being used interchangeably. To maintain uniformity, it is suggested that the authors consistently use either 'Annex' or 'Annexe' throughout the text. This adjustment would contribute to a more polished presentation of the supplementary material.

      We corrected them accordingly.

      (15) There appears to be a typographical error in the name of Supplementary Figure 3.

      We corrected it accordingly.

    1. Locke: Everyone has a right to life, liberty, and property Jefferson in the Declaration of Independence [b30]: “We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.” Discussions of “human rights” fit in the Natural Rights ethics framework Key figures:

      I think the natural rights framework provides a way to understand basic human rights, but relying on universal principles might not explain or solve complex ethical issues in a globalized context. Moreover, seeing everyone's rights as indivisible and equal seems unrealistic to me because it may overlook the impact of social and economic inequalities.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Wang et al. investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution. They provided evidence that miR-506 appeared to have originated from the MER91C DNA transposons. Human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analysis examined the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice showed no apparent defects in sperm production, motility, and morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size; such an effect would have been quite significant on an evolutionary time scale. The number of mouse mutants and sequencing analysis represent a tour de force. This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.

      The conclusions of this preprint are mostly supported by the data except as noted below. Some descriptions need to be revised for accuracy.

      L219-L285: The conclusion that X-linked miR-506 family miRNAs are expanded via LINE1 retrotransposition is not supported by the data. LINE1s and SINEs are very abundant, accounting for nearly 30% of the genome. In addition, the LINE1 content of the mammalian X chromosome is twice that of the autosomes. One can easily find flanking LINE1/SINE repeats. Therefore, the analyses in Fig. 2G, Fig. 2H and Fig. S3 are not informative. In order to claim LINE1-mediated retrotransposition, it is necessary to show the hallmarks of LINE1 retrotransposition, which are only possible for new insertions. The X chromosome is known to be enriched for testis-specific multi-copy genes that are expressed in round spermatids (PMID: 18454149). The conclusion on the LINE1-mediated expansion of miR-506 family on the X chromosome is not supported by the data and does not add additional insights. I think that the LINE1 related figure panels and description (L219-L285) need to be deleted. In discussion (L557-558), "...and subsequently underwent sequence divergence via LINE1-mediated retrotransposition during evolution" should also be deleted. This section (L219-L285) needs to deal only with the origin of miR-506 from MER91C DNA transposons, which is both convincing and informative.

      Reply: Agreed, the corresponding sentences were deleted.

      Fig. 3A: can you speculate/discuss why the miR-506 expression in sperm is higher than in round spermatids?

      Reply: RNAs are much less abundant in sperm than in somatic or spermatogenic cells (~1/100). Sperm-borne small RNAs represent a small fraction of total small RNAs expressed in their precursor spermatogenic cells, including spermatocytes and spermatids. Therefore, when the same amount of total/small RNAs are used for quantitative analyses, sperm-borne small RNAs (e.g., miR-506 family miRNAs) would be proportionally enriched in sperm compared to other spermatogenic cells. We discussed this point in the text (Lines 550-556).

      Reviewer #2 (Public Review):

      In this paper, Wang and collaborators characterize the rapid evolution of the X-linked miR-506 cluster in mammals and characterize the functional reference of depleting a few or most of the miRNAs in the cluster. The authors show that the cluster originated from the MER91C DNA transposon and provide some evidence that it might have expanded through the retrotransposition of adjacent LINE1s. Although the animals depleted of most miRNAs in the cluster show normal sperm parameters, the authors observed a small but significant reduction in litter size. The authors then speculate that the depletion of most miRNAs in the cluster could impair sperm competitiveness in polyandrous mating. Using a successive mating protocol, they show that, indeed, sperm lacking most X-linked miR-506 family members is outcompeted by wild-type sperm. The authors then analyze the evolution of the miR-506 cluster and its predicted targets. They conclude that the main difference between mice and humans is the expansion of the number of target sites per transcript in humans.

      The conclusions of the paper are, in most cases, supported by the data; however, a more precise and in-depth analysis would have helped build a more convincing argument in most cases.

      (1) In the abstracts and throughout the manuscript, the authors claim that "... these X-linked miRNA-506 family miRNA [...] have gained more targets [...] " while comparing the human miRNA-506 family to the mouse. An alternative possibility is that the mouse has lost some targets. A proper analysis would entail determining the number of targets in the mouse and human common ancestor.

      Reply: This question alerted us that we did not describe our conclusion accurately, causing confusion for this reviewer. Our data suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis. In other words, mice never lost any targets compared to humans, but each miR-506 family miRNA tends to target more genes in humans than in mice.

      We revised the text to more accurately report our data. The pertaining text (lines 490-508) now reads: “Furthermore, we analyzed the number of all potential targets of the miR-506 family miRNAs predicted by the aforementioned four algorithms among humans, mice, and rats. The total number of targets for all the X-linked miR-506 family miRNAs among different species did not show significant enrichment in humans (Fig. S9C), suggesting the sheer number of target genes does not increase in humans. We then compared the number of target genes per miRNA. When comparing the number of target genes per miRNA for all the miRNAs (baseline) between humans and mice, we found that on a per miRNA basis, human miRNAs have more targets than murine miRNAs (p<0.05, t-test) (Fig. S9D), consistent with higher biological complexity in humans. This became even more obvious for the X-linked miR-506 family (p<0.05, t-test) (Fig. S9D). In humans, the X-linked miR-506 family, on a per miRNA basis, targets a significantly greater number of genes than the average of all miRNAs combined (p<0.05, t-test) (Fig. S9D). In contrast, in mice, we observed no significant difference in the number of targets per miRNA between X-linked miRNAs and all of the mouse miRNAs combined (mouse baseline) (Fig. S9D). These results suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis.”

      We also changed “have gained” to “have” throughout the text to avoid confusion.

      (2) The authors claim that the miRNA cluster expanded through L1 retrotransposition. However, the possibility of an early expansion of the cluster before the divergence of the species while the MER91C DNA transposon was active was not evaluated. Although L1 likely contributed to the diversity within mammals, the generalization may not apply to all species. For example, SINEs are closer on average than L1s to the miRNAs in the SmiR subcluster in humans and dogs, and the horse SmiR subcluster seems to have expanded by a TE-independent mechanism.

      Reply: Agreed. We deleted the data mentioned by this reviewer.

      (3) Some results are difficult to reconcile and would have benefited from further discussion. The miR-465 sKO has over two thousand differentially expressed transcripts and no apparent phenotype. Also, the authors show a sharp downregulation of CRISP1 at the RNA and protein level in the mouse. However, most miRNAs of the cluster increase the expression of Crisp1 on a reporter assay. The only one with a negative impact has a very mild effect. miRNAs are typically associated with target repression; however, most of the miRNAs analyzed in this study activate transcript expression.

      Reply: Both mRNA and protein levels of Crisp1 were downregulated in KO mice, and these results are consistent with the luciferase data showing overexpression of these miRNAs upregulated the Crisp1 3’UTR luciferase activity. We agree that miRNAs usually repress target gene expression. However, numerous studies have also shown that some miRNAs, such as human miR-369-3, Let-7, and miR-373, mouse miR-34/449 and the miR-506 family, and the synthetic miRNA miRcxcr4, activate gene expression both in vitro (1, 2) and in vivo (3-6). Earlier reports have shown that these miRNAs can upregulate their target gene expression, either by recruiting FXR1, targeting promoters, or sequestering RNA subcellular locations (1, 2, 6). We briefly discussed this in the text (Lines 605-611).

      (4) More information is required to interpret the results of the differential RNA targeting by the murine and human miRNA-506 family. The materials and methods section needs to explain how the authors select their putative targets. In the text, they mention the use of four different prediction programs. Are they considering all sites predicted by any method, all sites predicted simultaneously by all methods, or something in between? Also, what are they considering as a "shared target" between mice and humans? Is it a mRNA that any miR-506 family member is targeting? Is it a mRNA targeted by the same miRNA in both species? Does the targeting need to occur in the same position determined by aligning the different 3'UTRs?

      Reply: Since each prediction method has its merit, we included all putative targets predicted by any of the four methods. The "shared target" refers to a mRNA that any miR-506 family member targets because the miR-506 family is highly divergent among different species. We have added the information to the “Large and small RNA-seq data analysis” section in Materials and Methods (Lines 871-882).

      (5) The authors highlight the particular evolution of the cluster derived from a transposable element. Given the tendency of transposable elements to be expressed in germ cells, the family might have originated to repress the expression of the elements while still active but then remained to control the expression of the genes where the element had been inserted. The authors did not evaluate the expression of transcripts containing the transposable element or discuss this possibility. The authors proposed an expansion of the target sites in humans. However, whether this expansion was associated with the expansion of the TE in humans was not discussed either. Clarifying whether the transposable element was still active after the divergence of the mouse and human lineages would have been informative to address this outstanding issue.

      Reply: Agreed. The MER91C DNA transposon is denoted as nonautonomous (7); however, whether it was active during the divergence of mouse and human lineages is unknown. To determine whether the expansion of the target sites in humans was due to the expansion of the MER91C DNA transposon, we analyzed the MER91C DNA transposon-containing transcripts and associated them with our DETs. Of interest, 28 human and 3 mouse mRNAs possess 3’UTRs containing MER91C DNA sequences, and only 3 and 0 out of those 28 and 3 genes belonged to DETs in humans and mice, respectively (Fig. S9E), suggesting a minimal effect of MER91C DNA transposon expansion on the number of target sites. We briefly discussed this in the text (Lines 511-518).

      Post-transcriptional regulation is exceptionally complex in male haploid cells, and the functional relevance of many regulatory pathways remains unclear. This manuscript, together with recent findings on the role of piRNA clusters, starts to clarify the nature of the selective pressure that shapes the evolution of small RNA pathways in the male germ line.

      Reply: Agreed. We appreciate your insightful comments.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors conducted a comprehensive study of the X-linked miR-506 family miRNAs in mice on its origin, evolution, expression, and function. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found these miRNAs are not required for spermatogenesis. However, by further examination, the mutant mice show mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs finetune spermatogenesis to enhance sperm competition.

      Strengths:

      This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR506 family providing a holistic view of its evolution and function in mice. The finding that this family miRNAs could enhance sperm competition is interesting and could explain their roles in finetuning germ cell gene expression to regulate reproductive fitness.

      Weaknesses:

      The authors specifically addressed the function of 5 clusters of X-link miR-506 family containing 19 miRNAs. There is another small cluster containing 3 miRNAs close to the Fmr1 locus. Would this small cluster act in concert with the 5 clusters to regulate spermatogenesis? In addition, any autosomal miR-506 like miRNAs may compensate for the loss of X-linked miR-506 family. These possibilities should be discussed.

      Reply: The three FmiRs were not deleted in this study because the SmiRs are much more abundant than the FmiRs in WT mice (Author Response image 1, heatmap version of Fig. 5C). Based on small RNA-seq, some FmiRs, e.g., miR-201 and miR-547, were upregulated in the SmiRs KO mice, suggesting that this small cluster may act in concert with the other 5 clusters and is thus worth further investigation. To the best of our knowledge, all miR-506 family miRNAs are located on the X chromosome; although some other miRNAs were upregulated in the KO mice, they do not belong to the miR-506 family. We briefly discussed this point in the text (Lines 635-638).

      Author response image 1.

      sRNA-seq of WT and miR-506 family KO testis samples.

      Direct molecular link to sperm competitiveness defect remains unclear but is difficult to address.

      Reply: In this study, we identified a target of the miR-506 family, i.e. Crisp1. KO of Crisp1 in mice, or inhibition of CRISP1 in human sperm (7, 8), appears to phenocopy the quinKO mice, displaying largely normal sperm motility but compromised ability to penetrate eggs. The detailed mechanism warrants further investigation in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 84-85: "Several cellular events are unique to the male germ cells, e.g., meiosis, genetic recombination, and haploid male germ cell differentiation (also called spermiogenesis)". This statement is not accurate. Please revise. Meiosis and genetic recombination are common to both male and female germ cells. They are highly conserved in both sexes in many species including mouse.

      Reply: Agreed. We have revised the sentence and it now reads: “Several cellular events are unique to the male germ cells, e.g., postnatal formation of the adult male germline stem cells (i.e., spermatogonia stem cells), pubertal onset of meiosis, and haploid male germ cell differentiation (also called spermiogenesis) (9)” (Lines 83-86).

      Lines 163-164: "we found that Slitrk2 and Fmr1 were syntenically linked to autosomes in zebrafish and birds (Fig. 1A), but had migrated onto the X chromosome in most mammals". This description is not accurate. Chr 4 in zebrafish and birds is syntenic to the X chromosome in mammals. The term "migrated" is not appropriate. Suggestion: Slitrk2 and Fmr1 mapped to Chr 4 (syntenic with mammalian X chromosome) in zebrafish and birds but to the X chromosome in most mammals.

      Reply: Agreed. Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the significance statement, the authors mention that the mutants are "functionally infertile," although the decrease in competitiveness is partial. I suggest referring to them as "functionally sub-fertile."

      Reply: Agreed. Revised as suggested.

      (2) I will urge the authors to explain in more detail how some figures are generated and what they mean. Some critical information needs to be included in various panels.

      (2a) Figure S1. The phastCons track does not seem to align as expected with the rest of the figure. The highest conservation peak is only present in humans, and the sequence conserved in the sea turtle has the lowest phastCons score. I was expecting the opposite from the explanation.

      Reply: The tracks for phyloP and phastCons are the scores for all 100 species, whereas the tracks with the species names on the left are the corresponding sequences aligned to the human genome. We have revised our figure to make it clearer.

      (2b) Figure 2A and Figure S2C. Although all the functional analysis of the manuscript has been done in mice, the alignments showing sequence conservation do not include the murine miRNAs. Please include the mouse miRNAs in these panels.

      Reply: The mouse has Mir-506-P7 with the conserved miRNA-3P seed region, which was included in the lower panel in Figure S2C. However, mice do not have Mir-506-P6, which may have been lost or become too divergent to be recognized during evolution, and it was therefore not included in Figure 2A or the upper panel in Figure S2C.

      (2c) Figure S7H. The panel could be easier to read.

      Reply: Agreed. We combined all the same groups and turned Figure S7H (now Figure S6H) into a heatmap.

      (2d) The legend of Figure 6G reads, "The number of target sites within individual target mRNAs in both humans and mice ." Can the author explain why the value 1 of the human "Number of target sites" is connected to virtually all the "Number of target sites" values in mice?

      Reply: Sorry for the confusion. For example, for gene 1, we have 1 target site in the human and 1 target site in the mouse; but for gene 2, we have 1 target site in the human and multiple sites in the mouse; therefore, the value 1 is connected to more than one value in the mouse.

      Reviewer #3 (Recommendations For The Authors):

      CRISP1 and EGR1 protein localization in WT and mutant sperm by immunostaining would be helpful.

      Reply: Agreed. We performed immunostaining for CRISP1 on WT sperm, and the new results are presented in Figure S8D. CRISP1 seems mainly expressed in the principal piece and head of sperm.

      The detailed description of the generation of various mutant lines should be included in the Methods.

      Reply: We added more details on the generation of knockout lines in the Materials and Methods (Lines 686-701).

      References:

      (1) S. Vasudevan, Y. Tong, J. A. Steitz, Switching from repression to activation: microRNAs can upregulate translation. Science 318, 1931-1934 (2007).

      (2) R. F. Place, L. C. Li, D. Pookot, E. J. Noonan, R. Dahiya, MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc Natl Acad Sci U S A 105, 1608-1613 (2008).

      (3) Z. Wang et al., X-linked miR-506 family miRNAs promote FMRP expression in mouse spermatogonia. EMBO Rep 21, e49024 (2020).

      (4) S. Yuan et al., Motile cilia of the male reproductive system require miR-34/miR-449 for development and function to generate luminal turbulence. Proc Natl Acad Sci U S A 116, 3584-3593 (2019).

      (5) S. Yuan et al., Oviductal motile cilia are essential for oocyte pickup but dispensable for sperm and embryo transport. Proc Natl Acad Sci U S A 118 (2021).

      (6) M. Guo et al., Uncoupling transcription and translation through miRNA-dependent poly(A) length control in haploid male germ cells. Development 149 (2022).

      (7) V. G. Da Ros et al., Impaired sperm fertilizing ability in mice lacking Cysteine-RIch Secretory Protein 1 (CRISP1). Dev Biol 320, 12-18 (2008).

      (8) J. A. Maldera et al., Human fertilization: epididymal hCRISP1 mediates sperm-zona pellucida binding through its interaction with ZP3. Mol Hum Reprod 20, 341-349 (2014).

      (9) L. Hermo, R. M. Pelletier, D. G. Cyr, C. E. Smith, Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73, 241-278 (2010).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present study by Berger et al. analyzes to what extent memory formation is dependent on available energy reserves. This has been dealt with extensively in the case of aversive memory formation, but only very sparsely in the case of appetitive memory formation. It has long been known that an appetitive memory in flies can only be formed by starvation. However, the authors here additionally show that not only the duration of starvation plays a role, but also determines which form of memory (short- or long-term memory) is formed. The authors demonstrated that internal glycogen stores play a role in this process and that this is achieved through insulin-like signaling in octopaminergic reward neurons that integrates internal energy stores into memory formation. Here, the authors suggest that octopamine plays a role as a negative regulator of different forms of memory.

      The study sheds light on an old question, to what extent the octopaminergic neuronal system plays a role in the formation of appetitive memory, since in recent years only the dopaminergic system has been in focus. Furthermore, the data are an interesting contribution to the ongoing debate whether insulin receptors play a role in neurons themselves or in glial cells. The experiments are very well designed and the authors used a variety of behavioural experiments, genetic tools to manipulate neuronal activity and state-of-the-art imaging techniques. In addition, they not only clearly demonstrated that octopamine is a negative regulator of appetitive memory formation, but also proposed a mechanism by which the insulin receptor in octopaminergic neurons senses the internal energy status and then controls the activity of those neurons. The conclusions are mostly supported by the data, but some aspects related to the experimental design, some explanations and literature references need more clarification and revision.

      (1) Usually, long-term memory (LTM) is tested 24 hours after training. Here, the authors usually refer to LTM as a memory that is tested 6 hours after training. The addition of a control experiment to show that LTM that the authors observe here lasts longer would increase the power of this study immensely.

      We thank the reviewer for this comment, as it helped greatly to clarify the matter.

      We measured the memory of control and mutant flies 24 h after training and included the data in the manuscript (Figure 1B, summarized in a model in Figure 2C). We show that control flies develop an intermediate type of memory that is, depending on the length of starvation, either anesthesia-sensitive or anesthesia-resistant. Mutants lacking octopamine develop either anesthesia-sensitive or anesthesia-resistant long-term memory.

      (2) The authors define here another consolidated memory component as ARM, when they applied a cold-shock 2 hours after training. However, some publications showed that LTM is formed after only one training cycle (Krashes et al 2008, Tempel et al 1983). This makes it difficult to determine, whether appetitive ARM can be formed. Furthermore, one study showed that appetitive ARM is absent after massed training (Colomb et al 2009). Therefore, the conclusion could be also, that different starvation protocols, would lead to different stabilities of LTM. Therefore, additional experiments could help to clarify this opposing explanation. From these results, it can then be concluded either that different stable forms of LTM are formed depending on the starvation state, or that two differently consolidated memory phases (LTM, ARM) are formed, as has already been shown for aversive memory. This is also important for other statements in the manuscript, and therefore the authors should address this. For example, the findings about the insulin receptor (is it two opposing memories or different stabilities of LTM).

      The flies indeed develop different types of memory depending on the length of starvation and the internal energy supply.

      Reviewer #2 (Public Review):

      How organism physiological state modulates establishment and perdurance of memories is a timely question that the authors aimed at addressing by studying the interplay between energy homeostasis and food-related conditioning in Drosophila. Specifically, they studied how starvation modulates the establishment of short-term vs long-term memories and clarified the role of the monoamine Octopamine in food-related conditioning, showing that it is not per se involved in formation of appetitive short-term memories but rather gates memory formation by suppressing LTM when energy levels are high. This work clarifies previously described phenotypes and provides insight about interconnections between energy levels, feeding and formation of short-term and long-term food-related memories. In the absence of population-specific manipulation of octopamine signaling, it however does not reach a circuit-level understanding of how these different processes are integrated.

      Strengths

      • Previous studies have documented the impact of Octopamine on different aspects of food-related behaviors (regulation of energy homeostasis, feeding, sugar sensing, appetitive memory...), but we currently lack a clear understanding of how these different functions are interconnected. The authors have used a variety of experimental approaches to systematically test the impact of internal energy levels in establishment of appetitive memory and the role of Octopamine in this process.

      • The authors have used a range of approaches, performed carefully controlled experiments and produced high quality data.

      Weaknesses

      (1) In the tbh mutant flies, Tyramine -to- Octopamine conversion is inhibited, resulting not only in a lack of Octopamine, but also in elevated levels of Tyramine. If and how elevated levels of Tyramine contributes to the described phenotypes is unclear. In the current version of the manuscript, only one set of experiments (Figure 2) has been performed using Octopamine agonist. This is particularly important in light of recent published data showing that starvation modifies Tyramine levels. (2) Octopamine (and its precursor Tyramine) have been implicated in numerous processes, complicating the analysis of the phenotypes resulting from a general inhibition of tbh.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be solely explained by loss of octopamine. We included models into the manuscript to illustrate this (Figure 2 C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not result in a change in food intake; rather, the increase in octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (3) The manuscript explores various aspects of the impact of energy levels on food-related behaviors and the underlying sensing and effector mechanism, both in wild-type and tbh mutants, making it difficult to follow the flow of the results.

      We included models illustrating the results to clarify the content of the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Berger et al. study how internal energy storage influence learning and memory. Since in Drosophila melanogaster, octopamine (OA) is involved in the regulation of energy homeostasis they focus on the roles of OA. To do so they use the tyramine-β-hydroxylase (Tbh) mutant that is lacking the neurotransmitter OA and study short term memory (STM), long-term memory (LTM) and anesthesia-resistant memory (ARM). They show that the duration of starvation affects the magnitude of both short- and long-term memory. In addition, they show that OA has a suppressive effect on learning and memory. In terms of energy storage, they show that internal glycogen storage influences how long sucrose is remembered and high glycogen suppresses memory. Finally, they show that insulin-like signaling in octopaminergic neurons, which is also related to internal energy storage, suppresses learning and memory.

      This is an important study that extends our knowledge on OA activity in learning and memory and the effects the metabolic state has on learning and memory. The authors nicely use the genetic tools available in flies to try and unravel the complex circuitry of metabolic state level, OA activity and learning and memory.

      Nevertheless, I do have some comments that I think require attention:

      (1) The authors use RNAi to reduce the level of glycogen synthase or glycogen phosphorylase. These manipulations are expected to affect the level of glycogen. Using specific drivers the authors attempt to manipulate glycogen level at the muscles and fat bodies and examine how this affects learning and memory. The conclusions of the authors arise solely from the manipulation intended (i.e. the genetics). However, the authors also directly measured glycogen levels at these organs and those do not follow the manipulation intended, i.e. the RNAi had very limited effect on the glycogen level. Nevertheless, these results are ignored.

      We agree with the reviewer and repeated the experiments. While we could not detect differences in whole animals, we detected differences in body parts enriched for muscle or fat tissue, e.g., thorax or abdomen. We added the data.

      (2) The authors claim in the summary that OA is not required for STM. However, according to one experiment OA is required for STM as Tbh mutants cannot form STM. In another experiment OA is suppressive to STM as wt flies fed with OA cannot form STM. Therefore, it is very difficult to appreciate the actual role of OA on STM.

      During mild starvation, the internal energy supply is greater in Tbh mutants than in control flies. This information is integrated into the reward system via insulin receptor signaling. Therefore, the association between the odorant and sucrose is not meaningful to the mutants and no STM is formed. At the same time there is no release of octopamine and therefore no repression of LTM. In starved animals, octopamine suppresses food intake (we added the data). This is consistent with a function of Octopamine as a signal for the presence of food. Depending on when the signal comes, this might suppress the formation of STM or LTM.

      (3) The authors use t-test and ANOVA for most of the statistics, however, they did not perform normality tests. While I am quite sure that most datasets will pass normality test, nevertheless, this is required.

      Thanks for pointing this out. We have included a description in the “Materials and Methods” section that explains how we tested the data for normal distribution. We corrected the figure legends accordingly.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-plot analysis to determine whether the data were normally distributed.”

      (4) While it is logical to assume that OA neurons are upstream to R15A04 DA neurons, I am not sure this really arises from the experiment that is presented here. It is well established that without activity in R15A04 DA neurons there is no LTM. Since OA acts to decrease LTM, can one really conclude anything about the location of OA effect when there is no learning?

      Normally control flies did not form memory 6 h after training, only Tbh mutants. We wanted to investigate what kind of memory develops in Tbh mutants. During the experiments of the manuscript, we kept the training procedure constant.

      (5) It is unclear how expression of a dominant negative form of insulin receptor (InR) in OA neurons can rescue the lack of OA due to the Tbh mutation. If OA neurons cannot release anything to the presumably downstream DA neurons, how can changing their internal signaling has any effect?

      Expression of the dominant-negative form of the insulin receptor signals no food or low energy levels, whereas activation of the insulin receptor signals that there is enough food. The reward is a source of food, but its energy content is not high enough to fill the energy stores. Insulin receptor activation can trigger at least three different signaling cascades, one of which might regulate octopamine release.

      While I stressed some comments that need to be addressed, the overall take-home message of the manuscript is supported and the authors do show that the metabolic state of the animal affects learning and memory. I do think though, that some more caution is required for some of the conclusions.

      We added additional data to address the points raised.

      Recommendations for the authors:

      We addressed all points raised by the reviewers, clarified the content or added more data.

      Reviewer #1 (Recommendations For The Authors):

      (1) Throughout the manuscript, the full stop of a sentence is always placed before the references.

      We fixed this.

      (2) I find the English in the manuscript not yet sufficient for publication. I suggest that the authors carefully revise the manuscript. I think if the sentences are structured a little more clearly, this paper has enormous potential to be read by your broad community.

      We agree and revised the manuscript. We hope the manuscript is now clearer.

      (3) Sentences l114 to l117 are misleading. The authors imply that they tested the same flies for changes in odor perception or sucrose sensitivity. I assume that the authors meant that they analyzed different groups of animals.

      We clarified the sentence as follows:

      “To ensure that the observed differences in learning and memory were not due to changes in odorant perception, odorant evaluation or sucrose sensitivity, different fly populations of the same genotypes were tested for their odorant acuity, odorant preference and their sucrose responsiveness (Table S1).”

      (4) In the title as well as in the abstract the influence of octopamine on appetitive memory formation is described in more detail, this is also the main focus of this study. However, in the introduction, the influence of the insulin receptor on memory formation is discussed first. Personally, I would describe this later in the manuscript, ideally in the results section. At this point in the manuscript, this leads to an interruption in the flow of reading.

      Thanks for the suggestion. We changed the order in the introduction.

      (5) The authors could consider, since they only used Drosophila melanogaster, changing "Drosophila melanogaster" to "Drosophila" throughout the manuscript.

      We modified the text accordingly.

      (6) All evaluations and statistical tests are state of the art. However, I have one comment. For each statistical test, a correction should be made depending on the number of tests. However, I could not determine whether this was also done for the parametric or non-parametric one-sample t-test. From the results and the methods section, I would guess not. Here I would recommend a Bonferroni correction or even better a Sidak-Holm correction. Furthermore, the authors could also go into more detail about which non-parametric one-sample t-test they used.
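
      For readers unfamiliar with the corrections recommended here, the following is a minimal sketch of how Bonferroni and Holm step-down adjustments change a set of p-values. It uses the Holm procedure with Bonferroni-type multipliers, not the Šidák variant the reviewer mentions, and the p-values are invented for illustration rather than taken from the manuscript.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative p-values from three hypothetical one-sample tests.
raw = [0.01, 0.04, 0.03]
print(bonferroni_adjust(raw))
print(holm_adjust(raw))
```

      Holm is never more conservative than Bonferroni while still controlling the family-wise error rate, which is why it is often preferred when several one-sample tests are reported in one figure.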

      We described the statistics used in more detail in the Materials and Methods section.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-plot analysis to determine whether the data were normally distributed. For normally distributed data, we used Student’s t test to compare differences between two groups and one-way ANOVA with Tukey’s post hoc HSD test for differences between more than two groups. For nonparametric data, we used the Mann-Whitney U test for differences between two groups and, for more than two groups, the Kruskal-Wallis test with post hoc Dunn analysis and Bonferroni correction. The nonparametric one-sample sign test was used to analyze whether behavior was not based on random choice and differed from zero (P < 0.05). The statistical data analysis was performed using statskingdom (https://www.statskingdom.com).”
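
      The one-sample sign test named in this passage can be written in a few lines. The sketch below is an assumption about the standard exact two-sided form of the test, not a reproduction of the statskingdom implementation the authors used, and the performance indices are made-up values.

```python
from math import comb


def sign_test(values, reference=0.0):
    """Exact two-sided one-sample sign test against a reference value.

    Observations equal to the reference are dropped, as is conventional.
    """
    positives = sum(v > reference for v in values)
    negatives = sum(v < reference for v in values)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # Probability of k or fewer of the rarer sign under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical performance indices from eight independent groups of flies.
indices = [0.21, 0.15, 0.30, 0.08, 0.19, 0.25, 0.12, 0.17]
p_value = sign_test(indices)
print(p_value)
```

      Because the test uses only the sign of each observation, it needs no normality assumption, but with small group numbers the smallest attainable p-value is limited (here 2/256 for eight groups).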

      (7) In nearly all figure legends the sentence "The letter "a" marks a significant difference from random choice as determined by a one-sample sign test (P < 0.05; P< 0.01)" occur. This is correctly indexed in the figures. However, I do not understand here what then P < 0.05; P**< 0.01 means. The significance level should be described here. I would strongly recommend the authors to make the definition clearer.

      We corrected this in the figure legends (see also above).

      (8) In Fig. 1B the labelling is a bit confusing. I interpret the two right groups as the mutants for octopamine, but there is still w[1118] in front.

      We modified the Figure 1B.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions

      (1) Assessing the contribution of Tyramine in the observed phenotypes (for example by reducing the levels of Tyramine or its specific receptor) would help understand the contribution of Tyramine in the observed phenotypes.

      See comments above.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be solely explained by loss of octopamine. We included models into the manuscript to illustrate this (Figure 2 C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not result in a change in food intake; rather, the increased octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (2) Cell-specific inhibition of octopamine receptors should thus be performed to precisely interpret the observed phenotypes and dissect how interconnected the different phenotypes are, which is the object of this publication.

      We observed that the time point and duration of octopamine application change the behavioral output. The behavior analyzed depends on pulses of octopamine and on differences in the internal energy status. A cell-specific inhibition of octopamine receptors via RNAi knockdown might therefore not clarify the issue.

      (3) Defining of streamline and progressively integrating the different observations into a unifying model would improve the clarity and flow of the manuscript.

      We included models explaining the observed results (Figure 2C and Figure 7E).

      Minor comments

      Line 129: Figure 1B should be mentioned, not 2B.

      Figure 1 legend: E should be replaced by C (after A,B).

      Figure S5: what are the arrows pointing to? Why are the Inr foci visible in A not seen in B? It should be mCD8-GFP and not mCD on top of the images.

      We fixed this.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) Can one really conclude from Figure 2A that OA acts on R15A04 DA neurons? It is well established that without activity in these DA neurons there is no LTM. Since OA acts to decrease learning, how one can conclude anything about the location of OA effect when there is no learning? With STM the situation was opposite, OA supported learning and this was abolished when DA neurons were silenced. I think some supporting experiment are required, i.e. how OA affects DA neurons activity or, alternatively, tone down a bit the writing.

      Normally, control flies did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant. The inhibition of dopaminergic neurons blocks the memory of Tbh mutants. Taken together, the duration of the memory, the cold-shock experiments and the inhibition of the dopaminergic neurons show that Tbh mutants develop LTM after training. This training does not evoke memory in controls.

      The loss of STM in mildly starved Tbh mutants depends on the integration of the high internal energy levels via InR signaling. Reducing the internal energy levels further by extending starvation results in STM, supporting that OA is not directly involved in the formation of STM.

      (2) Figure 4 requires some clarifications. In Supplementary Figure S2 the authors show that they could not manipulate glycogen levels in muscles. However, in Figure 4B they show that "Increasing glycogen levels in the muscles did not change short-term memory in 16 h starved flies, but the reduction in glycogen significantly improved memory strength (Figure 4B)" (lines 231-233). How can this be reconciled?

      While we could not detect differences in whole animals, we detected differences in glycogen content in body parts enriched with muscles or fat, e.g. thorax or abdomen when using UAS-GlyP-RNAi or UAS-GlyS-RNAi under the control of the respective Gal4 drivers.

      We added the data.

      Likewise, the authors write that "Increasing or decreasing glycogen levels in the fat bodies had no effect on memory performance (Figure 4C)" Line (233-234). However, in Figure S2 they show that they can only increase glycogen levels but not decrease them.

      As explained above the conclusion of Figure 4 "Thus, low levels of glycogen in the muscles upon starvation positively influence appetitive short-term memory, while high levels of glycogen in the muscles and fat body reduce short-term memory" lines 245-246, is not supported by the direct measurements of glycogen presented in Figure S2.

      We added the data showing that the reduction or increase can be measured when analyzing the specific body parts enriched in muscle tissue or fat tissue.

      (3) In cases where mutant flies do not display learning, a control should be done to see if they ate the sugar (with dye). Especially since the genetic manipulation affects metabolism.

      We analyzed how much sucrose the animals consumed in the behavioral test. Tbh and controls fed and there was no difference in feeding behavior between the mutants and the controls.

      “We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral set up. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min and there was no difference between w1118 and TβhnM18 flies.”

      (4) The use of t-test requires the data to be normally distributed. If I am not mistaken this was not demonstrated for any of the datasets used. I did a quick check on one of the datasets provided in the excel sheet and it is normally distributed. Therefore, please add normality test for all data sets. If some do not pass normality, please use a suitable non-parametric test.

      We added normality tests for all data sets and used non-parametric tests for non-normally distributed data. We clarify this in the Materials and Methods section and the figure legends.
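
      The workflow described in this response (check normality first, then pick a parametric or non-parametric test) can be sketched as follows. This is only an illustration under stated assumptions: the response does not name the normality test that was used, so the Jarque-Bera test is swapped in here because it can be written with the Python standard library alone. It is an asymptotic test and is weak for small samples. The function names and the 0.05 threshold are hypothetical.

```python
import math
from statistics import mean


def jarque_bera_p(values):
    """Approximate p-value of the Jarque-Bera normality test.

    Assumption: this test stands in for whichever normality test the
    authors actually used. Requires non-constant data (variance > 0).
    """
    n = len(values)
    m = mean(values)
    # Central moments of order 2, 3 and 4 (population form).
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return math.exp(-jb / 2.0)


def choose_test(group_a, group_b, alpha=0.05):
    """Name the two-sample test that the normality check points to."""
    if all(jarque_bera_p(g) > alpha for g in (group_a, group_b)):
        return "t-test"
    return "Mann-Whitney U"
```

      A group is treated as non-normal when its p-value falls at or below the threshold, and a single non-normal group is enough to switch the comparison to the non-parametric test.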

      (5) The authors show that OA suppresses also STM. This result is in contradiction to previous published results. This by itself is not a problem. However, this result also seems to me in contradiction to the authors own results. According to Figure 1B, OA is required for STM as it absence in the tbh mutant results in loss of STM. According to Figure 2C, OA is reducing STM as wt flies fed with OA just prior to learning do not form STM. This appears in other places in the manuscript as well.

      In addition, in the text lines 178-180, the authors write "A short pulse of octopamine before the training inhibits the STM. Thus, octopamine is a negative regulator of appetitive dopaminergic neuron-dependent long-term memory and can block STM." But in the summary they write "Octopamine is not required for short-term memory, since octopamine deficient mutants form appetitive short-term memory to sucrose and to other nutrients depending on the internal energy status." So, the take-home message regarding OA and STM is unclear.

      The authors need to better clarify this point.

      We clarified these points. See comments above. The loss of memory in Tbh mutants is not due to the loss of octopamine, but to increased energy levels that change the reward properties of sucrose.

      (6) The manuscript is very difficult to follow. The authors constantly change between 16 and 40 hours starvation, short term memory, 3 hour memory and 6 hour memory. I think it would have been better to have a more focused manuscript. However, if this is not possible, I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other. Also, perhaps add to each figure a panel describing exactly the experimental conditions. I think also simplifying the text and adding more conclusions throughout the results section will help the readers to follow. Finally, I think that it would help understanding the conclusions if the authors can add a diagram of the flow that they think occurs. For example, the authors show that glycogen suppresses learning as its reduction increases learning. They also show that InR activity receptor suppresses learning as its KD also increases learning. If I am not mistaken the link between the two is not straight forward (but I may be wrong here). A diagram of the flow would be very helpful.

      We prepared diagrams summarizing and explaining the results.

      Minor

      (1) I may not have understood correctly as I am not sure that I found Table S1.

      Also, there was no legend for Table S1.

      Nevertheless, if I understood correctly, the authors write that "Before the experiments, flies were tested to determine whether they perceived the odorants, preferred one odorant over other and responded to the reward similarly to ensure that the observed differences in behavior were not due to changes in odorant perception or sucrose sensitivity (Table S1)." However, according to the Table that I found it seems that following 40h starvation wt flies show preference to OCT whereas this does not occur for the mutant. Also, it seems that at 16h the mutant has a much higher preference to the odors than after 40h. This is a bit odd. I am also not sure what the balance value refers to. Finally, the mutant shows really low 2M sucrose preference after 40h. In general, this set of experiments requires a bit more explanation.

      I think it is better to show these experiments using graphs and add this to the supplementary figures.

      We clarified the experiments in the Results section as follows and added an explanation to the Materials and Methods section. We tested the odorant acuity and sucrose preference for all genotypes used in the manuscript and added the data to Table S1.

      “The flies of the different genotypes sensed the odorants and evaluated them as similarly salient in comparison. This is important to avoid a bias in the situation where flies have to choose between the two odorants after training. They also sensed sucrose. We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral setup. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min and there was no difference between w1118 and TβhnM18 flies.”

      (2) Line 129 should be Figure 1B

      Is corrected.

      (3) Line 133, Figure 1C, how can one explain the negative reinforcement? I can understand no reinforcement, but negative?

      The effect of glucose might be dose-dependent. 0.15 M sucrose is much closer to a realistic concentration found in fruits than 2 M sucrose and might therefore elicit aversion. When animals are starved enough, they might find any food source attractive, even when the concentration of sucrose is unrealistic.

      (4) Figure 1, why are the graphs different between panel B and C?

      Is corrected.

      (5) In Figure S1, do the TβhnM18 groups differ significantly from zero? I think they do, so better to state this somewhere. If not, the claims in lines 134-135 are not supported by the data.

      We added the significance and added the data to Figure 1.

      Figure S1 legend: there is no A panel. Also "below box blots" should be box plots.

      Thanks for pointing that out. We corrected it.

      (6) It is not clear what duration of starvation was used in Figure 2A. I assume that 16 h and 2 M sucrose were used, but I would state that explicitly.

      We added the information to the figure legends.

      (7) Figure 2A is missing a control of flies with both the driver and UAS shibirets at the permissive temperature.

      We added the controls to the supplement (Figure S1).

      (8) It seems to me that Figure 3B, in which the authors state that "Only after 40 h of starvation did TβhnM18 mutants show a similar preference to control sucrose consumption" (line 198) is somewhat in contradiction to Table S1 in which I see Sucrose preference for wt 0.36 and for tbh 0.17. I think this comment arises because I did not understand Table S1 correctly, so please better explain.

      We rewrote this section.

      (9) In Figure 3C, consider not using std as this stands for standard deviation and may be confusing.

      We now use the term “food” instead of “std” and explained in the legend that food means standard fly food.

      We fixed this.

      (10) Please check the Supplementary Figures. I think Figures S2 and S3 are switched.

      We fixed this.

      (11) There is a mistake in Figure S3A. The right column should have another "+" sign.

      Thanks, we fixed this.

      (12) I am somewhat puzzled by Figures 4 and 5. If I understand correctly figure 4B w1118 mef2-G4 is exactly the same experiment as Figure 5A w1118 mef2-G4 and yet in Figure 4B performance index is 0.2 and in Figure 5A about 0.4. According to other comparisons it seems to me that these will be significantly different and yet it is the same experiment.

      They are two independent experiments done at different times. The controls were independently repeated.

      (13) Line 273 should be Figure 5C.

      Is corrected.

      (14) I don't think this is a correct sentence "Virgin females remembered sucrose significantly better than mated females." Line 274.

      Reads now:

      “Virgin females remembered the odorant paired with sucrose significantly better than mated females.”

      (15) Line 340 there is no Figure 1E

      Is fixed (1 C)

      (16) The data excel file is difficult to follow. In Figure 2 there are references to Figure 5. The graphs are pointing to other files. Text is not always in English. It is not clear what W stands for. I recommend making it more accessible.

      We corrected the data excel files.

      (17) The manuscript is difficult to follow. I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other.

      We improved the data presentation by

      a) adding a model showing the kinetics of memory formation in controls and mutants (Figure 2C)

      b) adding a model explaining how the internal state is integrated into the formation of memory (Figure 7D).

    1. It is therefore to be expected that the initial cost of the card system is not a fair criterion of its cost when in working order.

      Setting up and learning a note taking or card index system has a reasonably large up-front cost, but learning it well and being able to rely on it over long periods of time will eventually reap larger and cheaper long-term outcomes and benefits.

      Unless changing systems creates dramatically larger improvements, the cost of change will surely swamp the benefits, making the switch useless. This advice from Kaiser is as true today as it was in 1908; however, we tend not to think about efficiency as much now as he may have then, and we fall prey to shiny object syndrome.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study reports important evidence that infants' internal factors guide children's attention and that caregivers respond to infants' attentional shifts during caregiver-infant interactions. The authors analyzed EEG data and multiple types of behaviors using solid methodologies that can guide future studies of neural responses during social interaction in infants. However, the analysis is incomplete, as several methodological choices need more adequate justification.

      Reviewer #1

      Public Review:

      The authors bring together multiple study methods (brain recordings with EEG and behavioral coding of infant and caregiver looking, and caregiver vocal changes) to understand social processes involved in infant attention. They test different hypotheses on whether caregivers scaffold attention by structuring a child's behavior, versus whether the child's attention is guided by internal factors and caregivers then respond to infants' attentional shifts. They conclude that internal processes (as measured by brain activation preceding looking) control infants' attention, and that caregivers rapidly modify their behaviors in response to changes in infant attention.

      The study is meticulously documented, with cutting-edge analytic approaches to testing alternative models; this type of work provides a careful and well-documented guide for how to conduct studies and process and analyze data for researchers in the relatively new area of neural response in infants in social contexts.

      We are very pleased that R1 considers our work an important contribution to this developing field, and we hope that we have now addressed their concerns below.

      Some concerns arise around the use of terms (for example, an infant may "look" at an object, but that does not mean the infant is actually "attending"); collapsing of different types of looks (to people and objects), and the averaging of data across infants that may mask some of the individual patterns.

      We thank the reviewer for this feedback and their related comments below, and we feel that our manuscript is much stronger as a result of the changes we have made. Please see below for a detailed description of our rationale for defining and analysing the attention data, as well as the textual changes made in response to the reviewer’s comments.

      Recommendations For The Authors

      This paper is rigorous in method, theoretically grounded, and makes an important contribution to understanding processes of infant attention, brain activity, and the reciprocal temporal features of caregiver-infant interactions. The alternative hypothesis approach sets up the questions well (although authors should temper any wording that suggests attention processes are one or the other. That is, certain bouts of infant attention can be guided by exogenous factors such as social input, and others be endogenous; so averaging across all bouts can actually mask the variation in these patterns). I appreciated the focus on multiple types of behavior (e.g., gaze, vocal fluctuations in maternal speech); the emphasis on contingent responding; and the very clear summaries of takeaways after each section. Furthermore, methods and analyses are well described, details on data processing and so on are very thorough, and visualizations aptly facilitate data interpretation. However, I am not an expert on infant neural responses in EEG and assume that a reviewer with such expertise will weigh in on the treatment and quality of the data; therefore, my comments should be interpreted in light of this lack of knowledge.

      We thank R1 for these very positive and insightful comments on our analyses which are the result of a number of years of methodological and technical developmental work.

      We do agree with R1 that we should more carefully word parts of our argument in the Introduction to make clear the fact that shifts in infant attention could be driven by a combination of interactive and endogenous influences. As a result of this comment, we have made direct changes to parts of the Introduction; removing any wording that suggests that these processes are ‘alternative’ or ‘separate’, and our overall aim states: ‘Here, recording EEG from infants during naturalistic interactions with their caregiver, we examined the (inter)-dependent influences of infants’ endogenous oscillatory neural activity, and inter-dyadic behavioural contingencies in organising infant attention’.

      Examining variability between infant attention episodes in the factors that influence the length and timing of the attention episode is an important area for future investigation. We now include a discussion on this on page 38 of the Discussion section, with suggestions for how this could be examined. Investigating different subtypes of infant attention is methodologically challenging, given the number of infant behaviours that would need to inform such an analysis- all of which are time consuming to code. Developing automated methods for performing these kinds of analyses is an important avenue for future work.

      Here, I review various issues that require revision or elaboration based on my reading of what I consider to otherwise be a solid and important research paper.

      Problem in the use of the term attention scaffolding. Although there may be literature precedent in the use of this term, it is problematic to narrowly define scaffolding as mother-initiated guidance of attention. A mother who responds to infant behaviors, but expands on the topic or supports continued attention, and so on, is scaffolding learning to a higher level. I would think about a different term because it currently implies a caregiver as either scaffolding OR responding contingently. It is not an either-or situation in conceptual meaning. In fact, research on social contingency (or contingent responsiveness), often views the follow-in responding as a way to scaffold learning in an infant.

      Yes, we agree with R1 that the term ‘attention scaffolding’ could be confusing given the use of this term in previous work conducted with children and their caregivers in problem-solving tasks, that emphasise modulations in caregiver behaviour as a function of infant behaviour. As a result of this suggestion, we have made direct edits to the text throughout, replacing the term attentional scaffold with terms such as ‘organise’ and ‘structure’ in relation to the caregiver-leading or ‘didactic’ perspective, and terms such as ‘contingent responding’ and ‘dynamic modulation’ in relation to the caregiver-following perspective. We feel that this has much improved the clarity of the argument in the Introduction and Discussion sections.

      Do individual data support the group average trends? My concern with unobservable (by definition) is that EEG data averages may mask what's going on in individual brain response. Effects appear to be small as well, which occurs in such conditions of averaging across perhaps very variable response patterns. In the interest of full transparency and open science, how many infants show the type of pattern revealed by the average graph (e.g., do neural markers of infant engagement forward predict attention for all babies? Majority?). Non-parametric tests on how many babies show a claimed pattern would offer the litmus test of significance on whether the phenomenon is robust across infants or pulled by a few infants with certain patterns of data. Ditto for all data. This would bolster my confidence in the summaries of what is going on in the infant brain. (The same applies as I suggest to attention bouts. To what extent does the forward-predict or backward-predict pattern work for all bouts, only some bouts, etc.?). I recognize that to obtain power, summaries are needed across infants and bouts, but I want to know if what's being observed is systematic.

      We thank R1 for this comment and understand their concern that the overall pattern of findings reported in relation to the infants’ EEG data might obscure inter-individual variability in the associations between attention and theta power. Averaging across individual participant EEG responses is, however, the gold standard way to perform both event-locked (Jones et al., 2020) and continuous methods (Attaheri et al., 2020) of EEG analysis that are reported in the current manuscript. EEG data, and, in particular, naturalistic EEG data is inherently noisy, and averaging across participants increases the signal to noise ratio (i.e. inconsistent, and, therefore, non-task-related activity is averaged out of the response (Cohen, 2014; Noreika et al., 2020)). Examining individual EEG responses is unlikely to tell us anything meaningful, given that, if a response is not found for a particular participant, then it could be that the response is not present for that participant, or that it is present, but the EEG recording for that participant is too noisy to show the effect. Computing group-level effects, as is most common in all neuroimaging analyses, is, therefore, best suited to examining our main research questions.

      The findings reported in this analysis also replicate previous work conducted by our lab which showed that infant attention to objects significantly forward-predicted increases in infant theta activity during joint table-top play with their caregiver, involving one toy object (compared to our paradigm, which involved three; Wass et al., 2018). More recent work conducted by our lab has also shown continuous and time-locked associations between infant look durations and infant theta activity when infants play with objects on their own (Perapoch Amadó et al., 2023). To reassure readers of the replicability of the current findings, we now reference the Wass et al. (2018) study at the beginning of the Discussion section.

      Could activity artifacts lead to certain reported trends? Babies typically look at an object before they touch or manipulate the object, and so longer bouts of attention likely involve a look and then a touch for lengthier time frames. If active involvement with an object (touching for example) amplifies theta activity, that may explain why attention duration forward predicts theta power. That is, baby looks, then touches, then theta activates, and coding would show visual gaze preceding the theta activation. Careful alignment of infants' touches and other such behaviors with the theta peak might help address this question, again to lend confidence to the robustness of the interpretation.

      Yes, again this is a very important point, and the removal of movement-related artifact is something we have given careful attention to in the analysis of our naturalistic EEG data (Georgieva et al., 2020; Marriott Haresign et al., 2021). As a result of this comment we have made direct changes to the Results section on page 18 to more clearly direct the reader to our EEG pre-processing section before presenting the results of the cross-correlation analyses.

      As we describe in the Methods section of the main text, movement-related artifacts are removed from the data with ICA decomposition, utilising an automatic-rejection algorithm, specially designed for work with our naturalistic EEG data (Marriott Haresign et al., 2021). Given that ICA rejection does not remove all artifact introduced to the EEG signal, additional analysis steps were taken to reduce the possibility that movement artifacts influenced the results of the reported analyses. As explained in the Methods section, rather than absolute theta power, relative theta was used in all EEG analyses, computed by dividing the power at each theta frequency by the summed power across all frequencies. Eye and head movement-related artifacts most often associate with broadband increases in power in the EEG signal (Cohen, 2014): computing relative theta activity therefore further reduces the potential influence of artifact on the EEG signal.
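
      The relative theta computation described above can be illustrated with a minimal sketch. The 3–6 Hz band limits, the function name, and the spectrum values are assumptions made for the example and are not taken from the manuscript; only the rule itself (power at each theta frequency divided by the summed power across all frequencies) comes from the response.

```python
def relative_theta(freqs_hz, power, theta_band=(3.0, 6.0)):
    """Return power at each theta frequency divided by total power.

    Assumption: theta_band is a hypothetical infant theta range; the
    manuscript's actual band limits are not given in this response.
    """
    total_power = sum(power)  # summed power across all frequencies
    low, high = theta_band
    return [p / total_power
            for f, p in zip(freqs_hz, power)
            if low <= f <= high]


# Hypothetical power spectrum at 1 Hz resolution.
freqs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
spectrum = [9.0, 7.0, 6.0, 5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5]

# A broadband artifact modelled as a uniform doubling of power at every
# frequency: absolute theta power doubles, relative theta does not change.
with_artifact = [2.0 * p for p in spectrum]

clean_relative = relative_theta(freqs, spectrum)
artifact_relative = relative_theta(freqs, with_artifact)
```

      Because a perfectly uniform broadband increase scales the numerator and denominator equally, it cancels in the ratio. Real movement artifacts are not perfectly uniform across frequencies, so the normalisation reduces their influence rather than eliminating it.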

      It is also important to highlight that previous work examining movement artifacts in controlled paradigms with infants has shown that limb movements actually associate with a decrease in power at theta frequencies, compared to rest (Georgieva et al., 2020). It is therefore unlikely that limb movement artifacts explain the pattern of association observed between theta power and infant attention in the current study.

      That said, examining the association between body movements and fluctuations in EEG activity during naturalistic interactions is an important next step, and something our lab is currently working on. Given that touching an object is most often the end-state of a larger body movement, aligning the EEG signal to the onset of infant touch is not all that informative to understanding how body movements associate with increases and decreases in power in the EEG signal. Our lab is currently working on developing new methods using motion tracking software and arousal composites to understand how data-derived behavioural sub-types associate with differential patterns of EEG activity.

      The term attention may be misleading. The behavior being examined is infant gaze or looks, with the assumption that gaze is a marker of "attention". The authors are aware that gaze can be a blank stare that doesn't reflect underlying true "attention". I recommend substitution of a conservative, more precise term that captures the variable being measured (gaze); it would then be fine to state that in their interpretation, gaze is taken as a marker for attention or something like that. At minimum, using the term "visual attention" can be a solution if authors do not want to use the precise term gaze. As an example, the sentence "An attention episode was defined as a discrete period of attention towards one of the play objects on the table, or to the partner" should be modified to defined as looking at a play object or partner.

      We thank the reviewer for this comment, and we understand their concern with the use of the term ‘attention’ where we are referring to shifts in infant eye gaze. However, the use of this term to describe patterns of infant gaze, irrespective of whether they are ‘actually attending’ or not is used widely in the literature, in both interactive (e.g. Yu et al., 2021) and screen-based experiments examining infant attention (Richards, 2010). We therefore feel that its use in our current manuscript is acceptable and consistent with the reporting of similar interaction findings. On page 39 of the Discussion we now also include a discussion on how future research might further investigate differential subtypes of infant looks to distinguish between moments where infants are attending vs. just looking.

      Why collapse across gaze to object vs. other? Conceptually, it's unclear why the same hypotheses and research questions on neural-attention (i.e., gaze in actuality) links would apply to looks to a mom's face or to an object. Some rationale would be useful to the reader as to why these two distinct behaviors are taken as following the same principles in ordering of brain and behavior. Perhaps I missed something, however, because later in the Discussion the authors state that "fluctuations in neural markers of infants' engagement or interest forward-predict their attentiveness towards objects", which suggests there was an object-focused variable only? Please clarify. (Again, sorry if I missed something).

      This is a really important point, and we agree with R1 that it could have been more clearly expressed in our original submission – for which, we apologise. In the cross-correlation analyses conducted in parts 2 and 3, which examine forwards-predictive associations between infant attention durations and infant endogenous oscillatory activity (part two), and caregiver behaviour (part three), as R1 describes, we include all infant looks towards objects and their partner. Including all infant look types is necessary to produce a continuous variable to cross-correlate with the other continuous variables (e.g. theta activity, caregiver vocal behaviours), and, therefore, does not concentrate only on infant attention episodes towards objects.

      We take the reviewer’s point that different attention and neural mechanisms may be associated with looks towards objects vs. the partner, which we now acknowledge directly on page 10 of the Introduction. However, our focus here is on the endogenous and interactive mechanisms that drive fluctuations in infant engagement with the ongoing, free-flowing interaction. Indeed, previous work has shown increases in theta activity during sustained episodes of infant attention to a range of different stimuli, including cartoon videos (Xie et al., 2018), real-life screen-based interactions (Jones et al., 2020), as well as objects (Begus et al., 2016). In the second half of part 2, we go on to address the endogenous processes that support infant attention episodes specifically towards objects.

      As a result of this comment, we have made direct changes to the Introduction on page 10 to more clearly explain the looking behaviours included in the cross-correlation analysis, and the rationale behind the analysis being conducted in this way – which is different to the reactive analyses conducted in the second half of parts one and three, which examine infant object looks only. Direct edits to the text have also been made throughout the Results and Methods sections as a result of this comment, to more clearly specify the types of looks included in each analysis. Now, where we discuss the cross-correlation analyses we refer only to infant ‘attention durations’ or infant ‘attention’, whilst ‘object-directed attention’ and ‘looks towards objects’ is clearly specified in sections discussing the reactive analyses conducted in parts 2 and 3. We have also amended the Discussion on page 31 so that the cross-correlation analyses are interpreted relative to infant overall attention, rather than their attention towards objects only.
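
      For readers unfamiliar with the method, the idea of a lagged cross-correlation between two continuous variables can be sketched as below. This is a generic illustration, not the authors' analysis pipeline: the function names and the toy series are hypothetical, and the sketch omits the significance testing that the manuscript performs. A peak at a positive lag means that the first series leads (forward-predicts) the second.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cross_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for each lag in -max_lag..max_lag."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]      # x leads y by `lag` samples
        else:
            xs, ys = x[-lag:], y[:n + lag]     # y leads x by `-lag` samples
        result[lag] = pearson(xs, ys)
    return result


# Toy example: `theta` is a copy of `attention` delayed by two samples,
# so the correlation should peak at lag +2.
attention = [1, 3, 2, 5, 4, 6, 2, 1, 3, 5, 7, 4]
theta = [2, 4] + attention[:-2]
by_lag = cross_correlation(attention, theta, max_lag=3)
peak_lag = max(by_lag, key=by_lag.get)
```

      Each lag uses only the overlapping portion of the two series, so longer lags are estimated from fewer samples.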

      Why are mothers' gazes shorter than infants' gazes? This was the flip of what I'd expect, so some interpretation would be useful to understanding the data.

      This is a really interesting observation. Our findings of the looking behaviour of caregivers and infants in our joint play interactions actually correspond to much previous micro-dynamic analysis of caregiver and infant looking behaviour during early table-top interactions (Abney et al., 2017; Perapoch Amadó et al., 2023; Yu & Smith, 2013, 2016). The reason for the shorter look durations in the adult is due to the fact that the caregivers alternate their gaze between their infant and the objects (i.e. they spend a lot of the interaction time monitoring their infants’ behaviours). This can be seen in Figure 2 (see main text) which shows that caregiver looks are divided between looks to their infants and looks towards objects. In comparison, infants spend most of their time focussing on objects (see Figure 2, main text), with relatively infrequent looks to their caregiver. As a result, infant looks are, overall, longer in comparison to their caregivers’.

      Minor points

      Use the term association or relation (relationships is for interpersonal relationships, not in statistics).

      This has now been amended throughout.

      I'm unsure I'd call the interactions "naturalistic" when they occur at a table, with select toys, EEG caps on partners, and so on. The term seems more appropriate for studies with fewer constraints that occur (for example) in a home environment, etc.

      We understand R1’s concern with our use of the term ‘naturalistic’ to refer to the joint play interactions that we analyse in the current study. However, we feel the term is appropriate, given that the interactions are unstructured: the only instruction given to caregivers at the beginning of the interaction is to play with their infants in the way that they might do at home. The interactions, therefore, measure free-flowing caregiver and infant behaviours, where modulations in each individual’s behaviour are the result of the intra- and inter-individual dynamics of the social exchange. This is in comparison to previous work on early infant attention development which has used more structured designs, and modulations in infant behaviour occur as a result of the parameters of the experimental task.

      Reviewer #2

      Public Review

      Summary:

      This paper acknowledges that most development occurs in social contexts, with other social partners. The authors put forth two main frameworks of how development occurs within a social interaction with a caregiver. The first is that although social interaction with mature partners is somewhat bi-directional, mature social partners exogenously influence infant behaviors and attention through "attentional scaffolding", and that in this case infant attention is reactive to caregiver behavior. The second framework posits that caregivers support and guide infant attention by contingently responding to reorientations in infant behavior, thus caregiver behaviors are reactive to infant behavior. The aim of this paper is to use moment-to-moment analysis techniques to understand the directionality of dyadic interaction. It is difficult to determine whether the authors prove their point as the results are not clearly explained as is the motivation for the chosen methods.

      Strengths

      The question driving this study is interesting and a genuine gap in the literature. Almost all development occurs in the presence of a mature social partner. While it is known that these interactions are critical for development, the directionality of how these interactions unfold in real-time is less known.

      The analyses largely seem to be appropriate for the question at hand, capturing small moment-to-moment dynamics in both infant and child behavior, and their relationships with themselves and each other. Autocorrelations and cross-correlations are powerful tools that can uncover small but meaningful patterns in data that may not be uncovered with other more discretized analyses (i.e. regression).

      We are pleased that R2 finds our work to be an interesting contribution to the field, which utilises appropriate analysis techniques.

      Weaknesses

      The major weakness of this paper is that the reader is assumed to understand why these results lead to their claimed findings. The authors need to describe more carefully their reasoning and justification for their analyses and what they hope to show. While a handful of experts would understand why autocorrelations and cross-correlations should be used, they are by no means basic analyses. It would also be helpful to use simulated data or even a simple figure to help the reader more easily understand what a significant result looks like versus an insignificant result.

      We thank the reviewer for this comment, and we agree that much more detail should be added to the Introduction section. As a result of this comment, we have made direct changes to the Introduction on pages 9-11 to more clearly detail these analysis methods, our rationale for using them, and how we expect the results to further our understanding of the drivers of infant attention in naturalistic social interactions.

      We also provide a figure in the SM (Fig. S6) to help the reader more clearly understand the permutation method used in our statistical analyses described in the Methods, on page 51, which depicts significant vs. insignificant patterns of results against their permutation distribution.

      While the overall question is interesting the introduction does not properly set up the rest of the paper. The authors spend a lot of time talking about oscillatory patterns in general but leave very little discussion to the fact they are using EEG to measure these patterns. The justification for using EEG is also not very well developed. Why did the authors single out fronto-temporal channels instead of using whole brain techniques, which are more standard in the field? This is idiosyncratic and not common.

      We very much agree with R2 that the rationale and justification for using EEG to understand the processes that influence infants’ attention patterns is under-developed in the current manuscript. As a result of this comment we have made direct edits to the Introduction section of the main text on pages 7-8 to more clearly describe the rationale for examining the relationship between infant EEG activity and their attention during the play interactions with their caregivers.

      As we describe in the Introduction section, previous behavioural work conducted with infants has suggested that endogenous cognitive processes (i.e. fluctuations in top-down cognitive control) might be important in explaining how infants allocate their attention during free-flowing, naturalistic interactions towards the end of the first year. Oscillatory neural activity occurring at theta frequencies (3-6Hz), which can be measured with EEG, has previously been associated with top-down intrinsically guided attentional processes in both adulthood and infancy (Jones et al., 2020; Orekhova, 1999; Xie et al., 2018). Measuring fluctuations in infant theta activity therefore provides a method to examine how endogenous cognitive processes structure infant attention in naturalistic social interactions which might be otherwise unobservable behaviourally.

      It is important to note that the Introduction distinguishes between two different oscillatory mechanisms that could possibly explain the organisation of infant attention over the course of the interaction. The first refers to oscillatory patterns of attention, that is, consistent attention durations produced by infants that likely reflect automatic, regulatory functions, related to fluctuations in infant arousal. The second mechanism is oscillatory neural activity occurring at theta frequencies, recorded with EEG, which, as mentioned above, is thought to reflect fluctuations in intrinsically guided attention in early infancy. We have amended the Introduction to make the distinction between the two more clear.

      A worrisome weakness is that the figures are not consistently formatted. The y-axes are not consistent within figures making the data difficult to compare and interpret. Labels are also not consistent and very often the text size is way too small making reading the axes difficult. This is a noticeable lack of attention to detail.

      This has now been adjusted throughout, where appropriate.

      No data is provided to reproduce the figures. This does not need to include the original videos but rather the processed and de-identified data used to generate the figures. Providing the data to support reproducibility is increasingly common in the field of developmental science and the authors are greatly encouraged to do so.

      This will be provided with the final manuscript.

      Minor Weaknesses

      Figure 4, how is the pattern in a not significant while in b a very similar pattern with the same magnitude of change is? This seems like a spurious result.

      The statistical analysis conducted for all cross-correlation analyses reported follows a rigorous and stringent permutation-based temporal clustering method which controls for family-wise error rate using a non-parametric Monte Carlo method (see Methods in the main text for more detail). Permutations are created by shuffling data sets between participants and, therefore, patterns of significance identified by the cluster-based permutation analysis will depend on the mean and standard deviation of the cross-correlations in the permutation distribution. Fig. S6 now depicts the cross-correlations against their permutation distributions which should help readers to understand the patterns of significance reported in the main text.
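
      The logic of a between-participant permutation test on cross-correlations can be sketched as follows. This is a simplified illustration only, not the authors' analysis code: it compares each lag against the permutation distribution separately and omits the temporal clustering step, and the synthetic data, function names (`make_dyads`, `permutation_p_values`) and parameter values are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def xcorr(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means y follows x."""
    if lag >= 0:
        return pearson(x[:len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[:len(y) + lag])


def group_xcorr(infants, caregivers, lags):
    """Mean cross-correlation across dyads at each lag."""
    n = len(infants)
    return [sum(xcorr(i, c, lag) for i, c in zip(infants, caregivers)) / n
            for lag in lags]


def random_derangement(n, rng):
    """Random re-pairing in which no participant keeps their own partner (n >= 2)."""
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(p != i for i, p in enumerate(perm)):
            return perm


def permutation_p_values(infants, caregivers, lags, n_perm=100, seed=0):
    """Two-sided p-value per lag from shuffling caregiver series between dyads."""
    rng = random.Random(seed)
    observed = group_xcorr(infants, caregivers, lags)
    exceed = [0] * len(lags)
    for _ in range(n_perm):
        perm = random_derangement(len(caregivers), rng)
        null = group_xcorr(infants, [caregivers[p] for p in perm], lags)
        for k in range(len(lags)):
            if abs(null[k]) >= abs(observed[k]):
                exceed[k] += 1
    return observed, [(e + 1) / (n_perm + 1) for e in exceed]


def make_dyads(n_dyads, length, lag, seed):
    """Synthetic dyads (assumption): the caregiver copies the infant `lag` samples later, plus noise."""
    rng = random.Random(seed)
    infants, caregivers = [], []
    for _ in range(n_dyads):
        infant = [rng.gauss(0, 1) for _ in range(length)]
        caregiver = [(infant[t - lag] if t >= lag else 0.0) + rng.gauss(0, 0.5)
                     for t in range(length)]
        infants.append(infant)
        caregivers.append(caregiver)
    return infants, caregivers


infants, caregivers = make_dyads(n_dyads=10, length=120, lag=2, seed=1)
lags = list(range(-4, 5))
observed, p_values = permutation_p_values(infants, caregivers, lags)
for lag, r, p in zip(lags, observed, p_values):
    print(f"lag {lag:+d}: r = {r:+.3f}, p = {p:.3f}")
```

      Because the null distribution is built from the same recordings re-paired across participants, whether a given cross-correlation reaches significance depends on the spread of the permutation distribution at that lag as well as on the size of the observed correlation.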

      The correlations appear very weak in Figures 3b, 5a, 7e. Despite a linear mixed effects model showing a relationship, it is difficult to believe looking at the data. Both the Spearman and Pearson correlations for these plots should be clearly included in the text, figure, or figure legend.

      We thank the reviewer for this comment, and agree that reporting the correlations for these plots would strengthen the findings of the linear mixed effects models reported in text. As a result, we have added both Spearman and Pearson correlations to the legends of Figures 3b, 5a and 7e, corresponding to the statistically significant relationships examined in the linear mixed effects models. The strength of the relationships is entirely consistent with those documented in other previous research that used similar methods (e.g. Piazza et al., 2018). How strong the relationship looks to the observer is entirely dependent on the graphical representation chosen to represent it. We have chosen to present the data in this way because we feel that it is the most honest way to represent the statistically significant, and very carefully analysed, effects that we have observed in our data.
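
      For readers unfamiliar with the difference between the two coefficients, a minimal sketch follows. It is illustrative only: the data are invented, and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical rather than taken from the manuscript. Spearman's coefficient is simply Pearson's coefficient computed on ranks, so it captures any monotonic relationship, whereas Pearson's captures only the linear part.

```python
import math


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Invented example: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(f"Pearson  r   = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```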

      Linear mixed effects models need more detail. Why were they built the way they were built? I would have appreciated seeing multiple models in the supplementary methods and a reasoning to have landed on one. There are multiple ways I can see this model being built (especially with the addition of a random intercept). Also, there are methods to test significance between models and aid in selection. That being said, although participant identity is a very common random effect, its use should be clearly stated in the main text.

      We very much agree with R2 that the reporting of the linear mixed effects models needs more detail and this has now been added to the Method section (page 54). Whilst it is true that there are multiple ways in which this model could be built, given the specificity of our research questions, regarding the reactive changes in infant theta activity and caregiver behaviours that occur after infant look onsets towards objects (see pages 9-11 of the Introduction), we take a hypothesis-driven approach to building the linear mixed effects models. As a result, random intercepts are specified for participants, as well as uncorrelated by-participant random slopes (Brown, 2021; Gelman & Hill, 2006; Suarez-Rivera et al., 2019). In this way, infant look durations are predicted from caregiver behaviours (or infant theta activity), controlling for between-participant variability in look durations, as well as in the strength of the effect of caregiver behaviours (or infant theta activity) on infant look durations.

      Some parentheses aren't closed, a more careful re-reading focusing on these minor textual issues is warranted.

      This has now been corrected.

      Analysis of F0 seems unnecessarily complex. Is there a reason for this?

      Computation of the continuous caregiver F0 variable may seem complex but we feel that all analysis steps are necessary to accurately and reliably compute this variable in our naturalistic, noisy and free-flowing interaction data. For example, we place the F0 only into segments of the interaction identified as the mum speaking so that background noises and infant vocalisations are not included in the continuous variable. We then interpolate through unvoiced segments (similar to Räsänen et al., 2018), and compute the derivative in 1000ms intervals as a measure of the rate of change. The steps taken to compute this variable have been both carefully and thoughtfully selected given the many ways in which this continuous rate of change variable could be computed (cf. Piazza et al., 2018; Räsänen et al., 2018).

      The choice of a 20hz filter seems odd when an example of toy clacks is given. Toy clacks are much higher than 20hz, and a 20hz filter probably wouldn't do anything against toy clacks given that the authors already set floor and ceiling parameters of 75-600Hz in their F0 extraction.

      We thank the reviewer for this comment and we can see that this part of the description of the F0 computation is confusing. A 20Hz low pass filter is applied to the data stream after extracting the F0 with floor and ceiling parameters set between 75-600Hz. The 20Hz filter therefore filters modulations in the caregivers’ F0 that occur at a modulation frequency greater than 20Hz. The 20Hz filter does not, therefore, refer to the spectral filtering of the speech signal. The description of this variable has been rephrased on page 48 of the main text.
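
      The distinction can be illustrated with a small sketch in which the low-pass filter acts on the extracted F0 contour (a slowly varying time series of pitch values), not on the audio waveform. Everything specific here is an assumption made for illustration: the 200 Hz contour sampling rate, the synthetic contour, the Hamming-windowed sinc design and the function names. The response does not state which filter design the authors used.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass kernel, normalised to unit gain at 0 Hz."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def filter_valid(signal, taps):
    """Convolve, keeping only outputs where the kernel fully overlaps the signal."""
    k = len(taps)
    return [sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def tone_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of one sinusoidal component (signal should span whole cycles)."""
    c = sum(v * math.cos(2 * math.pi * freq_hz * i / fs_hz) for i, v in enumerate(signal))
    s = sum(v * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i, v in enumerate(signal))
    return 2 * math.hypot(c, s) / len(signal)


fs = 200.0  # assumed sampling rate of the F0 contour, in Hz
times = [i / fs for i in range(800)]
# Synthetic contour: 200 Hz mean pitch, a slow 2 Hz pitch modulation,
# and a fast 50 Hz jitter standing in for frame-to-frame tracking noise.
contour = [200 + 30 * math.sin(2 * math.pi * 2 * t) + 10 * math.sin(2 * math.pi * 50 * t)
           for t in times]

smoothed = filter_valid(contour, lowpass_fir(20.0, fs))
for f in (2.0, 50.0):
    print(f"{f:4.0f} Hz modulation: before {tone_amplitude(contour, f, fs):6.2f} Hz, "
          f"after {tone_amplitude(smoothed, f, fs):6.2f} Hz")
```

      In this sketch the pitch values themselves stay within the 75-600 Hz range throughout; only how quickly they fluctuate is restricted, so the slow 2 Hz modulation survives while the 50 Hz jitter is removed.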

      Linear interpolation is a choice I would not have made. Where there is no data, there is no data. It feels inappropriate to assume that the data in between is simply a linear interpolation of surrounding points.

      The choice to interpolate where there was no data was something we considered in a lot of detail, given the many options for dealing with missing data points in this analysis, and the difficulties involved with extracting a continuous F0 variable in our naturalistic data sets. As R2 points out, one option would be to set data points to NaN values where no F0 is detected and/or the mum is not vocalising. A second option, however, would be to set the continuous variable to 0s where no F0 is detected and/or the mum is not vocalising (where the mum is not producing sound there is no F0 so rather than setting the variable to missing data points, really it makes most objective sense to set to 0).

      Either of these options (setting parts where no F0 is detected to NaN or 0) makes it difficult to then meaningfully compute the rate of change in F0: where NaN values are inserted, this reduces the number of data points in each time window; where 0s are inserted this creates large and unreal changes in F0. Inserting NaN values into the continuous variable also reduces the number of data points included in the cross-correlation and event-locked analyses. It is important to note that, in our naturalistic interactions, caregivers’ vocal patterns are characterised by lots of short vocalisations interspersed by short pauses (Phillips et al., in prep), similar to previous findings in naturalistic settings (Gratier et al., 2015). Interpolation will, therefore, have largely interpolated through the small pauses in the caregiver’s vocalisations.
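
      The trade-off between the three options can be made concrete with a toy contour. This is a sketch under stated assumptions, not the authors' pipeline: the contour values are invented, unvoiced samples are represented as `None`, and the helper names (`interpolate_gaps`, `zero_fill`, `max_abs_step`, `usable_steps`) are hypothetical.

```python
def interpolate_gaps(contour):
    """Linearly interpolate through None gaps; hold the nearest value at the edges."""
    voiced = [i for i, v in enumerate(contour) if v is not None]
    if not voiced:
        raise ValueError("contour has no voiced samples")
    out = list(contour)
    for i in range(voiced[0]):
        out[i] = contour[voiced[0]]
    for i in range(voiced[-1] + 1, len(out)):
        out[i] = contour[voiced[-1]]
    for left, right in zip(voiced, voiced[1:]):
        step = (contour[right] - contour[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = contour[left] + step * (i - left)
    return out


def zero_fill(contour):
    """Replace unvoiced samples with 0."""
    return [0.0 if v is None else v for v in contour]


def max_abs_step(values):
    """Largest sample-to-sample change, a crude stand-in for rate of change."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


def usable_steps(contour):
    """Number of adjacent sample pairs that are both voiced (the NaN strategy)."""
    return sum(a is not None and b is not None for a, b in zip(contour, contour[1:]))


# Invented F0 contour in Hz; None marks short unvoiced pauses.
f0 = [200.0, 205.0, None, None, 220.0, 225.0, None, 235.0]

print("interpolated:", interpolate_gaps(f0))
print("largest step, interpolated:", max_abs_step(interpolate_gaps(f0)))
print("largest step, zero-filled: ", max_abs_step(zero_fill(f0)))
print("usable steps with gaps left missing:", usable_steps(f0), "of", len(f0) - 1)
```

      With zero-filling, the largest apparent change is the jump between silence and voicing rather than any real pitch movement; with missing values, most of the sample-to-sample differences are lost; with interpolation, the steady rise in pitch is preserved across the short pauses.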

      The only limitation listed was related to the demographics of the sample, namely that the participants were middle-class moms in east London. Given that the demographics of London, even east London, are quite varied, it's disappointing their sample does not reflect the community they are in.

      Yes, we very much agree with R2 that the lack of inclusion of caregivers from wider demographic backgrounds is disappointing, and something which is often a problem in developmental research. Our lab is currently working to collect similar data from infants with a family history of ADHD, as part of a longitudinal, ongoing project, involving families from across the UK, from much more varied demographic backgrounds. We hope that the findings reported here will feed directly into the work conducted as part of this new project.

      That said, a demographic table of the subjects included in this study should be added.

      This is now included in the SM, and referenced in the main text.

      References

      Abney, D. H., Warlaumont, A. S., Oller, D. K., Wallot, S., & Kello, C. T. (2017). Multiple Coordination Patterns in Infant and Adult Vocalizations. Infancy, 22(4), 514–539. https://doi.org/10.1111/infa.12165

      Attaheri, A., Choisdealbha, Á. N., Di Liberto, G. M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., Grey, C., Flanagan, S., & Goswami, U. (2020). Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants [Preprint]. Neuroscience. https://doi.org/10.1101/2020.10.12.329326

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants’ preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397–12402. https://doi.org/10.1073/pnas.1603261113

      Brown, V. A. (2021). An Introduction to Linear Mixed-Effects Modeling in R.

      Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. The MIT Press.

      Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

      Georgieva, S., Lester, S., Noreika, V., Yilmaz, M. N., Wass, S., & Leong, V. (2020). Toward the Understanding of Topographical and Spectral Signatures of Infant Movement Artifacts in Naturalistic EEG. Frontiers in Neuroscience, 14, 352. https://doi.org/10.3389/fnins.2020.00352

      Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., & Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01167

      Jones, E. J. H., Goodwin, A., Orekhova, E., Charman, T., Dawson, G., Webb, S. J., & Johnson, M. H. (2020). Infant EEG theta modulation predicts childhood intelligence. Scientific Reports, 10(1), 11232. https://doi.org/10.1038/s41598-020-67687-y

      Marriott Haresign, I., Phillips, E., Whitehorn, M., Noreika, V., Jones, E. J. H., Leong, V., & Wass, S. V. (2021). Automatic classification of ICA components from infant EEG using MARA. Developmental Cognitive Neuroscience, 52, 101024. https://doi.org/10.1016/j.dcn.2021.101024

      Noreika, V., Georgieva, S., Wass, S., & Leong, V. (2020). 14 challenges and their solutions for conducting social neuroscience and longitudinal EEG research with infants. Infant Behavior and Development, 58, 101393. https://doi.org/10.1016/j.infbeh.2019.101393

      Orekhova, E. (1999). Theta synchronization during sustained anticipatory attention in infants over the second half of the first year of life. International Journal of Psychophysiology, 32(2), 151–172. https://doi.org/10.1016/S0167-8760(99)00011-2

      Perapoch Amadó, M., Greenwood, E., James, Labendzki, P., Haresign, I. M., Northrop, T., Phillips, E., Viswanathan, N., Whitehorn, M., Jones, E. J. H., & Wass, S. (2023). Naturalistic attention transitions from subcortical to cortical control during infancy. [Preprint]. Open Science Framework. https://doi.org/10.31219/osf.io/6z27a

      Piazza, E. A., Hasenfratz, L., Hasson, U., & Lew-Williams, C. (2018). Infant and adult brains are coupled to the dynamics of natural communication [Preprint]. Neuroscience. https://doi.org/10.1101/359810

      Räsänen, O., Kakouros, S., & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206. https://doi.org/10.1016/j.cognition.2018.05.015

      Richards, J. E. (2010). The development of attention to simple and complex visual stimuli in infants: Behavioral and psychophysiological measures. Developmental Review, 30(2), 203–219. https://doi.org/10.1016/j.dr.2010.03.005

      Suarez-Rivera, C., Smith, L. B., & Yu, C. (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental Psychology, 55(1), 96–109. https://doi.org/10.1037/dev0000628

      Wass, S. V., Noreika, V., Georgieva, S., Clackson, K., Brightman, L., Nutbrown, R., Covarrubias, L. S., & Leong, V. (2018). Parental neural responsivity to infants’ visual attention: How mature brains influence immature brains during social interaction. PLOS Biology, 16(12), e2006328. https://doi.org/10.1371/journal.pbio.2006328

      Xie, W., Mallin, B. M., & Richards, J. E. (2018). Development of infant sustained attention and its relation to EEG oscillations: An EEG and cortical source analysis study. Developmental Science, 21(3), e12562. https://doi.org/10.1111/desc.12562

      Yu, C., & Smith, L. B. (2013). Joint Attention without Gaze Following: Human Infants and Their Parents Coordinate Visual Attention to Objects through Eye-Hand Coordination. PLoS ONE, 8(11), e79659. https://doi.org/10.1371/journal.pone.0079659

      Yu, C., & Smith, L. B. (2016). The Social Origins of Sustained Attention in One-Year-Old Human Infants. Current Biology, 26(9), 1235–1240. https://doi.org/10.1016/j.cub.2016.03.026

      Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences, 118(52), e2107019118. https://doi.org/10.1073/pnas.2107019118

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their appreciation of our study and thoughtful comments. In response to the main concern raised by all reviewers regarding the potential influences of external noise factors on intuitive inference, such as external disturbances or imperfect observations, we have conducted three new experiments suggested by the reviewers. These experiments were designed to: (1) assess the influence of external forces on humans’ judgments by implementing a wall to block wind disturbances from one direction, (2) examine human accuracy in predicting the landing position of a falling ball when its trajectory is obscured, and (3) evaluate the effect of object geometry on human judgment of stability. The findings from these experiments consistently support our proposal of a stochastic world model on gravity embedded in the human mind. In addition, we have addressed the remaining comments from the reviewers point by point.

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review, I did not find it entirely convincing that the study shows evidence for a Gaussian understanding of gravity. There are two studies that would bolster this claim:

      (1) Replicate experiment 1, but also ask people to infer whether there was a hidden force. If people are truly representing gravity as proposed in the paper, you should get no force inferences. However, if the reason the Gaussian gravity model works is that people infer unseen forces, this should come out clearly in this study.

      Author response image 1.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R1: We thank the reviewer for this suggestion. To directly test whether participants’ judgments were influenced by their implicit assumptions about external forces, we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that any potential wind forces from the direction of the wall would not influence the collapse. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants tested (1 female; ages: 24-30), similar to the experiment without the wall (Supplementary Figure 4B). Therefore, the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, not shaped by external forces or explicit instructions.

      This new experiment has been added to the revised manuscript:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”

      (2) Similarly, you can imagine a simple study where you drop an object behind a floating occluder and you check where people produce an anticipatory fixation (i.e., where do they think the object will come out?). If people have a stochastic representation of gravity, this should be reflected in their fixations. But my guess is that everyone will look straight down.

      Author response image 2.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.

      R2: We thank the reviewer for suggesting this thought experiment. However, when predicting the landing point of a falling object, participants may rely more on learned knowledge that an unimpeded object continues to fall in a straight line, rather than drawing on their intuitive physics. To avoid this potential confounding factor, we designed a similar experiment where participants were asked to predict the landing point of a parabolic trajectory, obscured by an occluder (Author response image 2A). In each trial, participants used a mouse (clicking the left button) to predict the landing point of each parabolic trajectory, and there were 100 trials in total. This design not only limits the impact of direct visual cues but also actively engages the mental simulation of intuitive physics. All three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.

      (3) I believe the correct alternative model should be the one that has uncertainty over unseen forces, which better captures current proposals in the field, and controls for the amount of uncertainty in the models.

      R3: We thank the reviewers for the above-mentioned suggestions, and the findings from these two new experiments reinforce our proposal regarding the inherent stochastic characteristic of how the mind represents gravity.

      (4) I was not convinced that the RL framework was set up correctly to tackle the questions it claims to tackle. What this shows is that you can evolve a world model with Gaussian gravity in a setup that has no external perturbations. That does not imply that that is how humans evolved their intuitive physics, particularly when creatures have evolved in a world full of external perturbations. Showing that when (1) there are hidden perturbations, and (2) these perturbations are learnable, but (3) the model nonetheless just learns stochastic gravity, would be a more convincing result.

      R4: We completely agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity. In fact, introducing additional external noise into the RL framework likely heightens the uncertainty in learning gravity’s direction, potentially amplifying, rather than diminishing, the stochastic nature of mental gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) Some comments on the writing:

      The word 'normality' is used to refer to people's judgments about whether a tower collapsed looked 'normal'. I was a bit confused by this because normality can also mean 'Gaussian' and the experiments are also sampling from Gaussian distributions. There were several points where it took me a second to figure out which sense of 'normality' the paper was using. I would recommend using a different term.

      R5: We are sorry for the confusion. In revision, the term “normality” has been replaced with “confidence level about normal trajectory”.

      (6) One small comment is that Newton's laws are not a faithful replica of the "physical laws of the world" they are a useful simplification that only works at certain timescales. I believe some people propose Newtonian physics as a model of intuitive physics in part because it is a rapid and useful approximation of complex physical systems, and not because it is an untested assumption of perfect correspondence.

      R6: We are sorry for the inaccurate expression. We have revised our statements in the manuscript Line 15-16: “We found that the world model on gravity was not a faithful replica of the physical laws, but instead encoded gravity’s vertical direction as a Gaussian distribution.”

      (7) Line 49-50: Based on Fig 1d, lower bound of possible configurations for 10 blocks is ~17 in log-space, which is about 2.5e7. But the line here says it's 3.72e19, which is much larger. Sorry if I am missing something.

      R7: We thank the reviewer for pointing out this error. We re-calculated the number of possible configurations using the formula (3) in the appendix, and the number of configurations with 10 blocks is:

      Thus,

      This estimated number is much larger than that in our previous calculation, which has been corrected in the revised text.

      Line 827-829: “d) The lower bound of configurations’ possible number and the number of blocks in a stack followed an exponential relationship with a base of 10. The procedure can create at least 1.14×10⁵⁰ configurations for stacks consisting of 10 blocks.”

      Line 49-50: “… but the universal cardinality of possible configurations is at least 1.14×10⁵⁰ (Supplementary Figure 1), …”

      Line 1017-1018: “… the number of configurations can be estimated with formula (9), which is 1.14×10⁵⁰.”

      (8) Lines 77-78: "A widely adopted but not rigorously tested assumption is that the world model in the brain is a faithful replica of the physical laws of the world." This risks sounding like you are asserting that colleagues in the field do not rigorously test their models. I think you meant to say that they did not 'directly test', rather than 'rigorously test'. If you meant rigorous, you might want to say more to justify why you think past work was not rigorous.

      R8: We apologize for the inappropriate wording. The sentence has been revised, and we explain the motivation more comprehensively in the revised text:

      Line 76-92: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach.”

      (9) Lines 79-84 States that past models encode gravity downward. It then says that alternatively there is consensus that the brain uses data from sensory organs and adds meaning to them. I think there might be a grammatical error here because I did not follow why saying there is 'consensus' on something is a theoretical alternative. I also had trouble following why those two statements are in opposition. Is any work on physics engines claiming the brain does not take data from sensory organs and add meaning to them?

      R9: We are sorry for the confusion. Here we intend to contrast the deterministic model (i.e., the uncertainty comes from outside the model) with the stochastic model (i.e., the uncertainty is inherently built into the model). In revision, we have clarified the intention. For details, please see R8.

      (10) Lines 85-88: Following on the sentence above, you then conclude that the representation of the world may therefore not be the same as reality. I did not understand why this followed. It seems you are saying that, because the brain takes data from sensory organs, therefore its representations may differ from reality.

      R10: Again, we are sorry about the confusion. Please see the revised text in R8.

      (11) Lines 190-191: I had trouble understanding this sentence. I believe you are missing an adjective to clarify that participants were more inclined to judge taller stacks as more likely to collapse.

      R11: We are sorry for the confusion. What we intended to state here is that participants’ judgment was biased, showing a tendency to predict a collapse for stacks regardless of their actual stability. We have revised this confusing sentence in the revision. Line 202–204: “However, the participants showed an obvious bias towards predicting a collapse for stacks regardless of their actual stability, as the dots in Fig 2b are more concentrated on the lower side of the diagonal line.”

      (12) Line 201: I don't think it's accurate to say that MGS "perfectly captured participants' judgments" unless the results are actually perfect.

      R12: We agree, and in revision we have toned down the statement. Line 213–214: “…, the MGS, in contrast to the NGS, more precisely reflected participants’ judgments of stability …”

      Reviewer #2 (Recommendations For The Authors):

      I think this is an impressive set of experiments and modeling work. The paper is nicely written and I appreciate the poetic license the authors took at places in the manuscript. I only have clarification points and suggest a simple experiment that could lend further support to their conclusions.

      (1) In my opinion, the impact of this work is twofold. First, the suggestion that gravity is represented as a distribution of the world and not a result of (inferred) external perturbations. Second, that the distribution is advantageous as it balances speed and accuracy, and lessens computational processing demands (i.e., number of simulations). The second point here is contingent on the first point, which is really only supported by the RL model and potentially the inverted scene condition. I am somewhat surprised that the RL model does not converge on a width much smaller than ~20 degrees after 100,000 simulations. From my understanding, it was provided feedback with collapses based on natural gravity (deterministically downward). Why is learning so slow and the width so large? Could it be the density of the simulated world model distribution? If the model distribution of Qs was too dense, then Q-learning would take forever. If the model distribution was too sparse, then its final estimate would hit a floor of precision. Could the authors provide more details on the distribution of the Qs for the RL model?

      Author response image 3.

      RL learning curves as a function of θ angle with different sampling densities and learning rates. Learning rates were adjusted to low (a), intermediate (b) and high (c) settings, while sampling densities were chosen at four levels: 5x5, 11x11, 31x31, and 61x61 shown from the left to the right. Two key observations emerged from the simulations as the reviewer predicted. First, higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances. Second, increased sampling density necessitated more iterations for convergence. Note that in all simulations, we limited the iterations to 1,000 times (as opposed to 100,000 times reported in the manuscript) to demonstrate the trend without excessive computational demands.

      R1: To illustrate the distribution of the Q-values for the RL model, we re-ran the RL model with various learning rates and sampling densities (Author response image 3). These results support the reviewer’s prediction that higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances, and increased sampling density requires more iterations for convergence.

      This simulation also elucidates the slower learning observed in the experiment described in the text, where the force sphere was divided into 61x61 angle pairs, and the learning rate was set to 0.15. This set of parameters ensured convergence within a reasonably brief timeframe while maintaining high-resolution force assessments.

      Besides, the width of the Gaussian distribution is mainly determined by the complexity of stacks. As shown in Figure 3c and Supplementary Figure 9, stacks with fewer blocks (i.e., less complex) caused a larger width, whereas those with more blocks resulted in a narrower spread. In the study, we used a collection of stacks varying from 2 to 15 blocks to simulate the range of stacks humans typically encounter in daily life.

      In revision, we have incorporated these insights suggested by the reviewer to clarify the performance of the RL framework:

      Line 634-639: “The angle density and learning rate are two factors that affect the learning speed. A larger angle density prolongs the time to reach convergence but enables a more detailed force space; a higher learning rate accelerates convergence but incurs larger variance during training. To balance speed and convergence, we utilized 100,000 configurations for the training.”

      Line 618-619: “…, separately divided them into 61 sampling angles across the spherical force space (i.e., the angle density).”
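
      To make the trade-off between angle density and learning rate concrete, the following is a minimal toy sketch, not the RL model used in the manuscript. It assumes a plain tabular update, Q <- Q + lr * (feedback - Q), applied to randomly chosen cells of an n x n grid of angle pairs. The Gaussian-bump target, the noise level, and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def target_value(theta, phi, sigma=0.3):
    """Hypothetical 'true' value of an angle pair (a Gaussian bump, illustration only)."""
    return math.exp(-(theta ** 2 + phi ** 2) / (2 * sigma ** 2))


def run_tabular_learning(n_per_axis, learning_rate, n_iterations, noise_sd=0.1, seed=0):
    """Train a value table on noisy feedback and return its mean absolute error."""
    rng = random.Random(seed)
    # Grid of normalized angles in [-1, 1] along each axis (assumed range).
    axis = [-1 + 2 * i / (n_per_axis - 1) for i in range(n_per_axis)]
    q = [[0.0] * n_per_axis for _ in range(n_per_axis)]
    for _ in range(n_iterations):
        # Sample one angle pair uniformly and observe noisy feedback for it.
        i = rng.randrange(n_per_axis)
        j = rng.randrange(n_per_axis)
        feedback = target_value(axis[i], axis[j]) + rng.gauss(0.0, noise_sd)
        # Incremental update: move the estimate toward the feedback.
        q[i][j] += learning_rate * (feedback - q[i][j])
    total_error = sum(
        abs(q[i][j] - target_value(axis[i], axis[j]))
        for i in range(n_per_axis)
        for j in range(n_per_axis)
    )
    return total_error / n_per_axis ** 2


if __name__ == "__main__":
    for n in (5, 11, 31, 61):
        print(n, round(run_tabular_learning(n, 0.15, 1000), 3))
```

      With a fixed budget of 1,000 updates, each cell of the 5x5 table is visited about 40 times on average, whereas most cells of the 61x61 table are never visited. This is one way to see why a denser grid needs more iterations to converge.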

      (2) Along similar lines, the authors discuss the results of the inverted scene condition as reflecting cognitive impenetrability. However, do they also interpret it as support for an intrinsically noisy distribution of gravity? I would be more convinced if they created a different scene that could have the possibility of affecting the direction of an (inferred) external perturbation - a previously held explanation of the noisy world model. For example, a relatively simple experiment would be to have a wall on one side of the scene such that an external perturbation would be unlikely to be inferred from that direction. In the external perturbation account, phi would then be affected resulting in a skewed distribution of angle pairs. However, in the authors' stochastic world model phi would remain unaffected resulting in the same uniform distribution of phi the authors observed. In my opinion, this would provide more compelling evidence for the stochastic world model.

      Author response image 4.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R2: We thank the reviewer for this suggestion. Following it, we designed an experiment with a wall added on one side (Supplementary figure 4A). Before the experiment began, we explicitly informed the participants that the wall was designed to block wind, so that no wind force from the direction of the wall could influence the collapse trajectory of the configurations. Participants were asked to judge whether each trajectory was normal. If participants’ judgments were influenced by external noise, we would expect to observe a skewed angle distribution. However, our results still showed a normal distribution across all participants tested, consistent with the experiment without the wall (Supplementary figure 4B). This experiment suggests that the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, rather than shaped by external forces or explicit instructions.

      We revised the original manuscript and added this new experiment:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”
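
      One simple way to quantify the skewness discussed here is the sample skewness (the third standardized moment), which is zero for a symmetric sample and nonzero when one side of the distribution is stretched. The sketch below is illustrative only: the function name and the synthetic numbers are assumptions, and it is not the analysis behind Author response image 4.

```python
import random


def sample_skewness(values):
    """Third standardized moment: m3 / m2 ** 1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic responses along one angular axis, in degrees (not real data).
    symmetric = [rng.gauss(0.0, 20.0) for _ in range(2000)]
    # Hypothetical "wall" effect: deviations toward the wall side are halved.
    one_sided = [v if v >= 0 else 0.5 * v for v in symmetric]
    print("symmetric sample:", round(sample_skewness(symmetric), 2))
    print("one-sided sample:", round(sample_skewness(one_sided), 2))
```

      Under the external-perturbation account one would expect a value clearly different from zero in the wall condition, while a value near zero is what a symmetric distribution gives.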

      (3) I didn't completely follow the authors' explanation for the taller objects illusion. On lines 229-232, the authors state that deviations from gravity's veridical direction are likely to accumulate with the height of the objects. Is this because, in the stochastic world model account, each block gets its own gravity vector that is sampled from the distribution? The authors should clarify this more explicitly. If this is indeed the author's claim, then it would seem that it could be manipulated by varying the dimensions of the blocks (or whatever constitutes an object).

      R3: We are sorry for the confusion caused by the use of the term ‘accumulate’. In the study, there is only one gravity vector sampled from the distribution for the entire structure, rather than each block having a unique gravity vector. The height illusion is attributed to the fact that the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction. This is especially true for objects consisting of multiple blocks stacked atop one another. In revision, we have removed the confusing term ‘accumulate’ for clarification.

      Line 242-244: “…, because the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction during humans’ internal simulations.”
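
      The dependence on height can be illustrated with a deliberately simplified 2D case, which is an assumption for illustration rather than the simulation used in the study. A single uniform block of base width w and height h, with enough friction not to slide, tips once gravity is tilted by more than atan(w / h) from vertical. If the tilt is drawn from a zero-mean Gaussian with standard deviation sigma, the tipping probability is erfc(atan(w / h) / (sigma * sqrt(2))). The function names and the value of sigma below are hypothetical.

```python
import math


def critical_tilt(base_width, height):
    """Tilt of gravity (radians) beyond which a uniform block tips, assuming no sliding."""
    return math.atan2(base_width, height)


def tipping_probability(base_width, height, sigma_deg):
    """P(|tilt| > critical tilt) when tilt ~ Normal(0, sigma_deg), 2D simplification."""
    theta_c = critical_tilt(base_width, height)
    sigma = math.radians(sigma_deg)
    return math.erfc(theta_c / (sigma * math.sqrt(2)))


if __name__ == "__main__":
    # Same base, increasing height; sigma of 20 degrees is an arbitrary example value.
    for h in (1, 2, 5, 10):
        print(h, round(tipping_probability(1.0, h, 20.0), 3))
```

      With one gravity vector sampled for the whole object, the probability rises with height only because the critical tilt shrinks, which matches the explanation given in R3.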

      (4) The authors refer to the RL simulations as agent-environment interactions, but in reality, the RL model does not interact with the blocks. Would experience-dependent or observation be more apropos?

      R4: We completely agree. Indeed, the RL model did not manipulate stacks; rather, it updated its knowledge of natural gravity based on the discrepancies between the RL model’s predictions and observed outcomes. In revision, we have removed the confusing term ‘agent-environment interactions’ and clarified its intended meaning.

      Line 19-22: “Furthermore, a computational model with reinforcement learning revealed that the stochastic characteristic likely originated from experience-dependent comparisons between predictions formed by internal simulations and the realities observed in the external world, …”

      Reviewer #3 (Public Review):

      (1) In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability of the block to tip over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block the smaller the deviation from a vertical gravitational field is needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. For small friction values, the block is expected to slide down the incline, therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course, we don't, as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.
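
      The thought experiment above can be sketched numerically under textbook assumptions that are not taken from the manuscript: a uniform rigid block of base width w and height h on a flat surface with static friction coefficient mu, and gravity tilted from the surface normal. As the tilt grows, sliding starts when tan(tilt) > mu and tipping starts when tan(tilt) > w / h, so the block tips before it slides only if it is taller than w / mu. The function names are hypothetical, and the classification reports only which failure mode sets in first.

```python
import math


def critical_height(base_width, friction_coefficient):
    """Height above which a uniform block tips before it slides (h > w / mu)."""
    return base_width / friction_coefficient


def first_failure(tilt_deg, base_width, height, friction_coefficient):
    """Return 'stable', 'slide' or 'tip' for gravity tilted by tilt_deg (< 90) from the normal."""
    tan_tilt = math.tan(math.radians(abs(tilt_deg)))
    slide_threshold = friction_coefficient   # sliding starts when tan(tilt) > mu
    tip_threshold = base_width / height      # tipping starts when tan(tilt) > w / h
    if tan_tilt <= min(slide_threshold, tip_threshold):
        return "stable"
    # Past the lower threshold, report the mode whose threshold is reached first.
    return "tip" if tip_threshold < slide_threshold else "slide"


if __name__ == "__main__":
    print(first_failure(0.0, 1.0, 10.0, 0.5))    # no tilt
    print(first_failure(20.0, 1.0, 10.0, 0.5))   # tall block, moderate friction
    print(first_failure(20.0, 1.0, 1.0, 0.1))    # short block, low friction
```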

      Author response image 5.

      Differentiating Subjectivity from Objectivity. In both Experiment 1 (a) and Experiment 2 (b), participants were instructed to determine which shape appeared most stable. Objectively, in the absence of external forces, all shapes possess equal stability. Yet, participants typically perceived the shape on the left as the most stable because of its larger base area. The discrepancy between objective realities and subjective feelings, as we propose, is attributed to the human mind representing gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      R1: We agree with the reviewer that objects will remain stable until disturbed by external forces. However, in many cases, there is a clear discrepancy between objective realities and subjective feelings. For example, electromagnetic waves associated with purple and red colors are the farthest in the electromagnetic space, yet purple and red are the closest colors in the color space. Similarly, as shown in Supplementary Figure 4, in reality all shapes possess equal stability in the absence of external forces. Yet, humans typically perceive the shape on the left as more stable because of its larger base area. In this study, we tried to explore the mechanism underlying this discrepancy by proposing that the human mind represents gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      In revision, we have clarified the rationale of this study:

      Line 76-98: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), hereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that this distinction of these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach. Here, we investigated these two alternative hypotheses regarding the construction of the world model in the brain by examining how gravity’s direction is represented in the world model when participants judged object stability.”

      (2) The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept 50% of the time a fall is normal when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint.

      R2: We thank the reviewer for highlighting several potential confounding factors in our study. We address each of these concerns point-by-point:

      (a) Misinterpretation of the 3D scene and motion. In Response Figure 4 shown above, there is no 3D structure, yet participants’ judgment on stability still deviated from objective realities. In addition, the introduction of 3D motion was to aid in understanding the stacks’ 3D structure. Previous studies without 3D motion have reported similar findings (Allen et al., 2020). Therefore, regardless of whether objects are presented in 2D or 3D, or as static or moving displays, humans’ judgment on object stability appears consistent.

      (b) Errors in perceived height. While there might be discrepancies between perceived and simulated heights, such errors are systematic across all conditions. Therefore, they may affect the width of the Gaussian distribution but do not fundamentally alter its existence.

      (c) The viewpoint. In one experiment, we inverted gravity’s direction to point upward, diverging from common daily experience. Despite this change in viewpoint, the Gaussian distribution was still observed. That is, the viewpoint does not appear to be a key factor in how gravity’s direction is represented as a Gaussian distribution in our mental world.

      In summary, both our study and previous studies (Allen et al., 2020; Battaglia et al., 2013) agree that humans’ subjective assessments of objects’ stability deviate from actual stability due to noise in mental simulation. Unlike previous studies, we suggest that this noise is intrinsic, rather than stemming from external forces or imperfect observations.

      (3) Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.

      R3: Indeed, the external forces suggested by the reviewer certainly influence judgments of objects’ stability. The critical question, however, is whether humans’ judgments on objects’ stability accurately mirror the actual stability of objects in the absence of external forces. To address this question, we designed two new experiments.

      Experiment 1: we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). We explicitly informed the participants that the wall could block wind, ensuring that no potential wind from the direction of the wall could influence the configuration. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants (Age: 25-30, two females), which is similar to the experiment without the wall (Supplementary Figure 4B).

      Author response image 6.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      Experiment 2: The second experiment adopted another paradigm to test the hypothesis of stochastic mental simulation. When humans infer the landing point of a parabolic trajectory that is obscured by an occluder (Author response image 2A), stochastic mental simulation predicts that their behavior follows a Gaussian distribution. However, if humans’ judgments were influenced by external noise, the landing points would not be Gaussian. The experiment consisted of 100 trials in total, and in each trial participants used a mouse to predict the landing point of the trajectory by clicking the left button. We found that all three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.

      Author response image 7.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.
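
      As a rough illustration of how a Gaussian uncertainty in gravity’s direction could translate into landing-point errors, the sketch below simulates an ideal projectile whose gravity vector is tilted by a random angle. It is not the stimulus code of this experiment: the launch velocity, the tilt standard deviation, and the function names are all assumptions. For small tilts the landing position changes approximately linearly with the tilt, so the simulated errors are approximately Gaussian.

```python
import math
import random


def landing_x(vx, vy, tilt_rad=0.0, g=9.8, y0=0.0):
    """Horizontal landing position when gravity is tilted by tilt_rad (|tilt| < 90 degrees)."""
    gx = g * math.sin(tilt_rad)   # sideways component of tilted gravity
    gy = g * math.cos(tilt_rad)   # downward component of tilted gravity
    # Time at which the height y0 + vy * t - 0.5 * gy * t**2 returns to zero.
    t = (vy + math.sqrt(vy ** 2 + 2 * gy * y0)) / gy
    return vx * t + 0.5 * gx * t ** 2


def simulate_errors(vx, vy, sigma_deg, n_trials, seed=0):
    """Landing errors relative to vertical gravity, with tilt ~ Normal(0, sigma_deg)."""
    rng = random.Random(seed)
    true_x = landing_x(vx, vy)
    return [
        landing_x(vx, vy, math.radians(rng.gauss(0.0, sigma_deg))) - true_x
        for _ in range(n_trials)
    ]


if __name__ == "__main__":
    errors = simulate_errors(vx=3.0, vy=4.9, sigma_deg=5.0, n_trials=100)
    mean = sum(errors) / len(errors)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    print("mean error:", round(mean, 3), "sd:", round(sd, 3))
```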

      (4) The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event?

      R4: We agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.

      Author response image 8.

      Perceptual uncertainty in 3D stack structures. (a) Experimental design. A pair of stacks with similar placements of blocks was presented sequentially to participants, who were instructed to judge whether the stacks were identical and to rate their confidence in this judgment. Each stack was presented on the screen for 2 seconds. (b) Behavioral performance. Three participants (2 males, age range: 24-30) were recruited for the experiment. The confidence in determining whether a pair of stacks remained unchanged rapidly decreased when each block had a very small displacement, suggesting humans could keenly perceive trivial changes in configurations. The x-axis denotes the difference in block placement between stacks, with the maximum value (0.4) corresponding to the length of a block’s short side. The y-axis denotes humans’ confidence in reporting no change. The red curve illustrates the average confidence level across 4 runs, while the yellow curve is the confidence level of each run.

      R5: Indeed, uncertainty is inevitable when perceiving the external world, because our perception is not a faithful replica of external reality. A more critical question pertains to the accuracy of our perception in representing the 3D coordinates of a stack’s blocks. To address this question, we designed a straightforward experiment (Author response image 5a), where participants were instructed to determine whether a pair of stacks were identical. The position of each block was randomly changed horizontally. We found that all participants were able to accurately identify even minor positional variations in the 3D structure of the stacks (Author response image 5b). This level of perceptual precision is adequate for locating the difference between predictions from mental simulations and actual observations of the external world.

      (6) Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?

      R6: Yes. The NGS model achieved 80% accuracy as rapidly as the MGS model. However, the NGS model required a significantly longer period to reach the plateau crucial for decision-making. In revision, this information is now included.

      Line 348-350: “…, while the initial growth rates of both models were comparable, the MGS reached the plateau crucial for decision-making sooner than the NGS.”

      We greatly appreciate the thorough and insightful review provided by all three reviewers, which has considerably improved our manuscript, especially in terms of clarity in the presentation of the approach and further validation of the robustness implications of our results.

      Reference: Allen KR, Smith KA, Tenenbaum JB. 2020. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences 117:29302–29310.

      Battaglia PW, Hamrick JB, Tenenbaum JB. 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110:18327–18332.

      Friston K, Moran RJ, Nagai Y, Taniguchi T, Gomi H, Tenenbaum J. 2021. World model learning and inference. Neural Networks 144:573–590.

      Kriegeskorte N, Douglas PK. 2019. Interpreting encoding and decoding models. Current opinion in neurobiology 55:167–179.

      MacKay DM. 1956. The epistemological problem for automata. In: Automata Studies (AM-34), Volume 34. Princeton University Press. pp. 235–252.

      Matsuo Y, LeCun Y, Sahani M, Precup D, Silver D, Sugiyama M, Uchibe E, Morimoto J. 2022. Deep learning, reinforcement learning, and world models. Neural Networks.

      Naselaris T, Kay KN, Nishimoto S, Gallant JL. 2011. Encoding and decoding in fMRI. Neuroimage 56:400–410.

      Zhou L, Smith K, Tenenbaum J, Gerstenberg T. 2022. Mental Jenga: A counterfactual simulation model of physical support.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study combines a comparative approach in different synapses with experiments that show how synaptic vesicle endocytosis in nerve terminals regulates short-term plasticity. The data presented support the conclusions and make a convincing case for fast endocytosis as necessary for rapid vesicle recruitment to active zones. Some aspects of the description of the data and analysis are however incomplete and would benefit from a more rigorous approach. With more discussion of methods and analysis, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After the acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses an acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of the site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals. This condition has not been examined.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated. Although this is a hard question and difficult to address experimentally, reagents may affect synaptic vesicle mobilization to the release sites directly in addition to blocking endocytosis.

      To acutely block vesicle endocytosis, we utilized two different pharmacological tools, Dynasore and Pitstop-2, after testing their blocking spectra and potencies at the calyx presynaptic terminals and collected data of their common effects on target functions. Since the recovery from STD was faster at the calyx synapses in the presence of both endocytic blockers in physiological 1.3 mM [Ca2+] (Figure 2B), but not in 2.0 mM [Ca2+] (Figure S4), they might facilitate vesicle mobilization in physiological condition.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin or dynamin function during compensatory endocytosis or of the cortical actin scaffold both in the calyx of Held synapse and hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (Calyx of Held) tuned to reliably transmit at several 100 Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the Calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      The concepts of FRP and SRP are derived from voltage-clamp step-depolarization experiments at calyces of Held in pre-hearing rodents at RT, and cannot be directly dissected in data of action-potential evoked EPSCs at post-hearing calyces under physiological conditions. However, we dissected them as far as possible by referring to the related literature in new paragraphs in the Results section (p9-10), particularly on the different effects of Latrunculin application and experimental conditions, and by adding a new supplementary Figure (now S5). Regarding the role of F-actin in vesicle replenishment at cerebellar synapses, we added sentences in the Discussion section (p14, last paragraph).

      Reviewer #3 (Public Review):

      General comments:

      (1) While Dynasore and Pitstop-2 may impede release site clearance due to an arrest of membrane retrieval, neither Latrunculin-B nor ML-141 specifically acts on AZ scaffold proteins. Interference with actin polymerization may have a number of consequences many of which may be unrelated to release site clearance. Therefore, neither Latrunculin-B nor ML-141 can be considered suitable tools for specifically identifying the role of AZ scaffold proteins (i.e. ELKS family proteins, Piccolo, Bassoon, α-liprin, Unc13, RIM, RBP, etc) in release site clearance which was defined as one of the principal aims of this study.

      In this study, we focused our analysis on the downstream activity of the scaffold protein intersectin by comparing the common inhibitory effects of blocking CDC42 and actin polymerization, by use of ML141 and Latrunculin B, respectively, on vesicle endocytosis and synaptic depression/facilitation, without addressing diverse individual drug effects. To avoid confusion, we removed “AZ” from scaffold protein.

      (2) Initial EPSC amplitudes more than doubled in the presence of Dynasore at hippocampal SC->CA1 synapses (Figure S2). This unexpected result raises doubts about the specificity of Dynasore as a tool to selectively block SV endocytosis.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) In this study, the application of Dynasore and Pitstop-2 strongly decreases 100 Hz steady-state release at calyx synapses while - quite unexpectedly - strongly accelerates recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The latrunculin effect on STD can vary according to the condition of application and external [Ca2+], which we show in a new supplemental Figure S5. The latrunculin effect on the recovery from STD also varies with temperature, [Ca2+], and animal age, which affect Ca2+-dependent fast recovery component from depression. We added paragraphs for this issue in Results section (p9-10).

      (4) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We added methodological explanations and reworded sentences in the text to be clear for pharmacological data derived from non-sequential separate experiments.

      (5) The authors compare results obtained in calyx with those obtained in SC->CA1 synapses which they considered examples for 'fast' and 'slow' synapses, respectively. There is little information given to help readers understand why these two synapse types were chosen, what the attributes 'fast' and 'slow' refer to, and how that may matter for the questions studied here. I assume the authors refer to the maximum frequency these two synapse types are able to transmit rather than to EPSC kinetics?

      Yes, the “fast and slow” naming features maximum operating frequency these synapses can transmit. We reworded “fast and slow” to “fast-signaling and slow-plastic” and added explanation in the text.

      (6) Strong presynaptic stimuli such as those illustrated in Figures 1B and C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents a fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Since the data shown in Figs. 1 and 3 are central to the argumentation, illustration of the corresponding conductance traces is mandatory. Merely mentioning that the first 450 ms after stimulation were skipped during analysis is insufficient.
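
      The arithmetic in comment (6) can be checked with a short sketch. It uses only the values quoted in the comment (80 aF per vesicle, ~16 pF per terminal) and assumes that membrane capacitance scales linearly with membrane area; the function names are illustrative and are not taken from the manuscript.

```python
# Back-of-envelope check of the capacitance arithmetic in comment (6).
SINGLE_SV_CAPACITANCE_F = 80e-18  # 80 aF per vesicle (value quoted in the comment)
TERMINAL_CAPACITANCE_F = 16e-12   # ~16 pF per terminal (value quoted in the comment)


def vesicles_fused(delta_cm_f):
    """Number of vesicles whose fusion would add delta_cm_f farads of membrane."""
    return delta_cm_f / SINGLE_SV_CAPACITANCE_F


def surface_increase(delta_cm_f):
    """Fractional increase in terminal membrane, assuming capacitance scales with area."""
    return delta_cm_f / TERMINAL_CAPACITANCE_F


if __name__ == "__main__":
    for delta_cm_pf in (2.0, 2.5):
        delta_cm_f = delta_cm_pf * 1e-12  # convert pF to F
        print(f"{delta_cm_pf} pF: {vesicles_fused(delta_cm_f):,.0f} SVs, "
              f"{surface_increase(delta_cm_f):.1%} surface increase")
```

      With these inputs the sketch gives 25,000 to 31,250 vesicles and a 12.5 to 15.6% surface increase, in line with the rounded figures in the comment.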

      Conductance trace is shown with a trace of capacitance change induced by a square pulse in our previous paper (Yamashita et al, 2005 Science).

      (7) It is essential for this study to preclude a contamination of the results with postsynaptic effects (AMPAR saturation and desensitization). AMPAR saturation limits the amplitudes of initial responses in EPSC trains and hastens the recovery from depression due to a 'ceiling effect'. AMPAR desensitization occludes paired-pulse facilitation and reduces steady-state responses during EPSC trains while accelerating the initial recovery from depression. The use of, for example, 1 mM kynurenic acid in the bath is a well-established strategy to attenuate postsynaptic effects at calyx synapses. All calyx EPSC recordings should have been performed under such conditions. Otherwise, recovery time courses and STP parameters are likely contaminated by postsynaptic effects. Since the effects of AMPAR saturation on EPSC_1 and desensitization on EPSC_ss may partially cancel each other, an unchanged relative STD in the presence of kynurenic acid is not necessarily a reliable indicator for the absence of postsynaptic effects. The use of kynurenic acid in the bath would have had the beneficial side effect of massively improving voltage-clamp conditions. For the typical values given in this MS (10 nA EPSC, 3 MOhm Rs) the expected voltage escape is ~30 mV corresponding to a change in driving force of 30 mV/80 mV=38%, i.e. initial EPSCs in trains are likely underestimated by 38%. Such large voltage escape usually results in unclamped INa(V) which was suppressed in this study by routinely including 2 mM QX-314 in the pipette solution. That approach does, however, not reduce the voltage escape.
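
      The voltage-escape estimate in comment (7) follows from Ohm's law. The sketch below reproduces that first-order arithmetic with the values given in the comment (10 nA EPSC, 3 MOhm series resistance, 80 mV driving force); it does not model the self-limiting reduction of the current as the clamp escapes, and the function names are illustrative.

```python
# First-order voltage-escape estimate, as in comment (7).

def voltage_escape_mv(epsc_na, series_resistance_mohm):
    """Voltage drop across the series resistance; nA * MOhm gives mV."""
    return epsc_na * series_resistance_mohm


def driving_force_loss(escape_mv, driving_force_mv):
    """Fraction of the nominal driving force lost to the voltage escape."""
    return escape_mv / driving_force_mv


if __name__ == "__main__":
    escape = voltage_escape_mv(10.0, 3.0)    # 10 nA EPSC, 3 MOhm Rs
    loss = driving_force_loss(escape, 80.0)  # 80 mV driving force
    print(f"voltage escape: {escape:.0f} mV, driving-force loss: {loss:.1%}")
```

      The result is a 30 mV escape and a 37.5% loss of driving force, which the comment rounds to 38%.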

      Glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003) although it does in pre-hearing calyces (Yamashita et al, 2009). In fact, as shown in Figure S3, our results are essentially the same with or without kynurenate.

      (8) In the Results section (pages 7 and 8), the authors analyze the time course into STD during 100 Hz trains in the absence and presence of drugs. In the presence of drugs, an additional fast component is observed which is absent from control recordings. Based on this observation, the authors conclude that '... the mechanisms operate predominantly at the beginning of synaptic depression'. However, the consequences of blocking or slowing site clearing are expected to be strongly release-dependent. Assuming a probability of <20% that a fusion event occurs at a given release site, >80% of the sites cannot be affected at the arrival of the second AP even by a total arrest of site clearance simply because no fusion has yet occurred. That number decreases during a train according to (1-0.2)^n, where n is the number of the AP, such that after 10 APs, ~90% of the sites have been used and may potentially be unavailable for new rounds of release after slowing site clearance. Perhaps, the faster time course into STD in the presence of the drugs isn't related to site clearance?
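
      The site-usage argument in comment (8) can be sketched as follows. The sketch assumes, as the comment implicitly does, that each release site fuses independently with a fixed probability per action potential and that used sites are not cleared during the train. The value 0.2 is the upper bound named in the comment, so the fractions below are upper bounds on site usage; the function name is illustrative.

```python
# Fraction of release sites used at least once after n action potentials,
# assuming independent release with fixed per-AP probability and no site clearance.

def fraction_sites_used(release_probability, n_aps):
    """Return 1 - (1 - p)^n, the fraction of sites that have fused at least once."""
    return 1.0 - (1.0 - release_probability) ** n_aps


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        used = fraction_sites_used(0.2, n)
        print(f"after {n:2d} APs: {used:.1%} of sites used")
```

      With p = 0.2, 20% of sites have been used after the first AP (so at least 80% are untouched when the second AP arrives), and about 89% have been used after 10 APs, matching the ~90% stated in the comment.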

      Enhanced depression at the beginning of stimulation indicates the block of rapid SV replenishment mechanism, which includes endocytosis-dependent site-clearance and scaffold-dependent vesicle translocation to release sites.

      (9) In the Discussion (page 10), the authors present a calculation that is supposed to explain the reduced size of the second calyx EPSC in a 100 Hz train in the presence of Dynasore or Pitstop-2. Does this calculation assume that all endocytosed SVs are immediately available for release within 10 ms? Please elaborate.

      We do not assume rapid endocytosed vesicle reuse within 10 ms as it requires much longer time for glutamate refilling (7s at PT; Hori & Takahashi, 2012). Instead, already filled reserved vesicles can rapidly replenish release sites if sites are clean and scaffold works properly. Results shown in Figure S6 also indicate that block of vesicle transmitter refilling has no immediate effect on synaptic responses.

      (10) It is not clear, why the bafilomycin/folimycin data is presented in Fig. S5. The data is also not mentioned in the Discussion. Either explain the purpose of these experiments or remove the data.

      These v-ATPase blockers, which block vesicular transmitter refilling, are reported to enhance EPSC depression at hippocampal synapses at RT and 2 mM [Ca2+], presumably because of a lack of filled vesicles undergoing rapid vesicle recycling (e.g., kiss-and-run). We thought it important to determine whether these data have physiological relevance since such a mechanism might also regulate synaptic strength during repetitive transmission. However, our results did not support its physiological relevance. Since these results are not within our main questions, the negative results are shown in supplementary Figure 6 and explained in the last paragraph of the Results section (p11), but were not discussed further in the Discussion section.

      (11) The scheme in Figure 7 is not very helpful.

      We updated the scheme to summarize our conclusion that vesicle replenishment through endocytosis-dependent site-clearance and scaffold-dependent mechanism independently co-operate to strengthen synaptic efficacy during repetitive transmission at calyx fast-signaling synapses. However, endocytic site clearance is solely required to support facilitation at slow-plastic hippocampal SC-CA1 synapses.

      Recommendations for the authors:

      First, my deep apologies for the long delay in reviewing your paper. All reviewers are now in agreement that the paper has valuable new information, but some methods are not described well and some results appear to be incompatible with previous results in the literature. The discussion of previous literature is also incomplete and not well-balanced. With a strengthened discussion of methods and literature, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms. We ask that you address the comments and revise your paper before we can fully recommend the paper as being an important contribution with compelling evidence and a strong data set that supports the conclusions.

      We explained methods more explicitly. Apparent incompatibility with previous results is now explained and discussed with new supplementary data.

      Major:

      (1) In this study, the application of Dynasore and Pitstop-2 strongly decreased 100 Hz steady-state release at calyx synapses while - quite unexpectedly - it strongly accelerated recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      Lack of change in the recovery from depression in dynamin-1 knockout mice by Mahapatra et al (2016) is consistent with results in Figure S4 in 2 mM [Ca2+], whereas accelerated recovery by Dynasore (Figure 2B2) is observed in 1.3 mM [Ca2+] suggesting that it is masked in 2 mM [Ca2+] but revealed in physiological [Ca2+] (p7, top paragraph). In both cases, however, recovery from STD is not prolonged unlike Hosoi et al (2009).

      The latrunculin issues are discussed in Results section with newly added Supplementary Figure S5 (p9-10).

      (2) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We made these points clearer in the Methods and Results sections.

      (3) Please cite and discuss briefly previous papers that have shown fast endocytosis in the calyx of Held with membrane capacitance measurements like Renden and von Gersdorff, J Neurophysiology, 98:3349, 2007 and Taschenberger et al., Neuron, 2002. These papers first showed exocytosis and endocytosis kinetics in more mature (hearing) mice calyx of Held and at higher physiological temperatures.

      One of these papers, which is relevant to the present study, is cited on p4.

      (4) The findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We added discussion of the latrunculin issue in the Results section, citing the previous literature (p9-10). Since there is no direct evidence (by vesicle imaging) for the presence of FRP and SRP, these definitions derived from voltage-clamp step-depolarization studies are difficult to incorporate into the dissection of synaptic depression in physiological conditions.

      Reviewer #1 (Recommendations For The Authors):

      I have no major comments, but the following issues may be addressed.

      (1) The term "fast and slow" synapses may be relative and a bit confusing. I do not think hippocampal synapses are slow synapses.

      We have replaced “fast and slow” by “fast-signaling and slow-plastic” to represent their functions and added explanation in the text.

      (2) Off-target effects of pharmacological effects may be discussed. In this respect, bafilomycin experiments can be used to argue against the slow effects of vesicle cycling such as endocytosis, and vesicle mobilization. However, the effects on rapid vesicle mobilization cannot be excluded entirely. Because I cannot exclude the absence of off-target effects either (can be addressed by looking at single vesicle imaging at nano-scale, which is hard to do or looking at EM level quantitatively?), I feel this is a matter of discussion.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) Fig2 A2, B2 and Fig 4 A2 and B2. It is easier to plot the recovery only normalized to the initial value. Subtracting steady-state is somewhat confusing because the recovery looks faster after deeper depression, but this may be just apparent.

      We have given values for both types of plots in Table 2, which indicates no essential difference in the recovery parameters.

      Reviewer #2 (Recommendations For The Authors):

      Line 51: Rajappa et al. (2016) investigated clearance deficits in synaptophysin KO mice (not synaptobrevin).

      Corrected.

      Line 54: intersectin is introduced as AZ scaffold protein, although in most of the literature, it is referred to as an endocytic scaffold protein (also in the cited one, e.g. Sakaba et al. 2013). At least, this should be discussed.

      Since blockers of intersectin downstream protein activity have no effect on vesicle endocytosis (Figure 3 and Sakaba et al, 2013), we called it a (presynaptic) scaffold protein instead of an endocytic scaffold protein.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Page 1, Title: I don't think the presented data address the role of the presynaptic scaffold in SV replenishment. In addition, 'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be implied here.

      In this study our focus was on the downstream activity of scaffold protein intersectin and since block of its downstream effector proteins CDC42 and actin activities do not obstruct the endocytic activity (Fig 3, and Sakaba et al., 2013), instead of naming it as “endocytic scaffold protein”, we adopted “presynaptic scaffold protein”.

      We have corrected it in the text.

      Page 2, Abstract: Clarify 'physiologically optimized condition' here and elsewhere in the manuscript.

      Abstract: in physiologically optimized condition → in physiological temperature and Ca2+.

      Page 3, line 62: I don't think 'the site-clearance hypothesis is widely accepted'. There are very few models that implement such a mechanism. Examples would be Pan & Zucker (2009) Neuron and Lin, Taschenberger & Neher 2022 (PNAS) which could be cited.

      62: the site-clearance hypothesis is “widely accepted”→ “well supported”

      Page 3, line 77: Please clarify 'fast synapses'.

      77: fast synapses→fast-signaling synapses, added clarification in the text.

      Page 4, line 100: Please clarify 'in the maximal rate'.

      100: in the maximal rate→reached during 1-Hz stimulation.

      Page 6, line 136: Please clarify 'to reduce the gap'.

      136: To reduce the gap between these different results→To explore the reason for these different results

      Page 7, line 157: I don't consider ML141 and Latrunculin-B 'scaffold protein inhibitors'.

      157: scaffold protein inhibitors had no effect on→ reworded as “none of these inhibitors affected fast or slow endocytosis”.  

      Page 7, line 162: P-value missing.

      162: p < 0.001 added.

      Page 8, line 184: "Since both endocytic blockers and scaffold inhibitors enhanced synaptic depression with a similar time course" consider rephrasing. Sounds like you refer to the time course by which these drugs exert their effect after being applied.

      184: Since both endocytic blockers and scaffold inhibitors enhance synaptic depression with a similar time course→Since the enhancement of synaptic depression by endocytic blockers or scaffold inhibitor occurred mostly at the early phase of synaptic depression.

      Same on page 11, line 250: "At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker" Please consider rephrasing.

      At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker →the early phase of synaptic depression like endocytic blockers

      Page 13, line 318: Please clearly state which experiments were performed at 1.3 mM and which at 2 mM external Ca if two different concentrations were used during recordings.

      320: Added text “Unless otherwise noted, EPSCs were recorded in 1.3 mM [Ca2+] aCSF at 37°C” in the methods.

      Page 15: line 346: Reference in the wrong format.

      346: (25) → (Yamashita et al, 2005)

      Page 15: line 351: Do you mean to say every 10 s and every 20 s? Please clarify.

      No, averaged at 10 ms and 20 ms, respectively as written.

      Page 16, line 369: 1 mM kyn was present in only very few experiments shown in the supplemental figures. Please clarify.

      368: In some experiments, we tested whether the presence of 1 mM kyn altered the enhancement of STD following endocytic block. As shown in Figure S3, our results are essentially the same with or without kynurenate, suggesting that glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003), unlike in pre-hearing calyces (Yamashita et al, 2009).

      Page 16, line 387: You cannot simply use multiple t-tests to compare a single control to multiple test conditions which seems to be the scenario here. Please correct or clarify.

      Experimental protocols are clarified in Methods as “Experiments were designed as population study using different cells from separate brain slices under control and drug treatment, rather than on a same cell before and after the drug exposure.”

      Table S1: 'Endo decay rate'. It's either the 'Endo rate' or the 'Decay rate of delta Cm'. Please correct.

      Corrected as Endocytosis rate (Endo rate).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. 

      Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.

      We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:

      “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”

      We also rephrased our research questions in the introduction section on Page 5 with tracked changes:

      “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”

      Secondly, we would like to elucidate the significance of investigating the timing of semantic integration and why this complements existing findings of parafoveal processing (POF) during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:

      “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”

      Thirdly, we now better highlight the contributions of the Antúnez et al. (2022) paper, which provided important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3. The revised passage is as follows:

      “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”

      References:

      Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281

      Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200

      Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212

      Antúnez M, Milligan S, Andrés Hernández-Cabrera J, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986

      (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.

      We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.

      (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled word-level factors in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words has been strategically embedded into two sentences, allowing for the creation of both congruent and incongruent sentence pairs through the interchange of these words. We now have explicitly specified this design in all sentences, as reflected in the edited manuscript on Page 38. For example, considering the exemplar pair of “brother/jacket”,

      “Last night, my lazy brother/jacket came to the party one minute before it was over.

      Lily says this blue jacket/brother will be a big fashion trend this fall.”

      In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.
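
      The counterbalancing described above can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus-generation code: the function name, the tuple layout, and the rule that alternates the congruent version between lists across items are all assumptions made for the example.

```python
# Hypothetical sketch of the word-swap counterbalancing; names and the
# alternation rule are illustrative assumptions, not the authors' code.

def build_lists(items):
    """Build two stimulus lists from (frame1, frame2, word1, word2) items.

    word1 fits frame1 and word2 fits frame2. One list receives the congruent
    versions of an item and the other list the swapped (incongruent) versions.
    The assignment alternates across items so that each list would contain
    both conditions (assumed rule).
    """
    list_a, list_b = [], []
    for i, (frame1, frame2, word1, word2) in enumerate(items):
        congruent = [(frame1.format(word1), "congruent"),
                     (frame2.format(word2), "congruent")]
        incongruent = [(frame1.format(word2), "incongruent"),
                       (frame2.format(word1), "incongruent")]
        if i % 2 == 0:
            list_a += congruent
            list_b += incongruent
        else:
            list_a += incongruent
            list_b += congruent
    return list_a, list_b


items = [
    ("Last night, my lazy {} came to the party one minute before it was over.",
     "Lily says this blue {} will be a big fashion trend this fall.",
     "brother", "jacket"),
]
list_a, list_b = build_lists(items)
```

      With the single example item, list A holds “lazy brother” and “blue jacket” while list B holds “lazy jacket” and “blue brother”, so each target word appears once in each condition across the two lists.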

      (2) We acknowledge that the consideration of word-level information is crucial when making claims about contextual integration in the current study. However, we don’t think there are many cases in the stimulus set where a single feature like animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word or even a specific semantic feature, so that appreciating the mismatch requires the comprehender to integrate the word into the context (and especially to integrate the word with the immediately preceding one). However, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility, which is a limitation of the stimulus set and has been discussed in the revised manuscript, as indicated by the tracked changes on Page 16, as copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).

      We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16, and the passage is copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.

      Source estimation was exclusively provided for the tagging response rather than the congruency effect because we posit that this conditional contrast would emanate from the same brain regions exhibiting the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.

      Author response image 1.

      We discussed the necessity of using MEG in the edited manuscript with tracked changes on Page 20. The passage is copied below:

      “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”

      (3) The earliest semantic preview effects occurred around 100 ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.

      We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:

      “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”

      References:

      Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005

      Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058

      Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061

      McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972

      Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.

      Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.

      We acknowledge that the correlation analysis between the RIFT effect and reading speed on the group level lacks causality, making it less ideal for addressing this question. We have incorporated this acknowledgment as one of the limitations of the current study in the revised manuscript on Page 16, as indicated by the tracked changes, and the relevant passage is provided below:

      “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”

      We reformulated the last sentence:

      “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”

      (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.

      We thank the reviewer for this valuable suggestion. EOG recordings were not included in the current study because of restrictions on MEG data collection imposed by the University of Birmingham during the COVID-19 pandemic. In our future studies, we will follow the reviewer's suggestion and incorporate EOG recordings in data collection. This addition will facilitate optimal rejection of eye movement-related artifacts through ICA, as recommended by Dimigen in his methodological paper:

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).

      In the revised manuscript, we have provided a more comprehensive explanation of eye movements/saccade planning, aligning it with the SWIFT model. Please refer to Page 15 with tracked changes; the updated passage is provided below:

      “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, it may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”

      References:

      Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. Swift: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.

      We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.

      Earlier debates about whether the N400 is more linked to access or integration have resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs like access, priming, and integration, as Reviewer 1 also pointed out in comment #2.

      (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?

      We appreciate the reviewer for highlighting the confusion regarding the sensor selection procedure in the study. In response, we have added further clarifications about this procedure in the Method section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:

      "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."

      (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?

      In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:

      “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”

      (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).

      We appreciate the reviewer for pointing out the confusion, and we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:

      "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."

      References:

      Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.

      (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?

      We agree with the Reviewer that, given the prior application of a 0.5 Hz high-pass filter to the data, the detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have clarified this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:

      "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."

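      The per-channel linear detrending described in response (5) can be illustrated with a short sketch. It is a plain least-squares line removal written for illustration only; the function names are hypothetical and it does not reproduce the authors' analysis pipeline or any particular toolbox.

```python
# Illustrative linear detrending, applied separately to each channel.
# Function names are hypothetical; this is not the authors' pipeline.

def detrend_linear(samples):
    """Subtract the least-squares straight line from one channel."""
    n = len(samples)
    if n < 2:
        # A single sample is fully explained by the fitted constant.
        return [0.0] * n
    t_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(samples))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(samples)]


def detrend_channels(data):
    """Detrend every channel (one list of samples per channel) independently."""
    return [detrend_linear(channel) for channel in data]


# Channel 0 is a pure ramp; channel 1 is a ramp plus an alternating component.
data = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 3.0, 2.0]]
cleaned = detrend_channels(data)
```

      In this toy input the pure ramp is removed entirely, while the alternating component of the second channel survives with zero mean and no remaining slope.
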
      (6) Source analysis, p. 25f.: How was the beamformer regularized?

      This information was already included in the original manuscript on Page 26. The original text is provided below for reference:

      “No regularisation was performed to the CSD matrices (lambda = 0).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Why does stimulation at 0.15 Hz show a third harmonic signal (Figure 5A) but 0.25 Hz does not show a second harmonic signal?

      Second and third harmonic signals were sometimes observed with 0.15 Hz stimulation, and also with 0.25 Hz and other stimulation frequencies. The second harmonic is the easier one to explain, as vasomotion may react to both directions of the oscillating stimulus. The reason for the emergence of the third harmonic remains unknown. These harmonic signals were not always observed, and their magnitude was variable. Because the frequency-locked signal was robust, we decided to describe only this signal in the manuscript. These observations are now mentioned in the revised manuscript (Results, page 9, paragraph 2).
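
      One way to see why a response to both directions of an oscillating stimulus would produce a second harmonic is a toy model in which the response follows the magnitude of a sinusoid regardless of its sign. The rectification model, the sampling rate, and the duration below are assumptions chosen for illustration; they are not taken from the manuscript.

```python
import math

# Toy model (assumption): the response follows the magnitude of a sinusoidal
# stimulus regardless of its direction, i.e. a rectified sine wave.

def amplitude_at(signal, fs, freq):
    """Amplitude of the Fourier component of `signal` at `freq` (in Hz)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * k / fs)
             for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * k / fs)
             for k, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


fs = 10.0        # sampling rate in Hz (assumed)
f_stim = 0.25    # stimulus frequency in Hz
duration = 80.0  # seconds, i.e. 20 full stimulus cycles (assumed)
n = int(fs * duration)

stimulus = [math.sin(2 * math.pi * f_stim * k / fs) for k in range(n)]
rectified = [abs(s) for s in stimulus]

a_f = amplitude_at(rectified, fs, f_stim)       # component at the stimulus frequency
a_2f = amplitude_at(rectified, fs, 2 * f_stim)  # component at the second harmonic
```

      In this idealized case the rectified signal has essentially no component at the stimulus frequency and a clear one at twice that frequency. A real response that is only partly direction-insensitive would be expected to show both components.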

      References for the windows are missing. Closed craniotomy: (Morii, Ngai, and Winn 1986). Thinned skull: (Drew et al. 2010).

      These references were incorporated into the revised manuscript.

      An explanation of, or at least a discussion on, why a flavoprotein or other intrinsic signal from the parenchyma might follow vasomotion with high fidelity would be most helpful.

      A large part of the Results describes how any fluorescence signal from the brain parenchyma follows the vasomotion, because the blood vessels largely lack fluorescence signals within the filter band that we observe. We describe this as “shadow imaging”. What was rather puzzling was that flavoprotein and other intrinsic signals were phase-shifted in time. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. This is described in the manuscript as follows.

      (Results, page 13, paragraph 2)

      “Production and degradation of flavin and other metabolites may be induced by the fluctuation in the blood vessel diameter with a fixed delay time. The phase shift in the autofluorescence could be due to the additive effect of “shadow” imaging of the vessel and to the concentration fluctuation of the autofluorescent metabolite”

      Glucose and oxygen are likely to be delivered more abundantly during the vasodilation phase than during the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism, and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuations of these metabolites could lag behind the changes in blood flow. This discussion has been added to the revised manuscript (Discussions, page 19, paragraph 2).
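
      The additive picture of an anti-phase “shadow” component plus a delayed metabolic component can be made concrete with phasors: two sinusoids of the same frequency sum to a sinusoid at that frequency whose phase lies between the two. The amplitudes, delay, and sign conventions below are hypothetical values chosen only to illustrate the arithmetic.

```python
import cmath
import math

# Phasor sketch of the additive model. All amplitudes, delays, and sign
# conventions are hypothetical and not taken from the manuscript.

def combined_signal(a_shadow, b_metab, delay_s, freq_hz):
    """Return (amplitude, phase in radians) relative to vessel diameter.

    The shadow component is modelled as anti-phase to the diameter (phase pi).
    The metabolic component is modelled as following the diameter after a
    fixed delay, i.e. with phase -2*pi*freq*delay.
    """
    shadow = a_shadow * cmath.exp(1j * math.pi)
    metab = b_metab * cmath.exp(-1j * 2 * math.pi * freq_hz * delay_s)
    total = shadow + metab
    return abs(total), cmath.phase(total)


# Shadow component alone: the signal is exactly anti-phase.
amp_shadow, phase_shadow = combined_signal(1.0, 0.0, 0.0, 0.25)

# Equal amplitudes with a 1 s delay at 0.25 Hz (a quarter-cycle lag):
# the summed signal is neither anti-phase nor a pure quarter-cycle lag.
amp_mix, phase_mix = combined_signal(1.0, 1.0, 1.0, 0.25)
```

      Under these assumed values the mixed signal has a phase of -3π/4 relative to the vessel diameter, which is the kind of intermediate phase shift that a purely anti-phase shadow signal could not produce on its own.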

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections to the text and figures:

      (1) Figures 1 and 2- The single line slice basal and dilated traces are larger in Figure 2 (intact skull) than in Figure 1 (thinned skull)- have these been mixed up, as the authors state in the text that larger dilations are detected in the thinned skull preparation?

      The example vessel shown for the thinned skull (Figure 1) happened to be larger than that shown for the intact skull (Figure 2). We did not state that larger dilations are observed in the thinned skull preparation. Rather, we reported that the vessel profiles were shallower in the intact skull, because the intact skull blurs the fluorescence image.

      (2) Figure 3- I think the lower panel of the amplitude spectrums from 3 individual animals included in D would benefit from being in its own panel within this Figure (i.e. E). The peak ratio is also used in this figure, but the equation to calculate this is not displayed until Figure 4.

      We thank the reviewer for this recommendation, which makes the figure more comprehensible. We have divided panel D into D and E and relabeled the subsequent panels accordingly. The manuscript text was also updated.

      As the reviewer notes, the peak ratio at 0.25 Hz is used in Figure 3E (original). However, the equation used to calculate this ratio is given at the appropriate location in the main text of the manuscript (Results, page 10, paragraph 2) as well as in the figure legend.

      (3) Figure 5- In the visual stimulation traces displayed in C you have included a 10-degree scale bar, which looks similar in amplitude to the trace but the text states these are 17-degree amplitude traces.

      We thank the reviewer for noticing this labeling mistake in the figure. We have corrected the error in the revised figure.

      (4) Figure 6- For the Texas red fluorescence traces and image scales displayed in F, you have shown the responding traces on the right and non-responding on the left, but the figure legend states the amplitude is strong on the left and weak on the right.

      We thank the reviewer for noticing the error in the figure legend text. We have corrected the error in the revised manuscript.

      (5) Figure 6- It would be helpful for the reader if the r value was displayed on the graph in G.

      We thank the reviewer for the suggestion. We have indicated the r value in Figure 6G as the reviewer recommended.

      Reviewer #3 (Recommendations For The Authors):

      Major

      It is unclear to me if the authors are studying vasomotion per se. Vasomotion is an intrinsic, natural rhythm of blood vessel diameter oscillation that is entrained by endogenous rhythmic neural activity. Importantly, if you take neural activity away, the blood vessel (with flow and pressure) should still be capable of oscillating due to an intrinsic mechanism within the vessel wall. In contrast, if one increases neural activity by way of sensory stimulation and blood flow increases, this is the basis of functional hyperemia. If one stimulates the brain over and over again at a particular frequency, it is expected that blood flow will increase whenever neural activity increases to the stimulus, up to a particular frequency until the blood vessel cannot physically track the stimulus fast enough. Functional hyperemia does not depend on an intrinsic oscillator mechanism. It occurs when the brain becomes active above endogenous resting activity due to sensory or motor activity.

      We thank the reviewer for stressing the importance of the distinction between “vasomotion” and functional “hyperemia”.

      We recognized that the terminology used in our paper was not explicitly explained. Traditionally, “vasomotion” is defined as the dilation and constriction of the blood vessels that occurs spontaneously at low frequencies in the 0.1 Hz range without any apparent external stimuli. Sensory-induced changes in the blood flow are usually called “hyperemia”. However, in our paper, we used the term, vasomotion, literally, to indicate both forms of “vascular” “motion”. Therefore, the traditional vasomotion was called “spontaneous vasomotion” and the hyperemia, with both vasoconstriction and vasodilation, induced with slow oscillating visual stimuli was called “visually induced vasomotion”. This distinction in the terminology is now explicitly introduced in the revised manuscript (Introduction, page 3, paragraph 2-3; page 4, paragraph 1-2).

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not last long at a specific frequency. With visual stimuli that slowly oscillated at temporal frequencies close to the frequency of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion”, was observed. Importantly, this visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion can spread throughout the whole brain. Where the plasticity for vasomotion entrainment occurs is also unknown. How much of the visually induced vasomotion relies on the mechanisms of intrinsic spontaneous vasomotion is also undetermined. Future directions for understanding the mechanisms of visually induced vasomotion and entrainment are discussed in more detail in the revised manuscript (Discussions, page 19, paragraph 1).

      To me, one would need to silence the naturally occurring vasomotion to study it. As soon as one activates the brain with an external stimulus, functional hyperemia is being studied. One idea that would be interesting to look at is whether a single or perhaps a double stimulus, in an untrained vs trained mouse, shows vasodilation that occurs across the cortex and in the cerebellum. In other words, is there something special about repeating the signal over and over again that results in brain-wide synchronization, or does a single or double oscillation of the same frequency (0.25Hz) also transiently synchronize the brain? My guess is that a short stimulus would give you the same thing (especially in a trained mouse) and that there is nothing special about oscillating the signal over and over again (except for the learning component).

      We thank the reviewer for the ideas of new experiments to understand whether the visually induced vasomotion shares the same mechanisms for creating spontaneous vasomotion or not.

      We would like to emphasize again that the visually induced vasomotion is not observed in the Novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to the visual stimuli. Entrainment with repeated presentation of visual stimuli is required for this global synchronization phenomenon to occur.

      We would also like to emphasize that, even in Expert animals, the visually induced vasomotion that is frequency-locked to the presented stimulus does not always occur immediately. As shown in Figure 3D lower panel (Figure 3E in the revised figure), the vasomotion did not always immediately frequency-lock. The vasomotion was also not always stable throughout the 15 min of visual stimulation presentation. These characteristics are emphasized in the revised manuscript (Results, page 10, paragraph 1).

      Therefore, we would assume that a single or double frequency of the visual stimulation would not always be sufficient to transiently frequency-lock the visually induced vasomotion.

      An alternative idea is to test frequencies lower than vasomotion. Vasomotion typically oscillates around a wide range of very low frequencies averaging around 0.1Hz, yet here the authors entrain blood vessel oscillations towards the top end of vasomotion, at 0.25Hz. What would happen if the authors tried synchronizing brain activity with 0.025Hz? Would the natural vasomotion frequency still be there, or would it be gone, dominated by the 0.025Hz entrainment?

      We would assume that visually induced vasomotion will not be induced with 0.025 Hz visual stimuli. This is too slow to induce smooth pursuit of the visual stimuli with eye movement. We show that, even if smooth eye pursuit occurs, the visually induced vasomotion may or may not occur (Figure 6F). However, visually induced vasomotion largely does not occur without eye movement. Therefore, the experiment proposed by the reviewer is likely not feasible.

      Finally, perhaps the authors can see if there is a long-lasting change in natural vasomotion occurring after the animal has been trained to 0.25Hz. For example, is there greater power in the endogenous fluctuation at either 0.25Hz (or perhaps 0.1Hz) with no visual stimulation given but after the animal has been trained? These ideas would be interesting to test and could help clarify whether this is plasticity in functional hyperemia or plasticity in vasomotion.

      It should also be mentioned that the frequency-locked vasomotion quickly dissipates as soon as the visual stimulation is halted (Figure 3D upper panel, middle). However, we agree with the reviewer that it would be interesting to see whether the fragmentation of the spontaneous vasomotion is observed less in the Trained or Expert mice compared to the Novice mice, to understand whether the entrainment effect would propagate to the properties of the spontaneous vasomotion.

      This issue I have raised is not a fundamental flaw in the paper, it pertains more to the wording, phrasing, and pitch of the paper i.e. is this really entrained and plastic vasomotion? I am skeptical. Nevertheless, I think the authors should try some of these suggestions to better characterize this effect.

      We agree that the phrasing used in the original manuscript was rather confusing, as “vasomotion” normally refers to spontaneous vascular movement. However, functional “hyperemia” may not adequately express the phenomenon that we observe either. The phenomenon that we observe is slowly oscillating vasodilation and vasoconstriction that is induced with visual stimuli with a temporal frequency similar to the spontaneously occurring “vasomotion”. This phenomenon is not a direct hyperemia response to the visual stimuli as it requires entrainment and it spreads globally throughout the whole brain. We revised our manuscript to define the terminology that we use.

      An important question is if neural activity is entraining the CBF responses. The authors should do one experiment in a pan-neural GCaMP line to test if neural activity in the visual cortex (and other areas captured in the widefield microscope) shows a progressive and gradual synchronization (or not) to the vasomotion responses with training. It is possible to do this through a thinned skull window. This is important to know if/how synchronized population neural activity scales with training. Perhaps they will not correlate and there is something more subtle going on.

      In our paper, we mainly studied visually induced vasomotion (or visual stimulus-triggered vasomotion). Therefore, visual stimulation must first activate the neurons and, through neurovascular coupling, the initial drive for vasomotion is likely triggered. However, visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex.

      An important point to note is that the neuronal visual response in the primary visual cortex could potentially decrease with repeated visual stimulation presentation, as the adaptive movement of the eye should decrease the retinal slip. With repeated training sessions, a more static projection of the presented image will likely be shown to the retina. The neurovascular coupling could be enhanced with increased responsiveness of the vessels, and vascular-to-vascular coupling could also be potentiated. This argument is now incorporated in the revised manuscript (Discussions, page 19, paragraph 1).

      We agree with the reviewer that, to identify the extent of the neuronal contribution to the vasomotion triggering, whole brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, because the signal from fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect from the vasomotion, exquisite imaging techniques would be required. We recognize this “shadow” effect, and we are currently developing methods to remove both the “shadow” effect and the intracellular pH fluctuation effect from the fluorescence traces.

      The authors nicely show that plasticity in vasomotion coincides with the mouse learning the HOKR task and that as eye movement tracks the stimulus, CBF gets entrained. However, there could also be a stress effect going on in the early trials, and as the mouse gets used to the procedure and stress comes down, the vasomotion entrainment can be seen. It could be the case that the vasomotion process is there on the first trial, but masked by stress-induced effects on neural and/or vascular activity. I did not see anything in the methods about how the mouse was habituated to head restraint. Was the first visual stim trial the first time the mouse was head restrained? If so, there could be a strong stress effect. The authors should address this either by clarifying that habituation to head restraint was done, or by doing a control experiment where each animal receives at least 1week of progressive and gradual head restraint before doing the same HOKR experiment using multiple trials.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion response between habituated and non-habituated mice. In addition, whether an experimentally induced increase in stress would interfere with the vasomotion could also be studied. With the TexasRed experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, TexasRed was injected before session 1. Vasomotion entrainment likely progressed with training in sessions 2 and 3. Before session 4, TexasRed was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection did not interfere with the generation of visually induced vasomotion. This argument is included in the revised manuscript (Discussions, page 20, paragraph 2).

      Minor

      The first sentence of the introduction requires citations. It is also a somewhat irrelevant comparison to make.

      The necessary citation was added in the revised manuscript, as the reviewer suggested. We think that describing how the energy is distributed in the brain would provide one of the most important breakthroughs to the understanding of how efficient information processing in the brain works. Therefore, we would like to keep this introduction.

      The third and fourth sentence of the introduction equates vasodilation/vasoconstriction with vasomotion and it is not this simple. Vasomotion is a specific physiological process involving rhythmic changes to artery diameter. Also, the frequency of these slow oscillations needs to be stated. The authors only say they are slower than 10Hz.

      The definition of spontaneous vasomotion with indication of typical temporal frequency is described in the revised manuscript, as the reviewer suggested.

      More than half of the introduction is describing the paper itself, rather than setting the stage for the findings. The authors need a more thorough account of what is known and what is not known in this area. Some of this information is in the discussion, which should be moved up to the intro.

      We have revised the introduction to include the definition of spontaneous vasomotion and visually induced vasomotion or functional hyperemia, as the reviewer suggested.

      In the first paragraph of the results section, the authors should state in what way the mice are awake. Are they freely mobile? Are they head-restrained? Are they resting or moving or doing both at different times? This is clarified later but it should come up front as someone reads through the paper.

      As the reviewer suggested, we clarified in the first paragraph of the Results section that the experiments were done in awake, head-restrained mice.

      The authors say "As shown later, blood vessels on the surface...". There is no need to say "as shown later".

      This is deleted as the reviewer suggested.

      The use of "full width at 10% maximum" of the Texas red intensity for the diameter measure is a little odd, as it may actually overestimate the diameter, but I see what the authors were trying to do. A full-width half max is standard here and that is likely more appropriate. Also, the line profiles of intensity are not raw data. The authors say the trace is strongly filtered/smoothed. If so, this creates a somewhat artificial platform to make the diameter measurement. The authors should show raw data from a single experiment and make the measurement from that. The raw line profile should look almost square, where a full-width half-max would work well.

      Contrary to the reviewer’s expectation, the raw line profile was not almost square. Even with almost no blur in the XY dimension of the optical imaging system, one would not expect a square line profile: this is not a confocal or two-photon microscope image, no ideal optical section is created, and the thickness of the vessel in the Z dimension increases towards its center. Therefore, the full-width half-maximum value would definitely be an underestimate of the actual vessel diameter. It may be possible to derive an ideal cutoff value if the 3D point spread function of the imaging system were known. 10% is an arbitrary number, but we think 10% is the minimum intensity that we can distinguish from the background intensity fluctuations. We did not attempt to derive the “true” diameter of the vessel, and the full width at 10% maximum is just an index of the actual diameter. In most of the manuscript, we only deal with the change of the vessel diameter relative to the basal diameter; therefore, we considered that careful derivation of the absolute diameter estimate is not necessary. This argument is detailed in the Materials and Methods section in the revised manuscript (page 31, paragraph 2).

      The raw line profile before filtering is shown overlaid in Figure 1C, as the reviewer suggested.
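
      The geometric part of the full-width argument above can be illustrated with an idealized sketch. It assumes, hypothetically, a cylindrical vessel uniformly filled with dye, no optical blur, and a wide-field signal proportional to the dye path length along Z. The 20 µm diameter, the sampling grid, and the function name `width_at_fraction` are illustrative choices, not values or code from the manuscript.

```python
import math

def width_at_fraction(xs, profile, fraction):
    """Width of the span where profile >= fraction * max(profile).

    Edges are located by linear interpolation between neighbouring samples.
    """
    threshold = fraction * max(profile)
    above = [i for i, v in enumerate(profile) if v >= threshold]
    first, last = above[0], above[-1]

    def crossing(i_below, i_above):
        # x position where the line through the two samples hits the threshold
        x0, y0 = xs[i_below], profile[i_below]
        x1, y1 = xs[i_above], profile[i_above]
        return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)

    left = crossing(first - 1, first) if first > 0 else xs[first]
    right = crossing(last + 1, last) if last < len(xs) - 1 else xs[last]
    return right - left

# Hypothetical vessel: 20 um diameter, sampled every 0.01 um across 30 um.
RADIUS = 10.0
xs = [(i - 1500) / 100 for i in range(3001)]
# Dye path length through a cylinder at lateral offset x (zero outside it).
profile = [2.0 * math.sqrt(max(RADIUS**2 - x**2, 0.0)) for x in xs]

diameter = 2.0 * RADIUS
fwhm_ratio = width_at_fraction(xs, profile, 0.5) / diameter
w10_ratio = width_at_fraction(xs, profile, 0.1) / diameter
print(f"FWHM / diameter:      {fwhm_ratio:.3f}")
print(f"10% width / diameter: {w10_ratio:.3f}")
```

      Under these assumptions the half-maximum width is about 87% of the true diameter (analytically, sqrt(3)/2), whereas the width at 10% of maximum is above 99% of it (sqrt(0.99)). Real images include blur and background, which shift both numbers, so either width is best read as an index of diameter rather than an absolute measurement.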

      In Figures 1 and 2, state/label what brain region this is.

      The blood vessels between the bregma and lambda on the cortex were observed and described in Figures 1 and 2. This is described in the revised manuscript, as the reviewer suggested.

      Can the authors also show what a vein or venule looks like using their quantification method in Figures 1 and 2? This would be a helpful comparison to a static vein.

      The methods shown in Figures 1 and 2 would not allow us to distinguish between vein and venule in our study. Methods that allow quantification of the relative blood vessel diameter fluctuation due to spontaneous or visually induced vasomotion activities are shown in Figures 1 and 2. Later in the manuscript, the whole intensity fluctuation of TexasRed or autofluorescence in the brain parenchyma is studied, and in this case, no distinction between vein and venules could be made.

      Statements such as this are not necessary: "Later in the manuscript, we will be dealing with vasomotion dynamics observed with the optical fiber photometry methods, in which the blood vessel type under the detection of the fiber could not be identified". Simply talk about this data when you get to it.

      We have deleted this statement in this part of the manuscript, as the reviewer suggested.

      Same as this, please consider deleting: "Spontaneous vasomotion dynamic differences between different classes of blood vessels would be of interest to study using a more sophisticated in vivo two-photon microscope which we do not own." Just describe the data you have from the methods you have. There is no need to lament.

      We deleted this sentence, as the reviewer suggested.

      Figure 3 D the light blue boxes showing the time period of visual stimulation physically overlay with the frequency-time spectrograms. They should not overlay with this graph because it makes them more light blue, distorting the figure which also uses light blue in the heat map.

      Figure 3D was modified, as the reviewer suggested.

      The authors say: "The reason why the vasomotion detected in our system through the intact skull in awake in vivo mice was less periodic was unknown." Yes, but you are imaging an awake mouse. Many spontaneous behaviours such as whisking, grooming, twitching, and struggling will manifest as increased artery diameter. These will be functional hyperemia occurring events on top of rhythmic vasomotion. This can be briefly discussed.

      As the reviewer comments, the vasomotion detected in awake mice was likely to be less periodic because the spontaneous animal behavior induces functional hyperemia and interrupts spontaneous vasomotion. This interpretation was included in the revised manuscript (Results, page 8, paragraph 1).

      The authors say "extremely tuned" on page 8. They should not use words like "extremely". Perhaps say "more strongly tuned" or equivalent.

      We have changed “extremely” to “more strongly”, as the reviewer suggested.

      The authors say "First, the Texas Red fluorescence images were Gaussian filtered in the spatial XY dimension to take out the random noise presumably created within the imaging system." It is inadvisable to alter the raw data in this way unless there is a sound reason to do so. If there is random noise this should not affect the Fast Fourier Transform analysis. If there is regular noise caused by instrumentation artefact, which is picked up by the analysis then perhaps this could be filtered out. A static Texas red sample in a vial can be used to determine if there is artefactual noise.

      We mainly used the Gaussian filter for better presentation of the imaged data. The TexasRed fluorescence was low in intensity, and the acquired images were Gaussian filtered in the spatial XY dimension to reduce the pixelated noise at the expense of spatial resolution. This filter should not affect the temporal frequency of the observed vasomotion. This is now more clearly indicated in the revised manuscript (Results, page 10, paragraph 2).
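
      The claim that a spatial filter leaves the temporal frequency untouched can be checked on synthetic data. The sketch below is not the authors' pipeline: the 10 Hz frame rate, the 11-pixel image line, the noise level, and the kernel width are all assumed for illustration, and only the 0.25 Hz oscillation frequency is taken from the study. Because the filter averages neighbouring pixels within each frame and never mixes time points, the dominant frequency of a pixel trace should be the same before and after filtering.

```python
import math
import random

FS = 10.0        # hypothetical frame rate (Hz)
N_FRAMES = 400   # 40 s of data
N_PIXELS = 11    # pixels along one image line (hypothetical)
F_VASO = 0.25    # oscillation frequency (Hz), as used in the study

rng = random.Random(0)
clean = [math.sin(2 * math.pi * F_VASO * t / FS) for t in range(N_FRAMES)]
# Every pixel carries the same oscillation plus independent pixel noise.
movie = [[clean[t] + rng.gauss(0.0, 0.5) for _ in range(N_PIXELS)]
         for t in range(N_FRAMES)]

def gaussian_kernel(sigma, half_width):
    """Normalized 1-D Gaussian weights over 2 * half_width + 1 pixels."""
    weights = [math.exp(-0.5 * (j / sigma) ** 2)
               for j in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_pixel(frame, center, kernel):
    """Gaussian-weighted spatial average around one pixel of a single frame."""
    half = len(kernel) // 2
    return sum(w * frame[center + j - half] for j, w in enumerate(kernel))

def peak_frequency(series, fs):
    """Frequency (Hz) of the largest non-DC component of a plain DFT."""
    n = len(series)
    mean = sum(series) / n
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum((v - mean) * math.cos(2 * math.pi * k * t / n)
                 for t, v in enumerate(series))
        im = sum((v - mean) * math.sin(2 * math.pi * k * t / n)
                 for t, v in enumerate(series))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n

def residual_variance(trace):
    """Mean squared deviation of a trace from the noise-free oscillation."""
    return sum((v - c) ** 2 for v, c in zip(trace, clean)) / len(trace)

kernel = gaussian_kernel(sigma=1.0, half_width=2)
center = N_PIXELS // 2
raw_trace = [frame[center] for frame in movie]
smooth_trace = [smooth_pixel(frame, center, kernel) for frame in movie]

raw_peak = peak_frequency(raw_trace, FS)
smooth_peak = peak_frequency(smooth_trace, FS)
print("peak frequency (Hz):", raw_peak, smooth_peak)
print("residual variance:", residual_variance(raw_trace),
      residual_variance(smooth_trace))
```

      In this toy case the spectral peak stays at the oscillation frequency while the pixel noise is reduced, which is the trade-off described above: less pixelated noise at the cost of spatial resolution, with no change in the time axis.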

      There are endogenous fluorescent molecules in cell metabolism that change dynamically to neural activity: NADH, NADPH, and FAD. These are almost certainly a fraction of the auto-fluorescent signal the authors are measuring and it would be expected to see small fluctuations in these metabolites with neural activity. Perhaps this can be discussed, and the authors can likely argue that metabolic signals are much smaller than the change caused by vasodilation.

      We found that the autofluorescence signal was phase-shifted in time relative to the vasomotion, which was visualized with TexasRed. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. Glucose and oxygen are likely to be abundantly delivered during the vasodilation phase compared to the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuation of these metabolites could lag in time to the changes in the blood flow. It is also expected that these metabolites may fluctuate according to the neuronal activity that triggers visually induced vasomotion or functional hyperemia. These discussions are added in the revised manuscript (Discussions, page 19, paragraph 2).
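
      One way to picture how an anti-phase “shadow” component and a delayed component combine into a single phase-shifted signal is to estimate the lag at the oscillation frequency from the Fourier phase. The sketch below uses synthetic sinusoids only. The 10 Hz sampling rate, the 1 s delay, and the 0.6/0.4 mixing weights are arbitrary assumptions for illustration and are not measurements from the manuscript.

```python
import cmath
import math

FS = 10.0       # hypothetical sampling rate (Hz)
N = 400         # 40 s, an integer number of 0.25 Hz cycles
F_STIM = 0.25   # oscillation frequency (Hz), as used in the study

def component(series, freq, fs):
    """Complex Fourier component of `series` at `freq`."""
    return sum(v * cmath.exp(-2j * math.pi * freq * t / fs)
               for t, v in enumerate(series))

def lag_seconds(reference, signal, freq, fs):
    """Delay of `signal` relative to `reference` at `freq`.

    The result is only defined within plus or minus half a period.
    """
    ratio = component(signal, freq, fs) / component(reference, freq, fs)
    return -cmath.phase(ratio) / (2 * math.pi * freq)

def sine(delay, amplitude=1.0):
    """Sinusoid at F_STIM, delayed by `delay` seconds."""
    return [amplitude * math.sin(2 * math.pi * F_STIM * (t / FS - delay))
            for t in range(N)]

vessel = sine(delay=0.0)                   # stand-in for the TexasRed trace
shadow = sine(delay=0.0, amplitude=-1.0)   # pure anti-phase "shadow"
delayed = sine(delay=1.0)                  # hypothetical 1 s delayed component
mixed = [0.6 * s + 0.4 * d for s, d in zip(shadow, delayed)]

lag_shadow = lag_seconds(vessel, shadow, F_STIM, FS)
lag_delayed = lag_seconds(vessel, delayed, F_STIM, FS)
lag_mixed = lag_seconds(vessel, mixed, F_STIM, FS)
print("shadow lag magnitude (s):", abs(lag_shadow))
print("delayed lag (s):", lag_delayed)
print("mixed lag (s):", lag_mixed)
```

      A pure anti-phase trace appears as a lag of half a period (2 s at 0.25 Hz, with an ambiguous sign), and the mixture lands between the two components. A measured phase shift that is neither zero nor exactly half a period is therefore compatible with, but does not by itself prove, a sum of a shadow component and a delayed component.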

      The authors say "however, we found that, if Texas Red had to be injected before every training session, the mouse did not learn very well." This is interesting. Why do the authors suppose this was the case? Stress from the injection? Or perhaps some deleterious effect on blood vessel function caused by the dye itself? Either way, I think this honest statement should remain. Others need to know about it.

      We think that the stress from the injection interferes with the HOKR learning. However, as shown, TexasRed injection after the mouse had learned did not interfere with the eye movement or with the visually induced vasomotion. We do not know whether the injection stress directly interferes with the blood vessel function and affects the plastic vasomotion entrainment. These arguments are now described in the revised manuscript (Discussions, page 20, paragraph 2). The statement above remains as is, as the reviewer suggested.

      YCnano50 is a calcium sensor and not really appropriate for the use employed by the authors. They are exciting YFP at 505nm but unless the authors are using a laser line, there is some bandwidth of excitation light that is likely exciting the CFP too which still absorbs light up to ~490nm. Here, calcium signalling may affect the YFP signal. This can be discussed.

      A multiband-pass filter (Chroma 69008x, with the relevant band of 503 nm / 19.5 nm (FWHM)) was used for direct excitation of YFP. Negligible light is passed below 490 nm. CFP excitation above 490 nm is assumed to be negligible and is usually not defined in the literature. We assume that, with our optical system, fluorescence from direct YFP excitation dominates over the minor contribution from CFP excitation. We explicitly describe this in the revised manuscript (Materials and Methods, page 28, paragraph 2).
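
      The band arithmetic behind this statement is short enough to spell out. The helper `band_edges` below is a hypothetical name, and the calculation only gives the half-maximum edges implied by the stated center and FWHM. It says nothing about the filter's actual roll-off outside those edges, which would have to come from the manufacturer's transmission curve.

```python
def band_edges(center_nm, fwhm_nm):
    """Half-maximum edges (nm) of a passband given its center and FWHM."""
    half = fwhm_nm / 2.0
    return center_nm - half, center_nm + half

# Values quoted in the response: 503 nm center, 19.5 nm FWHM.
lower, upper = band_edges(503.0, 19.5)
print(lower, upper)
```

      The lower half-maximum edge falls a few nanometres above the ~490 nm limit that the reviewer cites for CFP absorption, which is consistent with the assumption that direct YFP excitation dominates.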

      The discussion is interesting but does not actually discuss much of the data or measurements in the paper. Most of the discussion reads more like a topical review, rather than a critical analysis of the effects/measurements and why the authors' interpretations are likely correct. This can be improved.

      As the reviewer suggests, we have improved the discussion by starting with the summary of the results (Discussion, page 19, paragraph 1). We also included the possibility of stress affecting visually induced vasomotion (Discussion, page 20, paragraph 2).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. We have records of the fast-z correction applied by ScanImage on the microscope during acquisition, so we have supplied the online fast-z motion correction .csv files for two example sessions on our GitHub page as supplementary files:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      These files correspond to Figure S3b (2367_200214_E210_1) and to Figures 5 and 6 (3056_200924_E235_1). These are now also referenced in the main text. See lines ~595, pg 18 and lines ~762, pg 24.

      We have also made minor revisions to the main text of the manuscript with clear descriptions of methods that we have found important for the minimization of movement artifacts, such as fully tightening all mounting devices, implanting the cranial window with proper, evenly applied pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel. See Line ~309, pg 10.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We have opted to not change the title of the paper, because we feel that adding the qualifier, “in two preparations,” would add unnecessary complexity. In addition, while the dorsal mount preparation allows for imaging of bilateral dorsal cortex, the side mount preparation does indeed allow for imaging of both dorsal and lateral cortex across the right hemisphere (a bit of contralateral dorsal cortex is also imageable), and the design can be easily “flipped” across a mirror-plane to allow for imaging of left dorsal and lateral cortex. Taken together, we do show preparations that allow for pan-cortical 2-photon imaging.

      We do agree that imprecise reference to the two preparations can sometimes lead to confusion. Therefore, we made several small revisions to the manuscript, including at ~line 545, to make it clearer that we used two imaging preparations to generate our combined 2-photon mesoscope dataset, and that each of those two preparations had both benefits and limitations.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: The preparation by Esmaeili et al. 2021 has some similarities to, but also differences from, our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries for our side mount preparation, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We have compared these preparations more thoroughly in the revised manuscript. (See lines ~278.)

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ Response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations, with medians near 4500; see Fig. S2e), we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities may all impact the count of total analyzed neurons per session. We now mention these various factors and have made clear that we were not, for the purposes of this paper, trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      We refer to these issues now briefly in the main text. (See ~line 93, pg 3).

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript at ~line 235, pg. 7.

      See Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it is possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use ultrasound gel instead (which we found to be, to some degree, optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult under these conditions because the camera would need the same optical access angle as the 2-photon objective, or would need to be moved downward toward the air table and rotated up at an angle of 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-soid analysis is based on a single mouse, and the validation data are derived from the training data set.

      Authors’ Response: In this methods paper, we have chosen to supply proof of principle examples, without a complete analysis of animal-to-animal variance.

      The B-SOiD analysis that we show in Figure 6 is based on a model trained on 80% of the data from four sessions taken from the same mouse, and then tested on all of a single session from that mouse. Initial attempts to train across sessions from different mice were unsuccessful, probably due to differences in behavioral repertoires across mice. However, we have performed extensive tests with B-SOiD and are confident that these sorts of results are reproducible across mice, although we are not prepared to publish these results at this time.

      We now clarify these points in the main text at ~line 865, pg 27.
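
      The train/test arrangement described above can be illustrated with a minimal sketch. It assumes, hypothetically, that frames from the four sessions are pooled and that 80% are drawn at random for training; the frame counts and function names are invented for illustration, and this is not the B-SOiD API. If the test session is one of the four training sessions (the response does not say either way), most of its frames will already have been seen during training, which is the concern the reviewer raises.

```python
import random


def sample_training_frames(session_lengths, train_fraction=0.8, seed=0):
    """Pool (session, frame) pairs and draw a random training subset."""
    pooled = [(s, f) for s, n in enumerate(session_lengths) for f in range(n)]
    rng = random.Random(seed)
    n_train = round(train_fraction * len(pooled))
    return set(rng.sample(pooled, n_train))


def fraction_seen_in_training(train_frames, session, n_frames):
    """Fraction of one session's frames that were also used for training."""
    seen = sum((session, f) in train_frames for f in range(n_frames))
    return seen / n_frames


# Hypothetical frame counts for four sessions from one mouse.
session_lengths = [1000, 1200, 900, 1100]
train = sample_training_frames(session_lengths)
overlap = fraction_seen_in_training(train, session=0, n_frames=session_lengths[0])
print(f"{len(train)} training frames; {overlap:.0%} of session 0 seen in training")
```

      Testing on a session that was never pooled (for example, a fifth session) would give an overlap of zero and a cleaner estimate of generalization.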

      An additional comparison of the results of B-SOiD trained on different numbers of sessions to that of keypoint-MOSEQ (Weinreb et al, 2023, bioRxiv) trained on ~20 sessions can now be found as supplementary material on our GitHub site:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/Figure_SZZ_BSOID_MOSEQ_align.pdf

      The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: After re-examination of the original analysis output files, we have indeed discovered that some of the Rastermap neuron density maps in Figure 6e were incorrectly aligned with their respective qualitative behaviors due to a discrepancy in file numbering between the images in 6e and the ensembles identified in 6c (each time that Rastermap is run on the same data, at least with the older version available at the time of creation of these figures, the order of the ensembles on the y-axis changes and thus the numbering of the ensembles would change even though the neuron identities within each group stayed the same for a given set of parameters).

      This unfortunate panel alignment / graphical display error present in the original reviewed preprint has been fixed in the current, updated figure (i.e. twitch corresponds to Rastermap groups 2 and 3, whisk to group 6, walk to groups 5 and 4, and oscillate to groups 0 and 1), and in the main text at ~line 925, pg 29. We have also changed the figure legend, which also contained accurate but misaligned information, for Figure 6e to reflect this correction.

      One can now see that, because the data from both figures is from the same session in the same mouse, as you correctly point out, Fig 5d left (walk and whisk) corresponds roughly to Fig 6e group R7, “walk”, and that Fig 5d right (whisk) corresponds roughly to Fig 6e group R4, “twitch”.

      We have double-checked the identity of other CCF map displays of Rastermap neuron density and of mean correlations between neural activity and behavioral primitives in all other figures, and we found no other such alignment or mis-labeling errors.
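
      The relabeling problem described above (ensemble order changing between Rastermap runs while the neurons within each group stay the same) can be guarded against by matching groups on their membership rather than on their index. The following is a minimal sketch under that assumption; the function name and toy data are hypothetical, and it does not use or represent the Rastermap API.

```python
def match_groups_by_membership(run_a, run_b):
    """Map each group label in run_a to the run_b label with identical neurons.

    run_a, run_b: dict mapping group label -> iterable of neuron IDs.
    Raises ValueError if a group in run_a has no identical group in run_b.
    """
    by_members = {frozenset(neurons): label for label, neurons in run_b.items()}
    mapping = {}
    for label, neurons in run_a.items():
        key = frozenset(neurons)
        if key not in by_members:
            raise ValueError(f"group {label!r} has no identical match")
        mapping[label] = by_members[key]
    return mapping


# Hypothetical toy data: the same three ensembles, ordered differently per run.
first_run = {0: [11, 12, 13], 1: [21, 22], 2: [31, 32, 33, 34]}
second_run = {0: [31, 32, 33, 34], 1: [11, 12, 13], 2: [22, 21]}
mapping = match_groups_by_membership(first_run, second_run)
print(mapping)
```

      Applying such a mapping before assembling figure panels keeps density maps and behavioral labels tied to neuron identities rather than to run-specific group numbers.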

      We have also added a caveat in the main text at ~lines 925-940, pg. 30, pointing out the preliminary nature of these findings, which are shown here as an example of the viability of the methods. Analysis of the variability of Rastermap alignments across sessions is beyond the scope of the current paper, although it is an issue that we hope to address in upcoming analysis papers.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We added a brief explanation of Suite2p motion correction at ~line 136, pg 4. We have also added additional details concerning CCF / MMM alignment and other analysis issues. In general we cite other papers where possible to avoid repeating details of analysis methods that are already published.
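
      For readers unfamiliar with the principle the reviewer mentions, the following is a toy one-dimensional sketch of phase correlation for estimating a rigid shift. It is written with the standard library only and uses a naive O(n²) DFT, so it is meant for illustration and is not Suite2p's implementation, which operates on two-dimensional image frames. All function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation_shift(reference, moved, eps=1e-12):
    """Estimate the circular shift that maps `reference` onto `moved`."""
    ref_f = dft(reference)
    mov_f = dft(moved)
    # Cross-power spectrum, normalized so that only phase information remains.
    cross = [m * r.conjugate() for m, r in zip(mov_f, ref_f)]
    whitened = [c / (abs(c) + eps) for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(len(corr)), key=corr.__getitem__)
    n = len(corr)
    # Report shifts beyond half the signal length as negative shifts.
    return peak if peak <= n // 2 else peak - n


reference = [0, 0, 1, 3, 7, 2, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0]
n = len(reference)
moved = [reference[(i - 3) % n] for i in range(n)]  # shift right by 3 samples
print(phase_correlation_shift(reference, moved))
```

      Because the cross-power spectrum is normalized to unit magnitude, the inverse transform approaches a single sharp peak at the displacement, which makes the estimate relatively insensitive to overall brightness changes.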

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOID behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including those for multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We have edited the motivations behind the study to clarify the general problems that are being addressed. However, as the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details necessarily deal specifically with this system.

      We briefly compare the methods and results from our Thorlabs system to that of Diesel-2p, another comparable system, based on what we have been able to glean from the literature on its strengths and weaknesses. See ~lines 206-213, pg 6.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community. The quality of the data is high, but it is not clear whether the data are available to the community; some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with in-depth analysis papers that are currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      We now reference this figure on ~lines 190-192, pg 6 of the main text, near the beginning of the Results section.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in lines 463.

      Authors’ Response: For the claim stated in lines 449-453:

      “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks”

      we can provide the following references: (i) for mice: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For the claim stated on line 463:

      “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”

      we can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107).

      We have included these two new references in the new, revised version of our paper. Thank you for identifying these citation omissions.

      No stats for the results shown in Figure 6e, it would be useful to know which of these neural densities for all areas show a clear statistical significance across all the behaviors.

      Authors’ Response: It would be useful if we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0, or, in this case, compare the population density of each Rastermap group within each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (e.g. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons, or ~3.3% of the group, per CCF area). Our current figure legend states the maximums of the scale bar look-up values (reds) for each group, which range from ~8% to 32%.

      However, because the data in panel 6e are from a single session and are being provided as an example of our methods and not for the purpose of claiming a specific result at this point, we choose not to report statistics. It is worth pointing out, perhaps, that Rastermap group densities for a given CCF area close to 3.3% are likely not different from chance, and those closer to ~40%, which is our highest density (for area M2 in Rastermap group 7, which corresponds to the qualitative behavior “walk”), are most likely not due to chance. Without analysis of multiple sessions from the same mouse we believe that making a clear statement of significance for this likelihood would be premature.

      We now clarify this decision and related considerations in the main text at ~line 920, pg 29.
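
      The even-distribution null used in the worked example above (500 neurons spread over ~30 CCF areas) can be written out as a short calculation. This is a minimal sketch with hypothetical function names; it only reproduces the arithmetic quoted in the response and does not constitute a significance test.

```python
def chance_count(n_neurons, n_areas):
    """Expected neurons per area if a group were spread evenly across areas."""
    return n_neurons / n_areas


def chance_density(n_areas):
    """Expected fraction of a group's neurons falling in any one area."""
    return 1.0 / n_areas


def fold_over_chance(observed_density, n_areas):
    """Ratio of an observed per-area density to the even-distribution null."""
    return observed_density / chance_density(n_areas)


n_neurons, n_areas = 500, 30  # values quoted in the response
print(f"{chance_count(n_neurons, n_areas):.1f} neurons per area by chance")
print(f"{chance_density(n_areas):.1%} of the group per area by chance")
print(f"{fold_over_chance(0.32, n_areas):.1f}x chance for a 32% density")
```

      A formal test would additionally need the variability of these densities across sessions, which a single example session cannot provide.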

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors, but that work goes well beyond the present manuscript.

      Here, we prefer to present examples of the types of results that can be expected to be obtained using our methods, and how these results compare with those obtained by others in the field.

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data that one can expect to obtain using our methods. We will provide a more complete analysis of data obtained using our methodology in the near future in another manuscript.

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated -they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare 1-photon widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli (i.e. “passive sensory stimulation”), while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We have corrected these statements and incorporated these and other relevant references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, and speed. We will reference the papers you mention without an extensive literature review, but we would like to emphasize the following points:

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy (we image at 5 micrometer resolution for our large FOV configuration, but the xy point-spread function for the Thorlabs mesoscope is 0.61 x 0.61 micrometers with 970 nm excitation) and undefined z-resolution (4.25 micrometers for the Thorlabs mesoscope). A coarser resolution increases the likelihood that activity-related fluorescence from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than that of standard 2-photon galvo-galvo or even galvo-resonant scanning, as the Thorlabs mesoscope uses). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      We have further clarified our discussion of these issues in the main text at ~lines 76-80, pg 2.

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we now include brief supplementary material demonstrating the changes in the window preparation that we observed over the prolonged time periods of our study, for both the dorsal and side mount preparations. The following link to this material is now referenced at ~line 287, pg 9, and at the end of Fig S1:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      We have also included brief additional details in the main text that we found useful for facilitating long-term use of these preparations. These are located at ~lines 287-290, pg 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sharing raw data and code:

      I strongly encourage sharing some of the raw data from your experiments and all the code used for data analysis (e.g. in a github repository). This would help the reader evaluate data quality, and reproduce your results.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      Our existing GitHub repository, already referenced in the paper, is located here:

      https://github.com/vickerse1/mesoscope_spontaneous

      We have added an additional reference in the main text to the existence of these publicly available resources, including the appropriate links, located at ~lines 190-200, pg 6.

      (2) Use of proprietary software:

      The reliance on proprietary tools like LabView and Matlab could be a limitation for some researchers, given the associated costs and accessibility issues. If possible, consider incorporating or suggesting alternatives that are open-source, to make your methodology more accessible to a broader range of researchers, including those with limited resources.

      Authors’ Response: We are reluctant to recommend open source software that we have not thoroughly tested ourselves. However, we will mention, when appropriate, possible options for the reader to consider.

      Although LabView is proprietary and can be difficult to code, it is particularly useful when used in combination with National Instruments hardware. ScanImage in use with the Thorlabs mesoscope uses National Instruments hardware, and it is convenient to maintain hardware standards across the integrated rig/experimental system. LabView is also useful because it comes with a huge library of device drivers that makes the addition of new hardware from almost any source very convenient.

      That being said, there are open source alternatives that could conceivably be used to replace parts of our system. One example is AutoPilot (author: Jonny Saunders), for control of behavioral data acquisition: https://open-neuroscience.com/post/autopilot/.

      We are not aware of an alternative to Matlab for control of ScanImage, which is the supported control software for the Thorlabs 2-photon mesoscope.

      Most of our processing and analysis code (see GitHub page: https://github.com/vickerse1/mesoscope_spontaneous) is in Python, but some of the code that we currently use remains in Matlab. This could certainly be rewritten in Python, but we feel that this is outside the scope of the current paper. We have commented all code to aid users in translating it to other languages, if they so desire.

      (3) Quantifying the effect of tilted head:

      To address the potential impact of tilting the mouse's head on your findings, a quantitative analysis of any systematic differences in the behavior (e.g. Bsoid motifs) could be illuminating.

      Authors’ Response: We have performed DeepLabCut analysis of all sessions from both preparations, across several iterations with different parameters, to extract pose estimates, and we have also performed B-SOiD analysis of these sessions. We did not find any obvious qualitative differences between the two preparations in the number of behavioral motifs identified, the dwell times of these motifs, or similar measures that could be attributed to tilting of the mouse’s head in the side mount preparation. We also did not find any obvious differences in the relative frequencies of high level qualitative behaviors, such as the ones referred to in Fig. 6, between the two preparations.

      Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      See Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      (4) Clarification in the discussion section:

      The paragraph titled "Advantages and disadvantages of our approach" seems to diverge into discussing future directions, rather than focusing on the intended topic. I suggest revisiting this section to ensure that it accurately reflects the strengths and limitations of your approach.

      Authors’ Response: We agree with the reviewer that this section included several potential next steps or solutions for each advantage and disadvantage, which the reviewer refers to as “future directions” and are thus arguably beyond the scope of this section. Therefore we have retitled this section as, “Advantages and disadvantages of our approach (with potential solutions):”.

      Although we believe this to be a logical organization, and we already include a section focused purely on future directions in the Discussion section, we have refocused each paragraph of the advantages/disadvantages subsection to concentrate on the advantages and disadvantages per se. In addition, we have made minor changes to the “future directions” section to make it more succinct and practical. These changes can be found at lines ~1016-1077, pg 33-34.

      Reviewer #2 (Recommendations For The Authors):

      Below are some more detailed points that will hopefully help to further improve the quality and scope of the manuscript.

      • While it is certainly favorable for many questions to measure large-scale activity from many brain regions, the introduction appears to suggest that this is a prerequisite to understanding multimodal decision-making. This is based on the argument that combining multiple recordings with movement indicators will 'necessarily obscure the true spatial correlation structures'. However, I don't understand why this is the case or what is meant by 'true spatial correlation structures'. Aren't there many earlier studies that provided important insights from individual cortical areas? It would be helpful to improve the writing to make this argument clearer.

      Authors’ Response: The reviewer makes an excellent point and we have re-worded the manuscript appropriately, to reflect the following clarifications. These changes can be found at ~lines 58-71, pg. 2.

      We believe you are referring to the following passage from the introduction:

      “Furthermore, the arousal dependence of membrane potential across cortical areas has been shown to be diverse and predictable by a temporally filtered readout of pupil diameter and walking speed (Shimaoka et al, 2018). This makes simultaneous recording of multiple cortical areas essential for comparison of the dependence of their neural activity on arousal/movement, because combining multiple recording sessions with pupil dilations and walking bouts of different durations will necessarily obscure the true spatial correlation structures.”

      Here, we do not mean to imply that earlier studies of individual cortical areas are of no value. This argument is one example, among others, of a more general point: for sequences or distributed encoding schemes that span many cortical areas that are too far apart to be imaged simultaneously with conventional 2-photon imaging, or that are too sparse to be discovered with 1-photon widefield imaging, our new methods offer advantages over conventional imaging methods that will allow for truly novel scientific analyses and insights.

      The general idea of the present example, based on the findings of Shimaoka et al, 2018, is that it is not possible to directly combine and/or compare the correlations between behavior and neural activity across regions that were imaged in separate sessions, because the correlations between behavior and neural activity in each region appear to depend on the exact time since the behavior began (Shimaoka et al, 2018), in a manner that differs across regions. So, for example, if one were to record from visual cortex in one session with mostly brief walk bouts, and then from somatosensory cortex in a second session with mostly long walk bouts, any inferred difference between the encoding of walk speed in neural activity between the two areas would run the risk of being contaminated by the “temporal filtering” effect shown in Shimaoka et al, 2018. However, this would not be the case in our recordings, because the distribution of behavior durations corresponding to our recorded neural activity is exactly the same across areas, since all areas were recorded simultaneously.
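
      The pooling problem described above can be illustrated with a toy calculation. This is a minimal sketch under loud assumptions: the response to walking is modeled as a simple exponential rise with time since bout onset, and the time constant and bout durations are hypothetical values chosen for illustration. It is not the temporal filter fitted in the cited study, and the function name is ours for this example only.

```python
import math

def mean_bout_response(bout_duration_s, tau_s):
    """Mean of the toy response 1 - exp(-t / tau) over one walking bout.

    The exponential rise is an illustrative assumption, not the filter
    fitted in the cited study.
    """
    ratio = tau_s / bout_duration_s
    return 1.0 - ratio * (1.0 - math.exp(-1.0 / ratio))

TAU_S = 2.0  # hypothetical time constant, deliberately identical for both areas

# Separate sessions: area A observed during short bouts, area B during long bouts.
area_a_short = mean_bout_response(1.0, TAU_S)
area_b_long = mean_bout_response(10.0, TAU_S)
spurious_difference = area_b_long - area_a_short

# Simultaneous recording: both areas experience the same bout durations.
area_a_same = mean_bout_response(10.0, TAU_S)
area_b_same = mean_bout_response(10.0, TAU_S)
simultaneous_difference = area_b_same - area_a_same

print(f"separate sessions: apparent difference = {spurious_difference:.2f}")
print(f"simultaneous:      apparent difference = {simultaneous_difference:.2f}")
```

      In this toy model the two areas are identical by construction, so any difference seen in the separate-session case comes only from the mismatch in bout durations.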

      • The text describes different timescales of neural activity but is an imaging rate of 3 Hz fast enough to be seen as operating at the temporal dynamics of the behavior? It appears to me that the sampling rate will impose a hard limit on the speed of correlations that can be observed across regions. While this might be appropriate for relatively slow behaviors and spontaneous fluctuations in arousal, sensory processing and decision formation likely operate on faster time scales below 100ms which would even be problematic at 10 Hz which is proposed as the ideal imaging speed in the manuscript.

      Authors’ Response: Imaging rate is always a concern and the limitations of this have been discussed in other manuscripts. We will remind the reader of these limitations, which must always be kept in mind when interpreting fluorescence based neural activity data.

      Previous studies imaging on a comparable yet more limited spatial scale (Stringer et al, 2019) used an imaging speed of ~1 Hz. With this in view, our work represents an advance both in spatial extent of imaged cortex and in imaging speed. Specifically, we believe that ~1 Hz imaging may be sufficient to capture flip/flop type transitions between low and high arousal states that persist in general for seconds to tens of seconds, and that ~3-5 Hz imaging likely provides additional information about encoding of spontaneous movements and behavioral syllables/motifs.

      Indeed, even 10 Hz imaging would not be fast enough to capture the detailed dynamics of sensory processing and decision formation, although these speeds are likely sufficient to capture “stable” encodings of sensory representations and decisions that must be maintained during a task, for example with delayed match-to-sample tasks.

      In general we are further developing our preparations to allow us to perform simultaneous widefield imaging and Neuropixels recordings, and to perform simultaneous 1.2 x 1.2 mm 2-photon imaging and visually guided patch clamp recordings.

      Both of these techniques will allow us to combine information across both the slow and fast timescales that you refer to in your question.

      We have clarified these points in the Introduction and Discussion sections, at ~lines 93-105, pg 3, and ~lines 979-983, pg 31 and ~lines 1039-1045, pg 33, respectively.
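
      As a rough illustration of the sampling limits discussed in this response, the sketch below computes the frame interval and Nyquist frequency for the imaging rates mentioned above. The function names and the two-frame criterion for "resolving" an event are illustrative assumptions, not part of any analysis pipeline.

```python
def sampling_limits(frame_rate_hz):
    """Return (frame interval in ms, Nyquist frequency in Hz) for a frame rate."""
    frame_interval_ms = 1000.0 / frame_rate_hz
    nyquist_hz = frame_rate_hz / 2.0
    return frame_interval_ms, nyquist_hz

def can_resolve(event_duration_ms, frame_rate_hz, min_samples=2):
    """True if an event of this duration spans at least `min_samples` frames.

    The two-sample criterion is an illustrative assumption.
    """
    frame_interval_ms, _ = sampling_limits(frame_rate_hz)
    return event_duration_ms >= min_samples * frame_interval_ms

for rate in (1.0, 3.0, 5.0, 10.0):
    interval, nyquist = sampling_limits(rate)
    print(f"{rate:>4.0f} Hz: one frame every {interval:.0f} ms, "
          f"Nyquist {nyquist:.1f} Hz, "
          f"spans a 100 ms event: {can_resolve(100.0, rate)}")
```

      Under this criterion, none of the listed rates spans a 100 ms event with two frames, whereas state transitions lasting several seconds are covered even at ~1 Hz.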

      • The dorsal mount is very close to the crystal skull paper and it was ultimately not clear to me if there are still important differences aside from the headbar design that a reader should be aware of. If they exist, it would be helpful to make these distinctions a bit clearer. Also, the sea shell implants from Ghanbari et al in 2019 would be an important additional reference here.

      Authors’ Response: We have added brief references to these issues in our revised manuscript at ~lines 89-97, pg 3:

      Although our dorsal mount preparation is based on the “crystal skull paper” (Kim et al, 2016), which we reference, the addition of a novel 3-D printable titanium headpost, support arms, light shields, and modifications to the surgical protocols and CCF alignment represent significant advances that made this preparation useable for pan-cortical imaging using the Thorlabs mesoscope. In fact, we were in direct communication with Cris Niell, a UO professor and co-author on the original Kim et al, 2016 paper, during the initial development of our preparation, and he and members of his lab consulted with us in an ongoing manner to learn from our successful headpost and other hardware developments. Furthermore, all of our innovations for data acquisition, imaging, and analysis apply equally to both our dorsal mount and side mount preparations.

      Thank you for mentioning the Ghanbari et al, 2019 paper on the transparent polymer skull method, “See Shells.” We were in fact not aware of this study. However, it should be noted that their preparation seems to, like the crystal skull preparation and our dorsal mount preparation, be limited to bilateral dorsal cortex and not to include, as does our cranial window side mount preparation and the through-the-skull widefield preparation of Esmaeili et al, 2021, a fuller range of lateral cortical areas, including primary auditory cortex.

      • When using the lateral mount, rotating the objective, rather than the animal, appears to be preferable to reduce the stress on the animal. I also worry that the rather severe head tilt could be an issue when training animals in more complex behaviors and would introduce an asymmetry between the hemispheres due to the tilted body position. Is there a strong reason why the authors used water instead of an imaging gel to resolve the issue with the meniscus?

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      • In parts, the description of the methods is very specific to the Thorlabs mesoscope which makes it harder to understand the general design choices and challenges for readers that are unfamiliar with that system. Since the Mesoscope is very expensive and therefore unavailable to many labs in the field, I think it would increase the reach of the manuscript to adjust the writing to be less specific for that system but instead provide general guidance that could also be helpful for other systems. For example (but not exclusively) lines 231-234 or lines 371 and below are very Thorlabs-specific.

      Authors’ Response: We have revised the manuscript so that it is more generally applicable to mesoscopic methods.

      We will make revisions as you suggest where possible, although we have limited experience with the other imaging systems that we believe you are referring to. However, please note that we already mentioned at least one other comparable system in the original eLife reviewed pre-print (Diesel 2p, line 209; Yu and Smith, 2021).

      Here are a couple of examples of how we have broadened our description:

      (1) On lines ~231-234, pg 7, we write:

      “However, if needed, the objective of the Thorlabs mesoscope may be rotated laterally up to +20 degrees for direct access to more ventral cortical areas, for example if one wants to use a smaller, flat cortical window that requires the objective to be positioned orthogonally to the target region.”

      Here we have modified this to indicate that one may in general rotate their objective lens if their system allows it. Some systems, such as the Thorlabs Bergamo microscope and the Sutter MOM system, allow more than 20 degrees of rotation.

      (2) On line ~371, pg 11, we write:

      “This technique required several modifications of the auxiliary light-paths of the Thorlabs mesoscope”

      Here, we have changed the writing to be more general such as “may require…of one’s microscope.”

      Thank you for these valuable suggestions.

      • Lines 287-299: Could the authors quantify the variation in imaging depth, for example by quantifying to which extent the imaging depth has to be adjusted to obtain the position of the cortical surface across cortical areas? Given that curvature is a significant challenge in this preparation this would be useful information and could either show that this issue is largely resolved or to what extent it might still be a concern for the interpretation of the obtained results. How large were the required nominal corrections across imaging sites?

      Authors’ Response: This information was provided previously (lines 297-299):

      “In cases where we imaged multiple small ROIs, nominal imaging depth was adjusted in an attempt to maintain a constant relative cortical layer depth (i.e. depth below the pial surface; ~200 micrometer offset due to brain curvature over 2.5 mm of mediolateral distance, symmetric across the center axis of the window).”

      This statement is based on a qualitative assessment of cortical depth based on neuron size and shape, the density of neurons in a given volume of cortex, the size and shape of blood vessels, and known cortical layer depths across regions. A ground-truth measurement of this depth error is beyond the scope of the present study. However, we do specify the type of glass, thickness, and curvature that we use, and the field curvature characterization of the Thorlabs mesoscope is given in Fig. 6 of the Sofroniew et al, 2016 eLife paper.
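
      To give a sense of how such a nominal depth adjustment scales with position, the sketch below treats the imaged surface as a circular arc and uses the two numbers quoted above (~200 micrometers over 2.5 mm). This is a geometric approximation only: the circular-arc shape, the reading of 2.5 mm as the distance from the window's center axis, and the function names are all assumptions made for illustration, not a measurement or a description of the fast-z correction actually used.

```python
import math

def radius_from_sag(half_width_mm, sag_mm):
    """Radius of a circular arc that drops by sag_mm at half_width_mm from its apex."""
    return (half_width_mm ** 2 + sag_mm ** 2) / (2.0 * sag_mm)

def depth_offset_mm(lateral_distance_mm, radius_mm):
    """Drop of the arc below its apex at a given lateral distance."""
    return radius_mm - math.sqrt(radius_mm ** 2 - lateral_distance_mm ** 2)

# Quoted values: ~0.2 mm offset at 2.5 mm mediolateral distance (assumed from center).
effective_radius_mm = radius_from_sag(2.5, 0.2)

for x_mm in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    offset_um = 1000.0 * depth_offset_mm(x_mm, effective_radius_mm)
    print(f"{x_mm:.1f} mm from center: nominal depth shift ~{offset_um:.0f} um")
```

      Because the offset grows roughly with the square of the lateral distance in this approximation, most of the adjustment is needed near the edges of the window.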

      In addition, we have provided some documentation of online fast-z correction parameters on our GitHub page at:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      Some additional relevant documentation can be found in our publicly available data repository on FigShare+ at: https://doi.org/10.25452/figshare.plus.c.7052513

      • Given the size of the implant and the subsequent work attachments, I wonder to which extent the field of view of the animal is obstructed. Did the authors perform receptive field mapping or some other technique that can estimate the size of the animals' remaining field of view?

      Authors’ Response: The left eye is pointed down ~22.5 degrees, but we position the mouse near the left edge of the wheel to minimize the degree to which this limits their field of view. One may view our Fig. 1 and Suppl Movies 1 and 6 to see that the eyes on the left and right sides are unobstructed by the headpost, light shields, and support arms. However, other components of the experimental setup, such as the speaker, cameras, etc. can restrict a few small portions of the visual field, depending on their exact positioning.

      The fact that mice responded to left side visual stimuli in preliminary recordings during our multimodal 2-AFC task, together with the unobstructed left and right camera views and the pupillometry recordings, indicates that a significant portion of the mouse’s field of view, from either side, remains intact in our preparation.

      We have clarified these points in the text at ~lines 344-346, pg. 11.

      • Line 361: What does movie S7 show in this context? The movie seems to emphasize that the observed calcium dynamics are not driven by movement dynamics but it is not clear to me how this relates to the stimulation of PV neurons. The neural dynamics in the example cell are also not very clear. It would be helpful if this paragraph would contain some introduction/motivation for the optogenetic stimulation as it comes a bit out of the blue.

      Authors’ Response: This result was presented for two reasons.

      First, we showed it as a control for movement artifacts, since inhibition of neural activity enhances the relative prominence of non-activity dependent fluorescence that is used to examine the amplitude of movement-related changes in non-activity dependent fluorescence (e.g. movement artifacts). We have included a reference to this point at ~lines 587-588, pg 18.

      Second, we showed it as a demonstration of how one may combine optogenetics with imaging in mesoscopic 2-P imaging. References to this point were already present in the original version of the manuscript (the eLife “reviewed preprint”).

      • Lines 362-370: This paragraph and some of the following text are quite technical and would benefit from a better description and motivation of the general workflow. I have trouble following what exactly is done here. Are the authors using an online method to identify the CCF location of the 2p imaging based on the vessel pattern? Why is it important to do this during the experiment? Wouldn't it be sufficient to identify the areas of interest based on the vessel pattern beforehand and then adjust the 2p acquisition accordingly? Why are they using a dial, shutter, and foot pedal and how does this relate to the working distance of the objective? Does the 'standardized cortical map' refer to the Allen common coordinate framework?

      Authors’ Response: We have revised this section to make it more clear.

      Currently, the general introduction to this section appears in lines 349-361. Starting in line 362, we currently present the technical considerations needed to implement the overall goals stated in that first paragraph of this section.

      In general we use a post-hoc analysis step to confirm the location of neurons recorded with 2-photon imaging. We use “online” juxtaposition of the multimodal map image with overlaid CCF with the 2-photon image by opening these two images next to each other on the ScanImage computer and matching the vasculature patterns “by eye”. We have made this more clear in the text so that the interested reader can more readily implement our methods.

      By use of the phrase “standardized cortical map” in this context, we meant to point out that we had not decided a priori to use the Allen CCF v3.0 when we started working on these issues.

      • Does Fig. 2c show an example of the online alignment between widefield and 2p data? I was confused here since the use of suite2p suggests that this was done post-recording. I generally didn't understand why the user needed to switch back and forth between the two modes. Doesn't the 2p image show the vessels already? Also, why was an additional motorized dichroic to switch between widefield and 2p view needed? Isn't this the standard in most microscopes (including the Thorlabs scopes)?

      Authors’ Response: We have explained this methodology more clearly in the revised manuscript, both at ~lines 485-500, pg 15-16, and ~lines 534-540, pg 17.

      The motorized dichroic we used replaced the motorized mirror that comes with the Thorlabs mesoscope. We switched to a dichroic to allow for near-simultaneous optogenetic stimulation with 470 nm blue light and 2-photon imaging, so that we would not have to move the mirror back and forth during live data acquisition (it takes a few seconds and makes an audible noise that we wanted to avoid).

      Figure 2c shows an overview of our two step “offline” alignment process. The image at the right in the bottom row labeled “2” is a map of recorded neurons from suite2p, determined post-hoc or after imaging. In Fig. 2d we show what the CCF map looks like when it’s overlaid on the neurons from a single suite2p session, using our alignment techniques. Indeed, this image is created post-hoc and not during imaging. In practice, “online” during imaging, we would have the image at left in the bottom row of Fig. 2c (i.e. the multimodal map image overlaid onto an image of the vasculature also acquired on the widefield rig, with the 22.5 degree rotated CCF map aligned to it based on the location of sensory responses) rotated 90 degrees to the left and flipped over a horizontal mirror plane so that its alignment matches that of the “online” 2-photon acquisition image and is zoomed to the same scale factor. Then, we would navigate based on vasculature patterns “by-eye” to the desired CCF areas, and confirm our successful 2-photon targeting of predetermined regions with our post-hoc analysis.
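
      The reorientation step described above (rotate 90 degrees to the left, then flip over a horizontal mirror plane) can be sketched on a small nested-list image as follows. This is a minimal illustration, not code from the repository: the function names are hypothetical, and it assumes that "left" means counterclockwise and that the horizontal mirror plane reverses the order of the rows. Rescaling to the same zoom factor is omitted.

```python
def rotate_90_left(image):
    """Rotate a 2-D image (list of rows) 90 degrees counterclockwise."""
    return [list(row) for row in zip(*image)][::-1]

def flip_over_horizontal_plane(image):
    """Mirror the image top-to-bottom: the first row becomes the last row."""
    return [list(row) for row in image[::-1]]

def match_online_orientation(image):
    """Apply the rotation first, then the flip (assumed order)."""
    return flip_over_horizontal_plane(rotate_90_left(image))

# Tiny stand-in for the widefield multimodal map image (values are pixel labels).
widefield_map = [[1, 2, 3],
                 [4, 5, 6]]

for row in match_online_orientation(widefield_map):
    print(row)
```

      Under these assumptions the two steps together amount to a transpose of the image; with NumPy arrays the equivalent would be `np.flipud(np.rot90(image))`.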

      • Why is the widefield imaging done through the skull under anesthesia? Would it not be easier to image through the final window when mice have recovered? Is the mapping needed for accurate window placement?

      Authors’ Response: The headpost and window surgeries are done 3-7 days apart to increase success rate and modularize the workflow. Multimodal mapping by widefield imaging is done through the skull between these two surgeries for two major reasons. First, to make efficient use of the time between surgeries. Second, to allow us to compare the multimodal maps to skull landmarks, such as bregma and lambda, for improved alignment to the CCF.

      Anesthesia was applied to prevent state changes and movements of the mouse, which can produce large, undesired effects on neural responses in primary sensory cortices in the context of these mapping experiments. We sometimes re-imaged multimodal maps on the widefield microscope through the window, roughly every 30-60 days or whenever/if significant changes in vasculature pattern became apparent.

      We have clarified these points in the main text at ~lines 510-522, pg 20-21, and we added a link to our new supplementary material documenting the changes observed in the window preparation over time:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      Thank you for these questions.

      • Lines 445 and below: Reducing the noise from resonant scanners is also very relevant for many other 2p experiments so it would be helpful to provide more general guidance on how to resolve this problem. Is the provided solution only applicable to the Thorlabs mesoscope? How hard would it be to adjust the authors' noise shield to other microscopes? I generally did not find many additional details on the Github repo and think readers would benefit from a more general explanation here.

      Authors’ Response: Our revised Github repository has been modified to include more details, including both diagrams and text descriptions of the sound baffle, respectively:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_for_noise_reduction_on_resonant_scanner_devices.pdf

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      However, we cannot presently disclose our confidential provisional patent application. Complete design information will likely be available in early 2025 when our full utility patent application is filed.

      With respect to your question, yes, this technique is adaptable to any resonant scanner, or, for that matter, any complicated 3D surface that emits sound. We first 3D scan the surface, and then we reverse engineer a solid that fully encapsulates the surface and can be easily assembled in parts with bolts and interior foam that allow for a tight fit, in order to nearly completely block all emitted sound.

      It is this adaptability that has prompted us to apply for a full patent, as we believe this technique will be quite valuable as it may apply to a potentially large number of applications, starting with 2-photon resonant scanners but possibly moving on to other devices that emit unwanted sound.

      • Does line 458 suggest that the authors had to perform a 3D scan of the components to create the noise reduction shield? If so, how was this done? I don't understand the connection between 3D scanning and printing that is mentioned in lines 464-466.

      Authors’ Response: We do not want to release full details of the methodology until the full utility patent application has been submitted. However, we have now included a simplified text description of the process on our GitHub page and included a corresponding link in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      We also clarified in the main text, at the location that you indicate, why the 3D scanning is a critical part of our novel 3D-design, printing, and assembly protocol.

      • Lines 468 and below: Why is it important to align single-cell data to cortical areas 'directly on the 2-photon microscope'? Is this different from the alignment discussed in the paragraph above? Why not focus on data interpretation after data acquisition? I understand the need to align neural data to cortical areas in general, I'm just confused about the 'on the fly' aspect here and why it seems to be broken out into two separate paragraphs. It seems as if the text in line 485 and below could also be placed earlier in the text to improve clarity.

      Authors’ Response: By “such mapping is not routinely possible directly on the 2-photon mesoscope”, we mean that it is not possible to do multimodal mapping directly on the mesoscope; it needs to be done on the widefield imaging rig (a separate microscope). The CCF is then mapped onto the widefield multimodal map, which is overlaid on an image of the vasculature (and sometimes also the skull) that was also acquired on the widefield imaging rig. The vasculature is used as a sort of Rosetta Stone to co-align the 2-photon image to the multimodal map and then, by a sort of transitive property of alignment, to the CCF, so that each individual neuron in the 2-photon image can be assigned a unique CCF area name and numerical identifier for subsequent analysis.

      We have clarified this in the text, thank you.
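
      The chain of alignments described above (2-photon image to widefield map via the vasculature, then widefield map to CCF) can be sketched as the composition of two coordinate transforms. The example below is a hedged illustration only: it assumes each alignment can be approximated by a 2-D affine transform in homogeneous coordinates, and the matrices, names, and coordinates are made-up values rather than outputs of the actual alignment code.

```python
def compose(outer, inner):
    """Matrix product outer @ inner for 3x3 homogeneous transforms."""
    return [[sum(outer[i][k] * inner[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_transform(transform, point):
    """Map an (x, y) point through a 3x3 homogeneous affine transform."""
    x, y = point
    return (transform[0][0] * x + transform[0][1] * y + transform[0][2],
            transform[1][0] * x + transform[1][1] * y + transform[1][2])

# Hypothetical alignment 1: 2-photon pixels -> widefield pixels (from vasculature).
twop_to_widefield = [[0.5, 0.0, 100.0],
                     [0.0, 0.5, 50.0],
                     [0.0, 0.0, 1.0]]

# Hypothetical alignment 2: widefield pixels -> CCF map pixels (from sensory maps).
widefield_to_ccf = [[1.0, 0.0, -20.0],
                    [0.0, 1.0, 30.0],
                    [0.0, 0.0, 1.0]]

# Chaining the two gives a direct 2-photon -> CCF mapping.
twop_to_ccf = compose(widefield_to_ccf, twop_to_widefield)

neuron_xy = (40.0, 60.0)  # made-up neuron centroid in 2-photon pixels
print(apply_transform(twop_to_ccf, neuron_xy))
```

      Once a neuron's position is expressed in CCF map coordinates, an area name can be looked up from the CCF area mask at that location.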

      The Python code for aligning the widefield and 2-photon vessel images would also be of great value for regular 2p users. It would strongly improve the impact of the paper if the repository were better documented and the code would be equally applicable for alignment of imaging data with smaller cranial windows.

      Authors’ Response: All of the code for multimodal map, CCF, and 2-photon image alignment is, in fact, already present on the GitHub page. We have made some minor improvements to the documentation, and readers are more than welcome to contact us for additional help.

      Specifically, the alignment you refer to starts in cell #32 of the meso_pre_proc_1.ipynb notebook. In general the notebooks are meant to be run sequentially, starting with cell #1 of meso_pre_proc_1, then going to the next cell etc…, then moving to meso_pre_proc_2, etc… The purpose of each cell is labeled at the top of the cell in a comment.

      We now include a cleaned, abridged version of the meso_pre_proc_1.ipynb notebook that contains only the steps needed for alignment, and have included a direct link to this notebook in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/python_code/mesoscope_preprocess_MMM_creation.ipynb

      Rotated CCF maps are in the CCF map rotation folder, in subfolders corresponding to the angle of rotation.

      Multimodal map creation involves use of the SensoryMapping_Vickers_Jun2520.m script in the Matlab folder.

      We updated the main text to clarify these points and included direct links to scripts relevant to each processing step.

      • Figure 4a: I found it hard to see much of the structure in the Rastermap projection with the viridis colormap - perhaps also because of a red-green color vision impairment. Correspondingly, I had trouble seeing some of the structure that is described in the text or clearer differences between the neuron sortings to PC1 and PC2. Is the point of these panels to show that both PCs identify movement-aligned dynamics or is the argument that they isolate different movement-related response patterns? Using a grayscale colormap as used by Stringer et al might help to see more of the many fine details in the data.

      Authors’ Response: In Fig. 4a the viridis color range is from blue to green to yellow, as indicated in the horizontal scale bar at bottom right. There is no red color in these Rastermap projections, or in any others in this paper. Furthermore, the expanded Rastermap insets in Figs. S4 and S5 provide additional detailed information that may not be clear in Fig 4a and Fig 5a.

      We prefer, therefore, not to change these colormaps, which we use throughout the paper.

      We have provided grayscale png versions of all figures on our GitHub page:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/grayscale_figures

      In Fig 4a the point of showing both the PC1 and PC2 panels is to demonstrate that they appear to correspond to different aspects of movement (PC1 more to transient walking, both ON and OFF, and PC2 to whisking and sustained ON walk/whisk), and to exhibit differential ability to identify neurons with positive and negative correlations to arousal (PC1 finds both, but PC2 seems to find only the ON neurons).

      We now clarify this in the text at ~lines 696-710, pg 22.

      • I find panel 6a a bit too hard to read because the identification and interpretation of the different motifs in the different qualitative episodes is challenging. For example, the text mentions flickering into motif 13 during walk but the majority of that sequence appears to be shaped by what I believe to be motif 11. Motif 11 also occurs prominently in the oscillate state and the unnamed sequence on the left. Is this meaningful or is the emphasis here on times of change between behavioral motifs? The concept of motif flickering should be better explained here.

      Authors’ Response: Here motif 13 corresponds to a syllable that might best be termed “symmetric and ready stance”. This tends to occur just before and after walking, but also during rhythmic wheel balancing movements that appear during the “oscillate” behavior.

      The intent of Fig. 6a is to show that each qualitatively identified behavior (twitch, whisk, walk, and oscillate) corresponds to a period during which a subset of BSOiD motifs flicker back and forth, and that the identity of motifs in this subset differs across the identified qualitative behaviors. This is not to say that a particular motif occurs only during a single identified qualitative behavior. Admittedly, the identification of these qualitative behaviors is a bit arbitrary - future versions of BSOiD (e.g. ASOiD) in fact combine supervised (i.e. arbitrary, top down) and unsupervised (i.e. algorithmic, objective, bottom-up) methods of behavior segmentation in an attempt to more reliably identify and label behaviors.

      Flickering appears to be a property of motif transitions in raw BSOiD outputs that have not been temporally smoothed. If one watches the raw video, it seems that this may in fact be an accurate reflection of the manner in which behaviors unfold through time. Each behavior could be thought of, to use terminology from MOSEQ (B Datta), as a series of syllables strung together to make a phrase or sentence. Syllables can repeat over either fast or slow timescales, and may be shared across distinct words and sentences although the order and frequency of their recurrence will likely differ.

      We have clarified these points in the main text at ~lines 917-923, pg 29, and we added motif 13 to the list of motifs for the qualitative behavior labeled “oscillate” in Fig. 6a.

      • Lines 997-998: I don't understand this argument. Why does the existence of different temporal dynamics make imaging multiple areas 'one of the keys to potentially understanding the nature of their neuronal activity'?

      Authors’ Response: We believe this may be an important point: comparisons of neurobehavioral alignment across cortical areas cannot be performed by pooling sessions that contain different distributions of dwell times for different behaviors if the dependence of neural activity on behavior in fact varies with the exact elapsed time since the beginning of the current behavioral “bout”. Other reasons that imaging many areas simultaneously would provide a unique advantage over imaging smaller areas one at a time and pooling data across sessions include the identification of sequences or neural ensembles that span many areas across large distances, and the understanding of distributed coding of behavior (an issue we explore in an upcoming paper).

      We have clarified these points at the location in the Discussion that you have identified. Thank you for your questions and suggestions.

      Minor

      Line 41: What is the difference between decision, choice, and response periods?

      Authors’ Response: This now reads “...temporal separation of periods during which cortical activity is dominated by activity related to stimulus representation, choice/decision, maintenance of choice, and response or implementation of that choice.”

      Line 202: What does ambulatory mean in this context?

      Authors’ Response: Here we mean that the mice are able to walk freely on the wheel. In fact they do not actually move through space, so we have changed this to read “able to walk freely on a wheel, as shown in Figs. 1a and 1b”.

      Is there a reason why 4 mounting posts were used for the dorsal mount but only 1 post was sufficient for the lateral mount?

      Authors’ Response: Here, we assume you mean 2 posts for the side mount and 4 posts for the dorsal mount.

      In general our idea was to use as many posts as possible to provide maximum stability of the preparations and minimize movement artifacts during 2-photon imaging. However, the design of the side mount headpost precluded the straightforward addition of a second, right-oriented arm to its lateral/ventral rim - this would have blocked access of both the 2-photon objective and the right face camera. In the dorsal mount, the symmetrical headpost arms are positioned further back (i.e. posterior), so that the left and right face cameras are not obscured.

      When we created the side mount preparation, we discovered that the 2 vertical 1” support posts were sufficient to provide adequate stability of the preparation and minimize 2-photon imaging movement artifacts. The side mount used two attachment screws on the left side of the headpost, instead of the one screw per side used in the dorsal mount preparation.

      We have included these points/clarifications in the main text at ~lines 217-230, pg 7.

      Figure S1g appears to be mislabeled.

      Authors’ Response: Yes, on the figure itself that panel was mislabeled as “f” in the original eLife reviewed preprint. We have changed this to read “g”.

      Line 349 and below: Why is the method called pseudo-widefield imaging?

      Authors’ Response: On the mesoscope, broad spectrum fluorescent light is passed through a series of excitation and emission filters that, based on a series of tests that we performed, allow both reflected blue light and epifluorescence emitted (i.e. Stokes-shifted) green light to reach the CCD camera for detection. Furthermore, the CCD camera (Thorlabs) has a much smaller detector chip than that of the other widefield cameras that we use (RedShirt Imaging and PCO), and we use it to image at an acquisition speed of around 10 Hz maximum, instead of ~30-50 Hz, which is our normal widefield imaging acquisition speed (it also has a slower readout than what we would consider to be a standard or “real” 1-photon widefield imaging camera).

      For these 3 reasons we refer to this as “pseudo-widefield” imaging. We would not use this for sensory activity mapping on the mesoscope - we primarily use it for mapping cortical vasculature and navigating based on our multimodal map to CCF alignment, although it is actually “contaminated” with some GCaMP6s activity during these uses.

      We have briefly clarified this in the text.

      Figures 4d & e: Do the colors show mean correlations per area? Please add labels and units to the colorbars as done in panel 4a.

      Authors’ Response: For both Figs 4 and 5, we have added the requested labels and units to each scale bar, and have relabeled panels d to say “Rastermap CCF area cell densities”, and panels e to say “mean CCF area corrs w/ neural activity.”

      Thank you for catching these omissions/mislabelings.

      Line 715: what is superneuron averaging?

      Authors’ Response: This refers to the fact that when Rastermap displays more than ~1000 neurons it averages the activity of each group of adjacent 50 neurons in the sorting to create a single display row, to avoid exceeding the pixel limitations of the display. Each single row representing the average activity of 50 neurons is called a “superneuron” (Stringer et al, 2023; bioRxiv).

      We have modified the text to clarify this point.
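The superneuron averaging described above can be sketched in a few lines. This is an illustration of the binning idea only, not Rastermap's actual code; the function name `superneuron_rows` and the handling of a final partial group (here averaged as-is) are assumptions.

```python
from statistics import fmean

def superneuron_rows(sorted_activity, bin_size=50):
    """Average each group of `bin_size` adjacent neurons into one display row.

    `sorted_activity` is a list of per-neuron traces, already ordered by the
    embedding sort. A trailing partial group is averaged as-is (an assumption).
    """
    rows = []
    for start in range(0, len(sorted_activity), bin_size):
        group = sorted_activity[start:start + bin_size]
        # zip(*group) walks over time points; average across neurons in the group.
        rows.append([fmean(frame) for frame in zip(*group)])
    return rows

# Toy data: 120 neurons x 3 time points, neuron i has constant activity i.
activity = [[float(i)] * 3 for i in range(120)]
rows = superneuron_rows(activity)
```

With 120 toy neurons and a bin size of 50, this yields three display rows, the last of which averages only the remaining 20 neurons.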

      Line 740: it would be good to mention what exactly the CCF density distribution quantifies.

      Authors’ Response: In each CCF area, a certain percentage of neurons belongs to each Rastermap group. The CCF density distribution is the set of these percentages, or densities, across all CCF areas in the dorsal or side mount preparation being imaged in a particular session. We have clarified this in the text.
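As a minimal sketch of this definition, the density of one Rastermap group across areas can be computed from two per-neuron label lists. The function name `ccf_density`, the area labels, and the group indices below are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

def ccf_density(area_labels, group_labels, group):
    """Percentage of neurons in each CCF area that belong to `group`."""
    totals = Counter(area_labels)
    in_group = Counter(a for a, g in zip(area_labels, group_labels) if g == group)
    # Counter returns 0 for areas that contain no neurons from this group.
    return {area: 100.0 * in_group[area] / n for area, n in totals.items()}

# Toy session: six neurons in two areas, assigned to Rastermap groups 0-2.
areas = ["MOp", "MOp", "MOp", "MOp", "VISp", "VISp"]
groups = [0, 0, 1, 2, 1, 1]
density = ccf_density(areas, groups, group=1)
```

Repeating the call for every group gives the full set of percentages, one value per group per area, for the session.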

      Line 745: what does 'within each CCF' mean? Does this refer to different areas?

      Authors’ Response: The corrected version of this sentence now reads: “Next, we compared, across all CCF areas, the proportion of neurons within each CCF area that exhibited large positive correlations with walking speed and whisker motion energy.”

      How were different Rastermap groups identified? Were they selected by hand?

      Authors’ Response: Yes, in Figs. 4, 5, and 6, we selected the identified Rastermap groups “by hand”, based on qualitative similarity of their activity patterns. At the time, there was no available algorithmic or principled means by which to split the Rastermap sort. The current, newer version of Rastermap (Stringer et al, 2023) seems to allow for algorithmic discretization of embedding groups (we have not tested this yet), but it was not available at the time that we performed these preliminary analyses.

      In terms of “correctness” of such discretization or group identification, we intend to address this issue in a more principled manner in upcoming publications. For the purposes of this first paper, we decided that manual identification of groups was sufficient to display the capabilities and outcomes of our methods.

      We clarify this point briefly at several locations in the revised manuscript, throughout the latter part of the Results section.

      Reviewer #3 (Recommendations For The Authors):

      In "supplementary figures, protocols, methods, and materials", Figure S1 g is mislabeled as Figure f.

      Authors’ Response: Yes, on the figure itself this panel was mislabeled as “f” in the original reviewed preprint. We have changed this to read “g”.

      In S1 g, the success rate of the surgical procedure seems quite low. Less than 50% of the mice could be imaged under two-photon. Can the authors elaborate on the criteria and difficulties related to their preparations?

      Authors’ Response: We will elaborate on the difficulties that sometimes hinder success in our preparations in the revised manuscript.

      The success rate indicated up to the point of “Spontaneous 2-P imaging (window)” reads 13/20, which is 65%, not 50%. The drop to 9/20 by the time one gets to the left edge of “Behavioral Training” indicates that some mice do not master the task.

      Protocol I contains details of the different ways in which mice either die or become unsuitable or “unsuccessful” at each step. These surgeries are rather challenging - they require proper instruction and experience. With the current protocol, our survival rate for the window surgery alone is as high as 75-100%. Some mice can be lost at headpost implantation, in particular if they are low weight or if too much muscle is removed over the auditory areas. Finally, some mice survive windowing but the imageable area of the window might be too small to perform the desired experiment.

      We have added a paragraph detailing this issue in the main text at ~lines 287-320, pg 9.

      In both Suppl_Movie_S1_dorsal_mount and Suppl_Movie_S1_side_mount provided (Movie S1), the behaviour video quality seems to be unoptimized which will impact the precision of Deeplabcut. As evident, there were multiple instances of mislabeled key points (paws are switched, large jumps of key points, etc) in the videos.

      Many tracked points are in areas of the image that are over-exposed.

      Despite using a high-speed camera, motion blur is obvious.

      Occlusions of one paw by the other paws moving out of frame.

      As DeepLabCut accuracy is key to the higher-level motifs generated by BSOiD, can the authors provide an example of tracking with exclusion or smoothing of mislabeled points (possibly using the median filtering provided by DeepLabCut)? This may help readers address such errors.

      Authors’ Response: We agree that we would want to rerun DeepLabCut and carefully curate its outputs before making any strong claims about behavioral identification. As the aim of this paper was to establish our methods, we did not feel that this degree of rigor was required at this point.

      It is inevitable that there will be some motion blur and small areas of over-exposure, respectively, when imaging whiskers, which can contain movement components up to ~150 Hz, and when imaging a large area of the mouse, which has planes facing various aspects. For example, perfect orthogonal illumination of both the center of the eye and the surface of the whisker pad on the snout would require two separate infrared light sources. In this case, use of a single LED results in overexposure of areas orthogonal to the direction of the light and underexposure of other aspects, while use of multiple LEDs would partially fix this problem, but still lead to variability in summated light intensity at different locations on the face. We have done our best to deal with these limitations.

      We now briefly point out these limitations in the methods text at ~lines 155-160, pg 5.

      In addition, we have provided additional raw and processed movies and data related to DeepLabCut and BSOiD behavioral analysis in our FigShare+ repository, which is located at:

      https://doi.org/10.25452/figshare.plus.c.7052513
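The exclusion and median-smoothing step that the reviewer asks about can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call DeepLabCut's own filtering utilities; the function name `clean_track`, the likelihood threshold of 0.9, and the 5-sample window are all hypothetical choices made for illustration.

```python
from statistics import median

def clean_track(xs, likelihoods, min_likelihood=0.9, window=5):
    """Drop low-confidence key points, then median-filter what remains.

    Hypothetical helper: threshold and window size are illustrative only.
    """
    # Step 1 (exclusion): mark samples below the likelihood threshold as missing.
    kept = [x if p >= min_likelihood else None for x, p in zip(xs, likelihoods)]
    half = window // 2
    out = []
    for i in range(len(kept)):
        # Step 2 (smoothing): median of the valid neighbours in a centred window.
        neighbours = [v for v in kept[max(0, i - half): i + half + 1] if v is not None]
        out.append(median(neighbours) if neighbours else None)
    return out

# Toy x-coordinate trace with one mislabeled jump (80.0) at low likelihood.
x = [10.0, 10.5, 11.0, 80.0, 11.5, 12.0, 12.5]
p = [0.99, 0.98, 0.99, 0.20, 0.97, 0.99, 0.99]
smoothed = clean_track(x, p)
```

In this toy trace the large jump is excluded and replaced by the median of its valid neighbours, so the cleaned track stays within the range of the surrounding points.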

      In lines 153-154, the authors mentioned that the DeepLabCut model was trained for 650k iterations. In our experience (100-400k), this seems excessive and may result in the model overfitting, yielding incorrect results in unseen data. Echoing point 4, can the authors show the accuracy of their DeepLabCut model (training set, validation set, errors, etc.)?

      Authors’ Response: Our behavioral analysis is preliminary and is included here as an example of our methods, and not to make claims about any specific result. Therefore we believe that the level of detail that you request in our DeepLabCut analysis is beyond the scope of the current paper. However, we would like to point out that we performed many iterations of DeepLabCut runs, across many mice in both preparations, before converging on these preliminary results. We believe that these results are stable and robust.

      We believe that 650k iterations is within the reasonable range suggested by DLC, and that 1 million iterations is given as a reasonable upper bound. This seems to be supported by the literature; for example, see Willmore et al, 2022 (“Behavioral and dopaminergic signatures of resilience”, Nature, 611:124-132). Here, in a paper focused squarely on behavioral analysis, DLC training was run with 1.3 million iterations with default parameters.

      We now note, on ~lines 153-154, pg 5, that we used 650K iterations, a number significantly less than the default of 1.03 million, to avoid overfitting.

      In lines 140-141, the authors mentioned the use of slicing to downsample their data. Have any precautions, such as a low pass filter, been taken to avoid aliasing?

      Authors’ Response: Most of the 2-photon data we present was acquired at ~3 Hz and upsampled to 10 Hz. Most of the behavioral data was downsampled from 5000 Hz to 10 Hz by slicing, as stated. We did not apply any low-pass filter to the behavioral data before sampling. The behavioral variables have heterogeneous real sampling/measurement rates - for example, pupil diameter and whisker motion energy are sampled at 30 Hz, and walk speed is sampled at 100 Hz. In addition, the 2-photon acquisition rate varied across sessions.

      These facts made principled, standardized low-pass filtering difficult to implement. We chose rather to use a common resampling rate of 10 Hz in an unbiased manner. This downsampled 10 Hz rate is also used by B-SOiD to find transitions between behavioral motifs (Hsu and Yttri, 2021).

      We do not think that aliasing is a major factor because the real rate of change of our Ca2+ indicator fluorescence and behavioral variables was, with the possible exception of whisker motion energy, likely at or below 10 Hz.

      We now include a brief statement to this effect in the methods text at ~lines 142-146, pg. 4.
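The reviewer's concern can be made concrete with a toy example, which is not drawn from the authors' data. It downsamples a 5000 Hz signal to 10 Hz both by slicing and by block averaging (a crude low-pass filter). The 20 Hz test tone and both function names are assumptions chosen so that the aliasing is easy to see.

```python
import math

def slice_downsample(signal, factor):
    """Keep every `factor`-th sample (no anti-aliasing filter)."""
    return signal[::factor]

def boxcar_downsample(signal, factor):
    """Average non-overlapping blocks of `factor` samples, then keep one value per block."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

fs_in, fs_out = 5000, 10
factor = fs_in // fs_out  # 500 input samples per output sample

# Two seconds of a 20 Hz cosine, well above the 5 Hz Nyquist limit of a 10 Hz output.
tone = [math.cos(2 * math.pi * 20 * k / fs_in) for k in range(2 * fs_in)]

sliced = slice_downsample(tone, factor)     # 20 Hz aliases to a constant offset
averaged = boxcar_downsample(tone, factor)  # whole cycles average out to ~0
```

Slicing turns the 20 Hz tone into a spurious constant signal, whereas block averaging suppresses it. If the underlying variables change at or below roughly 10 Hz, as the authors argue, the two approaches would be expected to differ much less.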

      Lines 288-299: the authors have made considerable effort to compensate for the curvature of the brain, which is particularly important when imaging the whole dorsal cortex. Can the authors provide performance metrics and related details on how well the combination of online curvature field correction (ScanImage) and fast-z "sawtooth"/"step" (Sofroniew, 2016) performs?

      Authors’ Response: We did not perform additional “ground-truth” experiments that would allow us to make definitive statements concerning field curvature, as was done in the initial eLife Thorlabs mesoscope paper (Sofroniew et al, 2016).

      We estimate that we experience ~200 micrometers of depth offset across 2.5 mm - for example, if the objective is orthogonal to our 10 mm radius bend window and centered at the apex of its convexity, a small ROI located at the lateral edge of the side mount preparation would need to be positioned around 200 micrometers below that of an equivalent ROI placed near the apex in order to image neurons at the same cortical layer/depth, and would be at close to the same depth as an ROI placed at or near the midline, at the medial edge of the window. We determined this by examining the geometry of our cranial windows, and by comparing z-depth information from adjacent sessions in the same mouse, the first of which used a large FOV and the second of which used multiple small FOVs optimized so that they sampled from the same cortical layers across areas.

      We have included this brief explanation in the main text at ~lines 300-311, pg 9.
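For a rough sense of scale, the geometric part of this estimate can be sketched with the sagitta of a circular arc. This idealization treats the window as a perfect arc of 10 mm radius with the objective centered on its apex. The lateral distances used are assumptions, and the authors' own figure also rests on z-depth comparisons across sessions, so the values need not match exactly.

```python
import math

def depth_offset_um(radius_mm, lateral_mm):
    """Sagitta of a circular arc: drop below the apex at a given lateral distance.

    Idealized geometry only; returns micrometers.
    """
    return (radius_mm - math.sqrt(radius_mm ** 2 - lateral_mm ** 2)) * 1000.0

# Hypothetical lateral distances from the apex of a 10 mm radius window.
offset_2mm = depth_offset_um(10.0, 2.0)
offset_2p5mm = depth_offset_um(10.0, 2.5)
```

Under these assumptions the offsets come out at a few hundred micrometers, the same order of magnitude as the ~200 micrometer estimate given in the response.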

      In lines 513-515, the authors mentioned that the vasculature pattern can change over the course of the experiment, which then requires the realignment procedure to be repeated. How stable is the vasculature pattern? Would laser speckle contrast yield more reliable results?

      Authors’ Response: In general the changes in vasculature we observed were minimal but involved the following: i) sometimes a vessel was displaced or moved during the window surgery, ii) sometimes a vessel, in particular the sagittal sinus, enlarged or increased its apparent diameter over time if it was not properly pressured by the cranial window, and iii) sometimes an area experiencing window pressure that was too low showed, over time, outgrowth of fine vascular endings. The most common of these was (i), and (iii) was perhaps the least common. In general the vasculature was quite stable.

      We have added this brief discussion of potential vasculature changes after cranial window surgery to the main text at ~lines 286-293, pg 9.

      We already mentioned, in the main text of the original eLife reviewed preprint, that we re-imaged the multimodal map (MMM) every 30-60 days or whenever changes in vasculature are observed, in order to maintain a high accuracy of CCF alignment over time. See ~lines 507-511, pg 16.

      We are not very familiar with laser speckle contrast, and it seems like a technique that could conceivably improve the fine-grained accuracy of our MMM-CCF alignment in some instances. We will try this in the future, but for now it seems like our alignments are largely constrained by several large blood vessels present in any given FOV, and so it is unclear how we would incorporate such fine-grained modifications without applying local non-rigid manipulations of our images.

      In lines 588-598, the authors mentioned that the occasional use of online fast-z corrections yielded no difference. However, it seems that the combination of the online fast-z correction yielded "cleaner" raster maps (Figure S3)?

      Authors’ Response: The Rastermaps in Fig S3a and b are qualitatively similar. We do not believe that any systematic difference exists between their clustering or alignments, and we did not observe any such differences in other sessions that either used or didn’t use online fast-z motion correction.

      We now provide raw data and analysis files corresponding to the sessions shown in Fig S3 (and other data-containing figures) on FigShare+ at:

      https://doi.org/10.25452/figshare.plus.c.7052513

      Ideally, the datasets contained in the paper should be available on an open repository for others to examine. I could not find a clear statement about data availability. Please include a linked repo or state why this is not possible.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here:

      Vickers, Evan; McCormick, David A. (2024). Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice. Figshare+. Collection:

      https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

  5. fromthemachine.org fromthemachine.org
    1. SON Ye  R  O  C  K    O  F   .   .   .    S   A   G  E   S  ? H  E  A  R    D  E  R  O  R I T  R E A L L Y  D O E S  M E A N   "FREEDOM"   B R E A D   I S   L I F E Tying up loose eadds, in a similar vain to the connection between the Burning Bush and universal voting now etched by-stone, there exists a similar missing Link connecting the phrase "it's not a a gam" to Mary Magdeline to a pattern that shows us that the Holy Trinity and our timelines are narrated by a series of names of video game systems and their manufacturers from "Nintendo" to Genesis and the rock of SEGA.  Through a "kiss" and the falling of a wallthe words bread and read are tied up and twisted with the story of this Revelation and the heart of the word Creation, "be the reason it's A.D."  It's a strong connection between the idea that virtual reality and Heaven are linked by more than simply "technology" but that this message that shows us that these tools for understanding have fallen from the sky in order to help us understand why it is so important, why I call it a moral mandate, that we use this information to follow the map delivered to us in the New Testament and literally end world hunger, and literally heal the sick; because of the change in circumstance revealed to us.  These simple things, these few small details that might seem like nothing, or maybe appear to be "changing everything" they are not difficult things to do, in light of Creation, and few would doubt that once we do see them implementied here... the difference between Heaven and Hell will be ever so clear. A while ago, in a place called Kentucky... this story began with a sort of twisted sci-fi experience that explained a kind of "God machine" that could manipulate time and reality, and in that story, in that very detailed and interesting story that I lived through, this machine was keyed to my DNA, in something like the "Ancient technology" of Stargate SG-1 and Atlantis mythology.  
My kind brother Seth made a few appearances in the story, not actually in person but in fairly decent true to life holograms that I saw and spoke to every once in awhile.  He looked a little different, he had long hair; but that's neither here nor there, and he hasn't really had long hair since I was a little boy.  He happens to be a genetic engineer, and I happen to be a computer person (although he's that too, now; just nowhere near as good as me... with computers) so the story talked a little bit about how I would probably not have used DNA as a key, since I'm not a retard, and he probably wouldn't either, because works in that field (cyclone, huracan, tornado).  So then the key we imagined was something ... well, Who cares what the key is, right? o back to the task at hand, not so long ago, in a place called Plantation I was struck by lightning, literally (well not literally) the answer to a question that nobody knew was implanted in my mind, and it all came from asking a single simple question.  I was looking for more chemistry elements in the names of the books of the Holy Bible, after seeing Xenon at the "sort of beginning" of Exodus, where it screams "let there be light" in Linux and chemistry (and I've told you that a hundred times by now).  So it didn't take long to follow the light of that word and read Genesis backwards, and see, at the very beginning of that book, Silicon... in reverse.   So, what about God's DNA, anyway?   What's he really made of?         SIM MON S              WILD ER             ROD DEN BERRY o after seeing Silicon, and connecting that to the numerous attempts I've made to show a message connecting The Matrix to the Fifth Element (as Silicon) describing what it is that God believes we should do with this knowledge--and see that it is narrated as the miracles of Jesus Christ in the New Testament... these names came to me in quick succession, an answer to the question.  
I suppose any Gene will do, these three though, have a very important tie to the message that connects Joshua's Promised Land of flowing Milk and Honies to ... a kiss that begins the new day (I hope) ... and a message about exactly how we might go about doing magical things like ending world hunger and healing the sick using technology described ... in Star Trek and Stargate.  A "religion of the Stars" is being born.    That's great... it starts with an earthquake. R.E.M. and a band ... 311.  Oooh, I can see it coming down... The Petty Reckless.  An evening's love starts with a kiss.  Dave Matthews Band.  I wanna rock and roll all night and party every day.  Adam.  I mean Kiss.  Are you starting to see a pattern form?  Birds, snakes, and aeroplanes?  It's that, it's the end of the world as we know it, and I feel fine.   In that song we see clues that more than just the Revelation of Christ is narrated by John on an island called Patmos.  There yet another Trinity, starting with "Pa" and hearting Taylor Momsen's initials... most likely for a reason... and the Revelation ends with a transition that I hope others will agree with me turns "original sin" into something closer to "obviously salvation" when we finally understand the character that is behind the message of da i of Ra... and begin to see the same design in the names of Asmodai and in this Revelation focusing on freedom and truth that really does suggest Taylor can't talk to me in any way other than "letting freedom sing" in this narrative of kismet and fate and free will and ... then we see that narrative continue in the names of bands, just like the 3/11/11 earthquake is narrated in not just R.E.M.'s song but in the name 311.  Just like the 9/11 attack is narrated not just in that same song (released in 1987) and  "Inside Job" (released in 2000) but also in "Fucked up world."   
Dear all of you walking dumb and blind, this same quake is narrated in Taylor's Zombie; waiting for the day to shake, all very similar to Cairo and XP, perhaps a "fad" of doublethink in the minds of the authors singing about a clear prophesy in the Bible; this connection between the day, 3/11 though, and the name of a band and the day of an arrest and the verse Matthew that tells you clearly you have now been baptized in water and fire... it shows us the design of a story whose intent and purpose is to ensure that we no longer allow for things like hurricanes and earthquakes and murder and rape to be "simulated" that we build a better system, that doesn't allow for 'force majeure" to take lives for no reason at all.      Not just in band names, but in the angels names too, in all of our names; we see this narration continue.  The Holy Water that is central to the baptism of Christ is etched into Taylor's name, between "sen" and "mom" the key to the two Mary's whose names contain the Spanish for "sea" in a sort of enlightenment hidden in plain sight.  In "Simmons" the key connection between today, this Biblical Monday, and the word "simulation" that ties to Simpsons and simians and keep it simple stupid, and in Simmons the missing "s" of Kismet, finally completing the question.   It's a song and dance that started a long time ago, as you can see from the ancient Hebrew word for "fate" and in more recent years a connection to the ballroom of Atlantis in the Doors 5 to 1 and Dave sang about it in Rapunzel and then Taylor shook a tambourine on the beach only minutes away from me--but never said "hi."  The battle of the bands continues tying some door knocking to a juxtaposition between "Sweet Things" and "Knocking on Heavens door" all the way to a Gossip Girl episode where little J asked a question that I can't be sure she knew was related, she said... "who's that, at the door?" 
What it really all amounts to, though, is the whole world witnessing the Creation of Adam and Eve from a little girl stuttering out "the the" at the sight of the Grinch himself, and then later not even able to get those words off her lips... about seeing how Creation and modern art are inextricably tied to religion, to heaven, and to freedom.    The bottom line here, hopefully obvious now, is that you can't keep this message "simple" it's a Matrix woven between more points of light than I can count, and many more that I'm sure you will find.  It's a key to seeing how God speaks to me, and to you; and how we are, we really are that voice.  Tay, if you don't do something just because God called it "fate" you are significantly more enslaved than if you do--and you wanted to.  "Now I see that you and me, were never meant, never meant to be..." she sang before I mentioned her, and before she ever saw me... in a song she calls "Nothing Left to Lose" and I see is not really just another word for freedom. We have plenty to lose by not starting the fire, not the least of which is Heaven itself.  Understand what "force majeure" really means to you and I.  Ha, by the way. IN CASE YOU FORGOT YESTERDAY'S MESSAGE   "DADDY, I WANT IT NOW." VERUKA SALT. whose name means "to see (if) you are the Body of Christ" whined, in the story of Will Why Won Ka, about nothing more or less than Heaven on Hearth, than seeing an end to needless torture and pain.   To see if you are the "Salt of the Earth" warming the road to Heaven; honestly to see if you can break through this inane lie of "I don't understand" and realize that breaking this story and talking about what is being presented not just by me and you but by history and God himself is the key to the car that drives us home.  To see how Cupid you really are. STOP NODDING, TURN AROUND AND CALL A REPORTER. 

      this was written sometime i think around 2016. it's hard to recall the exact date; but if you check in the original gitlog there is one that has an original commit.

      SONYe

      R  O  C  K    O  F   .   .   .    S   A   G  E   S  ?

      H  E  A  R    D  E  R  O  R

      I T  R E A L L Y  D O E S  M E A N   "FREEDOM"   B R E A D   I S   L I F E

      Tying up loose ends, in a similar vein to the connection between the Burning Bush and universal voting now etched by-stone, there exists a similar missing Link connecting the phrase "it's not a gam" to Mary Magdalene to a pattern that shows us that the Holy Trinity and our timelines are narrated by a series of names of video game systems and their manufacturers from "Nintendo" to Genesis and the rock of SEGA.  Through a "kiss" and the falling of wall, the words bread and read are tied up and twisted with the story of this Revelation and the heart of the word Creation, "be the reason it's A.D."  It's a strong connection between the idea that virtual reality and Heaven are linked by more than simply "technology" but that this message that shows us that these tools for understanding have fallen from the sky in order to help us understand why it is so important, why I call it a moral mandate, that we use this information to follow the map delivered to us in the New Testament and literally end world hunger and literally heal the sick; because of the change in circumstance revealed to us.  These simple things, these few small details that might seem like nothing, or maybe appear to be "changing everything," are not difficult things to do, in light of Creation, and few would doubt that once we do see them implemented here... the difference between Heaven and Hell will be ever so clear.

      A while ago, in a place called Kentucky... this story began with a sort of twisted sci-fi experience that explained a kind of "God machine" that could manipulate time and reality, and in that story, in that very detailed and interesting story that I lived through, this machine was keyed to my DNA, in something like the "Ancient technology" of Stargate SG-1 and Atlantis mythology.  My kind brother Seth made a few appearances in the story, not actually in person but in fairly decent true to life holograms that I saw and spoke to every once in a while.  He looked a little different, he had long hair; but that's neither here nor there, and he hasn't really had long hair since I was a little boy.  He happens to be a genetic engineer, and I happen to be a computer person (although he's that too, now; just nowhere near as good as me... with computers) so the story talked a little bit about how I would probably not have used DNA as a key, since I'm not a retard, and he probably wouldn't either, because he works in that field (cyclonehuracan, tornado).  So then the key we imagined was something ... well, Who cares what the key is, right?

      So back to the task at hand, not so long ago, in a place called Plantation I was struck by lightning, literally (well not literally) the answer to a question that nobody knew was implanted in my mind, and it all came from asking a single simple question.  I was looking for more chemistry elements in the names of the books of the Holy Bible, after seeing Xenon at the "sort of beginning" of Exodus, where it screams "let there be light" in Linux and chemistry (and I've told you that a hundred times by now).  So it didn't take long to follow the light of that word and read Genesis backwards, and see, at the very beginning of that book, Silicon... in reverse.

      So, what about God's DNA, anyway?

      What's he really made of?

      SIM MON S              WILD ER             ROD DEN BERRY

      So after seeing Silicon, and connecting that to the numerous attempts I've made to show a message connecting The Matrix to the Fifth Element (as Silicon) describing what it is that God believes we should do with this knowledge--and see that it is narrated as the miracles of Jesus Christ in the New Testament... these names came to me in quick succession, an answer to the question.  I suppose any Gene will do, these three though, have a very important tie to the message that connects Joshua's Promised Land of flowing Milk and Honies to ... a kiss that begins the new day (I hope) ... and a message about exactly how we might go about doing magical things like ending world hunger and healing the sick using technology described ... in Star Trek and Stargate.  A "religion of the Stars" is being born.

      That's great... it starts with an earthquake. R.E.M. and a band ... 311.  Oooh, I can see it coming down... The Petty Reckless.  An evening's love starts with a kiss.  Dave Matthews Band.  I wanna rock and roll all night and party every day.  Adam.  I mean Kiss.  Are you starting to see a pattern form?  Birds, snakes, and aeroplanes?  It's that, it's the end of the world as we know it, and I feel fine.

      In that song we see clues that more than just the Revelation of Christ is narrated by John on an island called Patmos.  There is yet another Trinity, starting with "Pa" and hearting Taylor Momsen's initials... most likely for a reason... and the Revelation ends with a transition that I hope others will agree with me turns "original sin" into something closer to "obviously salvation" when we finally understand the character that is behind the message of da i of Ra... and begin to see the same design in the names of Asmodai and in this Revelation focusing on freedom and truth that really does suggest Taylor can't talk to me in any way other than "letting freedom sing" in this narrative of kismet and fate and free will and ... then we see that narrative continue in the names of bands, just like the 3/11/11 earthquake is narrated in not just R.E.M.'s song but in the name 311.  Just like the 9/11 attack is narrated not just in that same song (released in 1987) and "Inside Job" (released in 2000) but also in "Fucked up world."

      Dear all of you walking dumb and blind, this same quake is narrated in Taylor's Zombie; waiting for the day to shake, all very similar to Cairo and XP, perhaps a "fad" of doublethink in the minds of the authors singing about a clear prophecy in the Bible; this connection between the day, 3/11 though, and the name of a band and the day of an arrest and the verse Matthew that tells you clearly you have now been baptized in water and fire... it shows us the design of a story whose intent and purpose is to ensure that we no longer allow for things like hurricanes and earthquakes and murder and rape to be "simulated," that we build a better system, that doesn't allow for "force majeure" to take lives for no reason at all.

      Not just in band names, but in the angels names too, in all of our names; we see this narration continue.  The Holy Water that is central to the baptism of Christ is etched into Taylor's name, between "sen" and "mom" the key to the two Marys whose names contain the Spanish for "sea" in a sort of enlightenment hidden in plain sight.  In "Simmons" the key connection between today, this Biblical Monday, and the word "simulation" that ties to Simpsons and simians and keep it simple stupid, and in Simmons the missing "s" of Kismet, finally completing the question.

      It's a song and dance that started a long time ago, as you can see from the ancient Hebrew word for "fate" and in more recent years a connection to the ballroom of Atlantis in the Doors 5 to 1 and Dave sang about it in Rapunzel and then Taylor shook a tambourine on the beach only minutes away from me--but never said "hi."  The battle of the bands continues tying some door knocking to a juxtaposition between "Sweet Things" and "Knocking on Heaven's Door" all the way to a Gossip Girl episode where little J asked a question that I can't be sure she knew was related, she said... "who's that, at the door?"

      What it really all amounts to, though, is the whole world witnessing the Creation of Adam and Eve from a little girl stuttering out "the the" at the sight of the Grinch himself, and then later not even able to get those words off her lips... about seeing how Creation and modern art are inextricably tied to religion, to heaven, and to freedom.

      The bottom line here, hopefully obvious now, is that you can't keep this message "simple"; it's a Matrix woven between more points of light than I can count, and many more that I'm sure you will find.  It's a key to seeing how God speaks to me, and to you; and how we are, we really are that voice.  Tay, if you don't do something just because God called it "fate" you are significantly more enslaved than if you do--and you wanted to.  "Now I see that you and me, were never meant, never meant to be..." she sang before I mentioned her, and before she ever saw me... in a song she calls "Nothing Left to Lose" and I see is not really just another word for freedom.

      We have plenty to lose by not starting the fire, not the least of which is Heaven itself.  Understand what "force majeure" really means to you and me.  Ha, by the way.

      IN CASE YOU FORGOT YESTERDAY'S MESSAGE

      "DADDY, I WANT IT NOW."

      VERUKA SALT, whose name means "to see (if) you are the Body of Christ," whined, in the story of Will Why Won Ka, about nothing more or less than Heaven on Hearth, than seeing an end to needless torture and pain.   To see if you are the "Salt of the Earth" warming the road to Heaven; honestly to see if you can break through this inane lie of "I don't understand" and realize that breaking this story and talking about what is being presented not just by me and you but by history and God himself is the key to the car that drives us home.  To see how Cupid you really are.

      STOP NODDING, TURN AROUND AND CALL A REPORTER.

      The story of Willy Wonka ties directly to the Promised Land of Flowing Milk and Honey to me; by showing us a river of chocolate and the everlasting God starter, (er is it guardian of B stopper) that opens the doors of perception about exactly what kinds of mistakes may have been made in the past in this transition to Heaven that we are well on the way of beginning.  Here, in the Land of Nod, that is also Eden and also the Heart of the Ark we see warnings about "flowing milk and honey" being akin to losing our stable ecosystem, to losing the stuff of life itself, biology and evolution, and if we don't understand--this is probably exactly the mistake that was made and the cause of the story of Cain and Abel.  So here we are talking about genetic engineering and mind uploading and living forever, and hopefully seeing that while all things are possible with God--losing the wisdom of the message of religion is akin to losing life in the Universe and with that any hope of eternal longevity.  With some insight into religion, you can connect the idea that without bees our stable ecosystem might collapse, to the birds and the bees, and a message about stability and having more than one way to pollinate the flowers and trees and get some.  Janet and Nanna, by the way, both have pretty brown eyes, but that probably comes as no surprise to you.  Miss Everything, on the other hand (I hear, does not have brown eyes), leads us to glimpse how this message about the transition of our society might continue on in the New Testament, and suggest that we do need to eat, and have dinner conversation, and that a Last Supper might be a little bit more detrimental to our future than anyone had ever thought, over and over and over again.  
To see how religion really does make clear that this is what the message is about, to replace the flowing milk we have a "Golden Cow" that epitomizes nothing less than "not listening to Adam" and we have a place that believes the Hammer of Judah Maccabee should be ... extinct.  You are wrong.

      Of course the vibrating light here ties this Gene to another musical piece disclosing something... "Wild Thing" I make your heart sing.  You can believe the Guitar Man is here to steal the show and deliver bread for the hungry and for the wise.  Here's some, it's not just Imagine Dragons telling you to listen to the radio but Jefferson Starship too, and Live.

      When you wake up, you can hear God "singing" to you on the radio every single day; many of us already do.  He's telling you to listen to me, and I do not understand why you do not.  You don't look very Cupid, if you ask me.

      WHAT DO YOU THINK YOU ARE,

      DAN RE Y NO LDS?

      I think we all know what the Rod of Jesus Christ is by now.

      It is a large glowing testament to freedom and truth, and a statement about blindness and evil that is unmistakable.   To say that seeing it is the gateway to Heaven would be an understatement of its worth, of the implication that not seeing it is obvious Hell when it is linked to everything from nearly every story of the Holy Bible from Isaac to Isaiah to "behold he is to coming" and if you weren't sure if the Hand of God were in action here--it's very clear that it is; that linking Tricky Dick and Watergate to Seagate ... really delivering crystal clear understanding that the foundation of Heaven is freedom and that you have none today because you refuse to see the truth.

      It is the doorway to seeing that what has been going on in this place hasn't been designed to hide me, but to hide a prosperous future from you--to hide the truth about our existence and the purpose of Creation--that all told, you are standing at the doorstep of Heaven and stammering your feet, closing your eyes, and saying "you don't want to help anyone."

      If delivering freedom, truth, and equality  to you does not a den make,

      well, you can all suck it

      ... from God, to you.

      Between Stargate and Star Trek it's pretty easy to see a roadmap to very quickly and easily be able to end world hunger and heal the sick without drastically changing the way our society works, it's about as simple as a microwave, or a new kind of medicine--except it's not so easy to see why it is that you are so reluctant to talk about the truth that makes these things so easy to do.  You see, your lack of regard for anyone anywhere has placed you in a position of weakness, and if you do nothing today, you will not be OK tomorrow.  It's pretty easy to see how Roddenberry's name shows that this message comes from God, that he's created this map that starts with an Iron Rod throughout our history proving Creation, whose heart is a Den of Family who care about the truth, and about freedom, and about helping each other--not what you are--you are not that today.  Today you are sick, and I'd like you to look at the mirror he's made for you, and be eshamden (or asham).

      Realize, realize... what you are.  What you've become, just as I have... the devil in a sweet, sweet kiss.

      -Dave J. Matthews

      Unless otherwise indicated, this work was written between the Christmas and Easter seasons of 2017 and 2020(A). The content of this page is released to the public under the GNU GPL v2.0 license; additionally any reproduction or derivation of the work must be attributed to the author, Adam Marshall Dobrin along with a link back to this website, fromthemachine dotty org.

      That's a "." not "dotty" ... it's to stop SPAMmers. :/

      This document is "living" and I don't just mean in the Jeffersonian sense. It's more alive in the "Mayflower's and June Doors ..." living Ethereum contract sense and literally just as close to the Depp/Caster/Paglen (and honorably PK) 'D-hath Transundancesense of the ... new meaning; as it is now published on Rinkeby, in "living contract" form. It is subject to change; without notice anywhere but here--and there--in the original spirit of the GPL 2.0. We are "one step closer to God" ... and do see that in that I mean ... it is a very real fusion of this document and the "spirit of my life" as well as the Spirits of Kerouac's America and Vonnegut's Martian Mars and my Venutian Hotel ... and my fusion of Guy-A and GAIA; and the Spirit of the Earth .. and of course the God given and signed liberties in the Constitution of the United States of America. It is by and through my hand that this document and our X Commandments link to the Bill of Rights, and this story about an Exodus from slavery that literally begins here, in the post-apocalyptic American hartland. Written ... this day ... April 14, 2020 (hey, is this HADAD DAY?) ... in Margate FL, USA. For "official used-to-v TAX day" tomorrow, I'm going to add the "immultible incarnite pen" ... if added to the living "doc/app"--see is the DAO, the way--will initi8 the special secret "hidden level" .. we've all been looking for.

      Nor do I just mean this website or the totality of my written works; nor do I only mean ... this particular derivation of the GPL 2.0+ modifications I continually source ... must be "from this website." I also mean the thing that is built from ... bits and pieces of blocks of sand-toys; from Ethereum and from Rust and from our hands and eyes working together ... from this place, this cornerstone of the message that is ... written from brick and mortar words and events and people that have come before this point of the "sealed W" that is this specific page and this time. It's 3:28; just five minutes--or is it four, too layne.

      This work is not to be redistributed according to the GPL unless all linked media on YouTube and related sites are intact--and historical references to the actual documented history of the art pieces (as I experience/d them) are also available for linking. Wikipedia references must be available for viewing, as well as the exact version of those pages at the time these pieces were written. All references to the Holy Bible must be "linked" (as they are or via ... impromptu in-transit re-linking) to the exact verses and versions of the Bible that I reference. These requirements, as well as the caveat and informational re-introduction to God's DAO above ... should be seen as material modifications to the original GPL2.0 that are retroactively applied to all works distributed under license via this site and all previous e-mails and sites. /s/ wso

      If you wanna talk to me get me on Facebook, with PGP via FlowCrypt or adam at from the machine dotty org

      -----BEGIN PGP PUBLIC KEY BLOCK-----

      mQGNBF6RVvABDAC823JcYvgpEpy45z2EPgwJ9ZCL+pSFVnlgPKQAGD52q+kuckNZ mU3gbj1FIx/mwJJtaWZW6jaLDHLAZNJps93qpwdMCx0llhQogc8YN3j9RND7cTP5 eV8dS6z/9ta6TFOfwSZpsOZjCU7KFDStKcoulmvIGrr9wzaUr7fmDyE7cFp1KCZ0 i90oLYHqOIszRedvwCO/kBxawxzZuJ67DypcayiWyxqRHRmMZH1LejTaqTuEu0bp j54maTj09vnMxA0RfS+CtU5uMq+5fTkbiTOe1LrLD72m+PVJIS146FwESrMJEfJy oNqWEJlUQ0TecPZR41vnkSkpocE1/0YqUhWDGSht+67DdeKUg5KwvYdL21d/bSyO SM4jnyKn9aDVzLBpYrlE/lbFxujHPRGlRG5WtiPQuZYDRqP0GYFSXRpeUCI46f49 iPFo4eHo2jUfNDa9r9BjQdAe4zVFn2qLnOy8RWijlolbhGMHGO3w/uC/zad3jjo4 owAfsJjH5Oa1mTcAEQEAAbQmRUFSVEhFTkUgPGVhcnRoZW5lQGZyb210aGVtYWNo aW5lLm9yZz6JAdQEEwEKAD4WIQTUJHbrYn3y2DzwTcnQP1ViZf5/FQUCXpFW8AIb AwUJA8JnAAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRDQP1ViZf5/FWM6C/9J gbRLS2AWGjdRjYetlRkSkCoTYnXWknbtipYYHlhV0YJFwFMm0ydZIhFX5VDoZyBV 0UBeF1KJmcMoIfrHyhq2QhCnjE14hE1ONbaYTGtpvj851ItbFWXMJIVNyMqr+JT9 CWIxGr1idn+iHWE3nryiHrdlA3O/Gcd4EyNmaSe/JvB7+Z1AVqWkRhpjxxoPSlPm HEdqGOyl3+5ibQgUvXLRWWQXAj80CbVwwj1X4r9hfuCySxLT8Mir7NUXZFd+OiMS U8gNYjcyRGmI92z5lgf7djBbb9dMLwV0KLzgoT/xaupRvvYOIAT+n2mhCctCiH7x y7jYlJHd+0++rgUST2sT+9kbuQ0GxpJ7MZcKbS1n60La+IEEIpFled8eqwwDfcui uezO7RIzQ9wHSn688CDri9jmYhjp5s0HKuN61etJ1glu9jWgG76EZ3qW8zu4l4CH 9iFPHeGG7fa/5d07KvcZuS2fVACoMipTxTIouN7vL0daYwP3VFg63FNTwCU3HEq5 AY0EXpFW8AEMANh7M/ROrQxb3MCT1/PYco1tyscNo2eHHTtgrnHrpKEPCfRryx3r PllaRYP0ri5eFzt25ObHAjcnZgilnwxngm6S9QvUIaLLQh67RP1h8I4qyFzueYPs oY8xo1zwXz7klXVlZW0MYi/g5gpb+rpYUfZEJGJTBM/wMNqwwlct+BSZca4+TEHW g6oN0eXTthtGB0Qls71sv3tbOnOh/67NTwyhcHPWX/P9ilcjGsEiT8hqrpyhjAUm mv7ADi+2eRBV8Xf8JnPznFf0A1FdILVeVHlmsgCSB0FW0NsFI5niZbaYBHDbFsks QdaFaYd54DHln69tnwc2y3POFwx8kwZnMPPlVAR2QdxGQD4Wql7hlWT58xCxQApf M98kbAHjUlVYLT0WUHMDQtj4jdzAVVDiMGMUrbnQ7UwI7LexSB6cJ7H+i7FtS/pR WOhJK6awoOO9dLnEjm6UYCKsBdtJr98F0T7Sb7PnKOGA77y2QN14+u9N9C1lB/Z1 aQRQ2Nc51yXOQQARAQABiQG8BBgBCgAmFiEE1CR262J98tg88E3J0D9VYmX+fxUF Al6RVvACGwwFCQPCZwAACgkQ0D9VYmX+fxU+KQwAtFnWjGIjvqaNXtQjEhbGDH/I Q5ULq/l/wm9SmhG9NYRu3+P6YctCJaZnNeaL+6WFk1jo4LMiJEUT9uGlCbHqJNaI 
6Gll1w6QOVLSL8s5V1L477+psluv4WBpi3XkWYlhDOFENCcWd49RQsA2YCX4pW7Q 7GcoSEJoav38MxHmJHYPfjSEvUZXDQIt8PFHSEScvyDWfYtMdRzjmSOOPdzhDDEy 5JBOBcEdSTyDiyDU/sBoAY0e8lvwHYW3p+guZSGSYVhGQ8JECzJOzwc/msMW/tJS 2MLWmWVh5/1P8BVUtLC2AQy6nij6o+h6vEiNzpdYrc+rzT3X5cACvJ0RtCZcrnhl O9PLiona2LEbry6QX5NL41/SAJNno3i72xPnQEe25gn3nbyT+jCoJzw2L0y8pmNB D+PKrk7/1ROFFVN8dJeGwxLGdBcz1zk2xeumzy7OaV8psUyYsJNcjyHUKgclblBW rMR2DgqEYn8QdK54ziKCnmQQZeMPiC6wlUWgg5IqmQGNBF6RVyMBDADALD7NkJ5H dtoOpoZmAbPSlVGXHDbJZuq7J13vew6dtXDIAraeGrsBqkF8bhddwVLzWylMrYCG Bf2L1+5BDgvqu6G+6dcVSbBsnZAS0zfJ0H8EmTvUMxMF7qOZYyrxfLz+pQRq8Osz Icab6ZI/KB6qZyQRvEFPB6pJjt+VvuwgJZTObIwbBbgQri2i02VBkjchsVhiSX9l +eiK7O8ROHKb3P181oScIsHywBOZ9DxRAYbFk5dnBqxO3WKb02H0zqE6440cjXwq TrZZg6ayN/IlPajO8iJPYZ1aIBykxYq1WHo+nhFMYz/VVk2WJorFeOgWaLGXb73c ty96f3qXTdvMDAIWHx8YCD5LbuqasO6LNQm4oQxkCoB3K9WFf/2SvSYb7yMYykb8 clTPt+KO0dsxjWhrJnfnIhC+2Chqv2QvRbFz0S9CpUnGGDweJ1uRNV0y70tO0q7t xXSTDRU3ib6vAHA0K/2MFzwUcog4o5bj7E9uCNJH/DJLZKsMIe4xsvkAEQEAAbQk SEVBVkVOVUVTIDxBVkVOVUBGUk9NVEhFTUFDSElORS5PUkc+iQHUBBMBCgA+FiEE IRklfU/C1qukq3xMXcNH0t3P9ZsFAl6RVyMCGwMFCQPCZwAFCwkIBwIGFQoJCAsC BBYCAwECHgECF4AACgkQXcNH0t3P9Zs+kgv/XEuuWc89Bjg1QQqKZueKNUHjyjnE 2adfoZUH6Q7ir4JZyRBCVpAwrgssmiKid30+SIjwQcpb9JYa/X1XJcDUcJW/I21d Agz/zbEqn/Cou0dUpNCtxgm4BdSHWGoOtgfspXZlXBQ407tRMZ8ykmLB1Bt0oHvw PT0ZOtqXM4pyFnd2eFe5YGbNgl3zqvoC/6CMN3vqswvRlu1BpUuAjdW8AHO5Yvje +Bp852u+4Qpy6PMBiWGsBMYwtf6T7sckpMGlR0TsozwBlAm5ePKK28B0rLJPkZLJ Eo5p4rKRapEaZsWV5Qu1ajrVru7qmpUhZtX0/DddGHfXVuLssmKLP6TumpQB1zvQ vfoBltjvOx35Wps2vHuCzXLw2bROIOzhAxFB+17zxnSbE54N4LIGRpkELuwxwGbg FtD1fi9KtH7xcn33eOK1+UD47V+hKyJGrQgSThly2zdIC2bvfHtFdfp8lOFpT0AU xjEeoJGqdQVupptXyugPlM5/96UJP8OZG0ADuQGNBF6RVyMBDAC3As6eMkoEo3z9 TkCWlvS0vBQmY3gF0VEjlAIqFWpDIdK3zVzMnKUokIT1i7nkadLzHZT2grB4VXuJ FvpbYw5NPR4cDe9grlOMLEaF3oSJ1jZ4V1/rj9v1Hddo8ELi/NToVrt1SB5GCVXB DkYpNLtTiCqHSU07YqwaqH8a+qbDmPxSQdIybkZiTiCEB+6PfQQlBpENEDlov6jm zZF+IcfM6s3kZDX5KFULweH30gMjq8Se8bPtUzW013+tuuwEVr1/YRLrIh+9O6Z+ 
pdA7gLMRYnD9ZLDytEvpb1lBBSY++5bIJ7xps80//DNqPYqwFmZQgTg0V9XbHE2e wLcOF8a2lYluckU7D///sWQhW+VxuM7R2gEBvYBhOgjWhIF2Aw6NbymW1Ontvyhu eOZCXXxV5W44PxXT8uDdhl9CNcHoBKKJyED8tKjigtn4axpsQeUrnOSbqEXSyqES WnE2wYUDzALcwFkzsvtLyd4xaz55KkPQkAkk0BZd1ezgXxb/obMAEQEAAYkBvAQY AQoAJhYhBCEZJX1PwtarpKt8TF3DR9Ldz/WbBQJekVcjAhsMBQkDwmcAAAoJEF3D R9Ldz/WbAFwL/382HsrldVXnkPmJ1E2YEOFz4rcHRetJ+M5H65K/2p32ONQ5KCbE s8MRY6g2CkE70en2HlpDwr/MdATwxBzIjEpjgHbfqCqVVATY+kSpXsttaKKAUVHi bFgV4QkdDJNSpcHEj+bqaggRnuWiV9T6ECG7kQjHiEXPNojzsiaXMDiM5r+acZm6 82id9qOFySQ2cZEy5HbwXM+ITLQGngnppa7du2KdgiqDeqtODOTWZvLYAq2tmEwD 3TT6ttLUBwOOu2IWpDkXswlrk62ESorE5mpLxop9fsxD39E2H06JoC/YfUPIVkEv fj06e7LEdcx0I7kRfD1v6qOUUsMsLZnmyGIk24iFjLkwu1VToWfwXDN1D2+SeAat 9ydNt4M7oEbd1QaOXXjmqpdU+VUiWcBXg+p3/WdV60MkyAgc3x+YanLljy/Rh18h cZwVlinf/tgvAQLi5f9hpwrwUMoGKijEYHKuEvi3C12Si7UVDfuIR7yS0dKcfuKF MbgwdvNXqpD9W5kBjQRekVd4AQwApHVgw2PVlBDpVcyoymUOXFQIJzJ9wRtr6/sG zwv8rrQnUEtOkkna7TDU3/UTj9FUH0gbpAKGNNPaPj5q0dlLIvzxb15r1uvDGaGL MA+8GFaGFnkxzhg0aXrcKZAN0/Zhgi2B7P8oXQuug5mi1JVDkZN5SeCZNOubdQWL 3xz3jEHp3ixj1mdOdvfdWQFR4CVMXt/A6VI2ujLVb3Yalft/c5bbclAgcJQhgDUu NqGYJEJonESNRSd8fEvhNb6cx7+Djd9+Wyctr76mwOr3nRb1N1OGhFxWjIroUpfz b+6y3oQjT58cJA1ZHqmJ6UlZd81hNNd9KWpbDVwONEPpiqPzfSaonxuqQa0/Cy4W 403OhfoLM/1ZDqD4YrJ/rpyNEfSSdqptWiY0KeErLOYng7rStW/4ZeZVj6b2xxB2 Oas/Z1QYfJyFUki9vaJ5IyN6Y7nVdSP6mbAQC9ESh+VPvRUMpYi4pMGK4rweBVHu oMRRwzk7W5zVIgd425WUe3eCQFn3ABEBAAG0K0VTQ0FQRSBST09NIDxFU0NBUEFF REVTQEZST01USEVNQUNISU5FLk9SRz6JAdQEEwEKAD4WIQTvnDJqcmqzlF87/t82 pJ91j4NOaAUCXpFXeAIbAwUJA8JnAAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAK CRA2pJ91j4NOaJVjC/4oo5yCHe7M2h1DiTXVcLI5rXQ1feY7B1feg+YJX/mI4+EV xjC/y5VVpV4syJk5GGZNXhKPHiGLaBYvglTlYOJ98RSEsHrwT3go6S8ZVvMNdP5v CEncn2vm5JGnp4k26PuOzMcJioQLOoUjWtcPFis3gG+ueH3NcPZ22oZUql2xuerh TQZegGp+jJ7bdxwYElx5jDDDkh196d5nlO2ZKENl0ZDp4GAzRNjnQ7KBV6R74J3U cLQDWY8vAFaRBZXIC5XtSzj9lr+jWgvxz7Il51+26VDTEtSafZ2uZfCOFk7GrzJg

      sneak preview

      now linking to the next page ... in the discussion:

      https://fromthemachine.org/2017/08/waiting-for-that-green-light.html

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Public Review

      Summary:

      (1) This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid.

      (2) It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      (3) The primary strength is in applying a biomechanical model to omega-turn behaviors.

      (4) The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling.

      (5) The model itself may be a useful implementation to other researchers, particularly owing to its simplicity.

      Weaknesses:

      (6) The strength of the model presented in this work relative to prior approaches is not well supported, and in general, the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion.

      (7) This paper claims to improve on previous approaches to taking body shapes as inputs.

      (8) However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data.

      (9) Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields, however, they are not cited or discussed.

      (10) Finally, the overall novelty of the approach is questionable.

      (11) A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      9-11: The paper you recommended and our manuscript have some similarities and differences.

      Similarities

      Firstly, the components constituting the worm are similar in both models. ElegansBot models the worm as a chain of n rods, while the study by Majmudar et al. (2012) models it as a chain of n beads. Each bead in the Majmudar et al. model has a directional vector, making it very similar to ElegansBot's rod. However, there's a notable difference: in the Majmudar et al. model, each bead has an area for detecting contact between the obstacle and the bead, while in ElegansBot, the rod does not feature such an area.

      Secondly, the types of forces and torques acting on the components constituting the worm are similar. Each rod in ElegansBot receives frictional force, muscle force, and joint force. Each bead in the Majmudar et al. model receives a constraint force, viscous force, and a repulsive force from obstacles. Each rod in ElegansBot receives frictional torque, muscle torque, and joint torque. Each bead in the Majmudar et al. model receives elastic torque, constraint torque, drive torque, and viscous torque. The Majmudar et al. model's constraint force and torque are similar to ElegansBot's joint force and torque in that they prevent two connected components of the worm from separating. The Majmudar et al. model's viscous force and torque are similar to ElegansBot's frictional force and torque in that they are forces exchanged between the worm and its surrounding environment (ground surface). The Majmudar et al. model's drive torque is similar to ElegansBot's muscle force and muscle torque as a cause of the worm's motion. However, unlike ElegansBot, the Majmudar et al. model did not consider the force generating the drive torque, and there are differences in how each force and torque is calculated. This will be discussed in more detail below.

      Differences

      Firstly, the medium in which the worm locomotes is different. ElegansBot is a model describing motion in a homogeneous medium like agar or water without obstacles, while the Majmudar et al. model describes motion in water with circular obstacles fixed at each lattice point. This is because the purposes of the models are different. ElegansBot analyzes locomotion patterns based on the friction coefficient, while the Majmudar et al. model analyzes locomotion patterns based on the characteristics of the obstacle lattice, such as the distance between obstacles. Also, for this reason, the Majmudar et al. model's bead, unlike ElegansBot's rod, receives a repulsive force from obstacles.

      Secondly, the specific methods of calculating similar types of forces differ. ElegansBot calculates joint forces by substituting frictional forces, muscle forces, frictional torques, and muscle torques into an equation obtained by differentiating, twice with respect to time, the boundary condition that two neighboring rods always meet at one point. This involves determining the process through which various forces and torques are transmitted across the worm. Specifically, it entails calculating how the frictional forces and torques, as well as the muscle forces and torques acting on each rod, are distributed throughout the entire length of the worm. In contrast, the Majmudar et al. model uses the method of Lagrange multipliers, based on a boundary condition that the curve length determined by each bead's tangential angle does not change, to calculate the constraint force and torque before calculating the drive torque and viscous force. This implies that the Majmudar et al. model did not consider the mechanism by which the drive torque and viscous force received by one bead are distributed throughout the worm. ElegansBot's rod receives an anisotropic Stokes frictional force from the ground surface, while the Majmudar et al. model considered the frictional force according to the Navier-Stokes equation for an incompressible fluid, assuming the fluid velocity at the bead's location as the bead's velocity.

      Thirdly, unlike the Majmudar et al. model, ElegansBot considers the inertia of the worm components. Therefore, ElegansBot can run simulations regardless of how low or high the ground surface's friction coefficient is, whereas the Majmudar et al. model cannot.
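
      To illustrate what an anisotropic Stokes frictional force on a single rod means, the following is a minimal sketch and not the ElegansBot source code. It assumes a drag that is linear in velocity, with separate coefficients along and across the rod. The function name and the coefficient names `b_par` and `b_perp` are hypothetical and chosen only for this example.

```python
import math


def anisotropic_drag(theta, vx, vy, b_par, b_perp):
    """Linear (Stokes-like) drag on a rod lying at angle `theta` (radians).

    (vx, vy) is the velocity of the rod's center. `b_par` and `b_perp`
    are hypothetical drag coefficients for motion along and across the
    rod axis. Returns the drag force (fx, fy) opposing the motion.
    """
    # Unit tangent (along the rod) and unit normal (across the rod).
    tx, ty = math.cos(theta), math.sin(theta)
    nx, ny = -ty, tx

    # Split the velocity into components along and across the rod.
    v_par = vx * tx + vy * ty
    v_perp = vx * nx + vy * ny

    # Each component is resisted with its own coefficient.
    fx = -(b_par * v_par * tx + b_perp * v_perp * nx)
    fy = -(b_par * v_par * ty + b_perp * v_perp * ny)
    return fx, fy
```

      When `b_perp` is larger than `b_par`, sideways slip is resisted more strongly than sliding along the body axis, which is the asymmetry that lets an undulating body push itself forward.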

      (12) The idea of applying biomechanical models to describe omega turns in C. elegans is a good one, however, the kinematic basis of the model as used in this paper (the authors do note that the control angle could be connected to a neural model, but don't do so in this work) limits the generation of neuromechanical control hypotheses.

      8, 12: We do not agree with the claim that ElegansBot could limit other researchers in generating neuromechanical control hypotheses. The term $\theta_{\mathrm{ctrl},i}^{(t)}$ used in our model is designed to be replaceable with neuromechanical control in the future.

      (13) The model may provide insights into the biomechanics of such behaviors, however, the results described are very minimal and are purely qualitative.

      (14-1) Overall, direct comparisons to the experiments are lacking or unclear.

      14-1: The text explaining Figs. 2 and 5 (Figs. 2 and 4 in the old version) directly compares the velocity, wave number, and period, as numerical indicators of the worm's behavior, between the experiment and ElegansBot.

      (14-2) Furthermore, the paper claims the value of the model is to produce the force fields from a given body shape, but the force fields from omega turns are only pictured qualitatively.

      13, 14-2: We gratefully accept the point that our analysis of the omega-turn is qualitative. Therefore, we have conducted additional quantitative analysis on the omega-turn and inserted the results into the new Fig. 4. We have considered the term 'Force field' as referring to the force vector received by each rod. We have created numerical indicators representing various behaviors of the worm and included them in the revised manuscript.

      (15) No comparison is made to other behaviors (the force experienced during crawling relative to turning for example might be interesting to consider) and the dependence of the behavior on the model parameters is not explored (for example, how does the omega turn change as the drag coefficients are changed).

      Thank you for the great idea. To compare behaviors, a clear criterion for distinguishing them is needed first. Therefore, we have created a new mathematical definition for behavior classification in the revised manuscript (“Defining Behavioral Categories” in Methods). We then compared the force and power (rate of energy consumption) among forward locomotion, backward locomotion, and omega-turns (Fig. 4). In the revised manuscript, we also analyzed how the turning behavior changes with variations in the friction coefficients (Figs. S4-S7).

      (16) If the purpose of this paper is to recapitulate the swim-to-crawl transition with a simple model, and then apply the model to new behaviors, a more detailed analysis of the behavior of the model variables and their dependence on the variables would make for a stronger result.

      In our revised manuscript, we have quantitatively analyzed the changes occurring in turning behavior from water to agar, and the results are presented in Figs. S9 and S10.

      (17) In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics, however, the forces calculated and the variation of parameters could be of interest.

      (18) Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored.

      (19) Plate conditions are difficult to replicate and the rheology of plates likely depends on a number of factors, but is for example, changes in hydration level likely to produce a 100-fold change in drag? or something more interesting/subtle within the model producing the discrepancy?

      18, 19: As mentioned in the paper, we do not know if the friction coefficients in the study of Boyle et al. (2012) and the friction coefficients in the experiment of Stephens et al. (2016) are the same. In our revised manuscript, we have explored in more detail the effects of the friction coefficient's scale factor, and explained why we chose a scale factor of 1/100 (“Proper Selection of Friction Coefficients” in Supplementary Information). In summary, we analyzed the changes in trajectory due to scaling of the friction coefficient, and chose the scale factor 1/100 because it allowed ElegansBot to accurately reproduce the worm's trajectory while also remaining close to the friction coefficients in the Boyle et al. paper.

      (20) Finally, the language used to distinguish different modeling approaches was often unclear.

      (21) For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model" and in many situations, it appeared that the term kinematic might have been more appropriate.

      Thank you for the feedback. As you pointed out, we have corrected that part to 'kinematic' in the revised manuscript.

      (22) Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

      We agree that the expression may not be immediately clear. This is due to the word limit for the abstract (the abstract of eLife VOR should be under 200 words, and our paper's abstract is 198 words), which forced us to convey the causality in a limited number of words. Therefore, although we will not change the abstract, the expression in question means that the muscle tension, which is the cause of the worm's locomotion, ultimately generates the frictional force between the worm and the ground surface.

      Recommendations For The Authors

      (23) As I stated in my public review, I think the paper could be made much stronger if a more detailed exploration of turning mechanics was presented.

      (24) Relatedly, rather than restricting the analysis to individual videos of turning behaviors, I wonder if a parameterized model of the turning kinematics would be fruitful to study, to try to understand how different turning gaits might be more or less energetically favorable.

      We thank the reviewer once again for their suggestion. Thanks to their proposal, we were able to conduct additional quantitative analysis on turning behavior.

      Reviewer #2

      Public Review

      Summary:

      (1) Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at a low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface.

      (2) The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      (3) The model is general due to its simplicity and likely useful for various undulatory movements.

      (4) The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc).

      (5) The model is predictive (semi?) as shown in the liquid-to-solid gait transition.

      (6) The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Weaknesses:

      (7) Since the inputs to the model are the actual shape changes in time, parameterized as angles (or curvature), the ability of the model to reproduce a realistic facsimile of C. elegans motion is not really a huge surprise.

      (8) The authors do not include some important physical parameters in the model and should explain in the text these assumptions.

      (9. 1) The cuticle stiffness is significant and has been measured [1].

      (10. 2) The body of C. elegans is under high hydrostatic pressure which adds an additional stiffness [2].

      (11. 3) The visco-elasticity of C. elegans body has been measured. [3]

      Thank you for asking. The stiffness of C. elegans is an important consideration. We took this into account when creating ElegansBot, but did not explain it in the paper. The detailed explanation is as follows. C. elegans indeed has stiffness due to its cuticle and internal pressure. This stiffness is treated as a passive elastic force (elastic force term of lateral passive body force) in the paper of Boyle et al. (2012). However, the maximum spring constant of the passive elastic force is 1/20 of the maximum spring constant of the active elastic force. If we consider this fact in our model, the elastic term of the muscle torque is as follows: ( is the active torque elasticity coefficient, is the passive torque elasticity coefficient)

      where

      Therefore, there is no need to describe the active and passive terms separately in

      Furthermore, since , assuming , then and .

      (12) There is only a very brief mention of proprioception.

      (13) The lack of inclusion of proprioception in the model should be mentioned and referenced in more detail in my opinion.

      As you emphasized, proprioception is an important aspect in the study of C. elegans' locomotion. In our paper, its importance is briefly introduced with a sentence each in the introduction and discussion. However, our research models how body motion arises from muscle forces, and it does not model the sensory system that senses body posture. Therefore, there is no mention of using proprioception in our paper's results section. What is mentioned in the discussion is that ElegansBot can serve as the kinetic body model in a combined model that pairs a kinetic body model with a neuronal circuit model receiving proprioception as a sensory signal.

      (14) These are just suggested references.

      (15) There may be more relevant ones available.

      The papers you provided contain specific information about the Young's modulus of the C. elegans body. The first paper (Rahimi et al., 2022) measured the Young's modulus of the cuticle after chemically isolating it from C. elegans, while the second paper (Park et al., 2007) and third paper (Backholm et al., 2013) measured the elasticity and Young's modulus of C. elegans without separating the cuticle. Based on the Young's modulus provided in each paper (although the second and third papers did not measure stiffness in the longitudinal direction), we derived the elastic coefficient (assuming a worm radius of 25 μm, cuticle thickness of 0.5 μm, and 1/25 of longitudinal length of the cuticle of 40 μm). The range was quite broad, from 9.82 × 10¹¹ μg/sec² (from the first paper) to 2.16 × 10⁸ μg/sec² (from the third paper). Although the elastic coefficient value in our paper falls within this range, since the range of the elastic coefficient is wide, we think we can modify the elastic coefficient in our paper and will be able to reapply our model if more accurate values become known in the future.

      Reviewer #3

      Public Review

      Summary:

      (1) A mechanical model is used with input force patterns to generate output curvature patterns, corresponding to a number of different locomotion behaviors in C. elegans.

      Strengths:

      (2) The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths.

      (3) The matching of speeds (though qualitative and shown only on agar) is a strength.

      Weaknesses:

      (4) What is the relation between input and output data?

      ElegansBot takes the worm's body control angle as the input, and produces trajectory and force of each segment of the worm as the output.

      (5) How does the input-output relation depend on the parameters of the model?

      If 'parameter' is understood as vertical and horizontal friction coefficients, then the explanation for this can be found in Fig. 5 (Fig. 4 in the old version).

      (6) What biological questions are addressed and can significant model predictions be made?

      Our model provides equations of motion that describe the locomotion of C. elegans, including turning behaviors, which have been relatively less well understood.

      Recommendations For The Authors

      (7) The novelty and significance of the paper should be clarified.

      We have added quantitative analyses of turning behavior in the revised manuscript, and we hope this will be helpful to you.

      (8) Previously much more detailed models have been published, as compared to this one.

      We hope the reviewer can point out any previous model that we may have missed.

      (9) The mechanics here are simplified (e.g. no information about dorsal/ventral innervation but only a bending angle) setting limitations on the capacity for model predictiveness.

      (10) Such limitations should be discussed.

      We view the difference between dorsal/ventral innervation and bending angle not as a matter of simplification, but rather as a reflection of the hierarchy that our model implements. Our model does not consider dorsal/ventral innervation, but it uses the bending angle to reproduce behavior in various input and frictional environments, which signifies the strong predictiveness of ElegansBot (Figures 2, 3, and 5; Figures 2, 3, and 4 in the old version). Moreover, if the midline of C. elegans is incompressible, then modeling by dividing into dorsal/ventral, as opposed to modeling solely with the bending angle, does not increase the degrees of freedom of the worm model, and therefore does not increase its predictiveness.

      (11) The aims of the paper and results need to be supported quantitatively and analyzed through parameter sweeps and intervention.

      We have conducted additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

      (12) The methods are given only in broad brushstrokes, and need to be much more clear (and ideally sharing all code).

      We have thoroughly detailed every aspect of this research, from deriving the physical constants of C. elegans, agar, and water to developing the formulas and proofs necessary for operating ElegansBot and its applications. This comprehensive information is all presented in the Results, Methods, and Supplementary Information sections, as well as in the source code. Moreover, we have already ensured that our research can be easily reproduced by providing detailed explanations and by making ElegansBot accessible through public software databases (PyPI, GitHub). To further aid in its application and understanding, especially for those less familiar with the subject, we have also included minimal code as examples in the database. This code is designed to simplify the process of reproducing the results of the paper, thereby making our research more accessible and understandable. Therefore, we believe that readers will easily gain significant assistance from the extensive information we have provided. Should readers require further help, they can always contact us, and we will be readily available to offer support.

      (13) The supporting figures and movies need to include a detailed analysis to evidence the claims.

      We have conducted and provided additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      This manuscript provides some valuable findings concerning the hippocampal circuitry and the potential role of adult-born granule cells in an interesting long-term social memory retrieval. The behavior experiments and strategy employed to understand how adult-born granule cells contribute to long-term social discrimination memory are interesting.

      We thank the reviewer for the positive evaluation.

      I have a few concerns, however with the strength of the evidence presented for some of the experiments. The data presented and the method described is incomplete in describing the connection between cell types in CA2 and the projections from abGCs. Likewise, I worry about the interpretation of the data in Figures 1 and 2 given the employed methodology. I think that the interpretation should be broadened. This second concern does not impact the interest and significance of the findings.

      In response to this concern, we have removed the data concerning abGC projections to PCP4+ and PV-GFP+ cell bodies from Figure 1 and have focused this analysis on dendrites. We now provide high magnification images of dendrites and expand on the methodology, results, and interpretations in the manuscript. We also broaden the interpretation throughout the manuscript to address the reviewer’s concern.

      Strengths:

      The behavior experiments are beautifully designed and executed. The experimental strategy is interesting.

      We appreciate these positive comments.

      Weaknesses:

      The interpretation of the results may not be justified given the methods and details provided.

      We have addressed this concern by providing more methodological details and broadening our interpretation of the results.

      Reviewer #2:

      Summary:

      Laham et al. investigate how the projection from adult-born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. I find the study to be very interesting, the results are important for our understanding of how social memories of different natures (remote or immediate) are encoded and supported by the hippocampal circuitry. I have some points that I added below that I think could help clarify the conclusions:

      We appreciate the positive assessment and have addressed the more specific points below.

      My major concern with the manuscript was that making the transitions between the different experiments for each result section is not very smooth. Maybe they can discuss a bit in a summary conclusion sentence at the end of each result section why the next set of experiments is the most logical step.

      In response, we have added summary conclusion sentences at the end of each result section.

      In line 113, the authors say that "the DG is known to influence hippocampal theta-gamma coupling and SWRs". Another recent study Fernandez-Ruiz et al. 2021, examined how various gamma frequencies in the dentate gyrus modulate hippocampal dynamics.

      We cite this paper in the revised manuscript.

      Having no single cells in the electrophysiological recordings makes it difficult to interpret the ephys part. Perhaps having a discussion on this would help interpret the results. If more SWRs are produced from the CA2 region (perhaps aided by projections from abGC), more CA2 cells that respond to social stimuli (Oliva et al. 2020) would reactivate the memories, therefore making them consolidate faster/stronger. On the other hand, the projections from abGC that the authors see, also target a great deal of PV+ interneurons, which have been shown to pace the SWRs frequency (Stark et al 2014, Gan et al 2017), which further suggests that this projection could be involved in SWRs modulation.

      We discuss these possibilities and cite Gan et al 2017, Schlingloff et al., 2014, and Stark et al., 2014 in the revised manuscript.

      The authors should cite and discuss Shuo et al., 2022 (A hypothalamic novelty signal modulates hippocampal memory).

      We mention Chen et al (A hypothalamic novelty signal modulates hippocampal memory.) in the revised manuscript. “Shuo” is the first name of the first author on this paper, so we believe that this is the same paper to which the reviewer refers.

      I think the authors forgot to refer to Fig 3a-f, maybe around lines 163-168.

      We thank the reviewer for pointing out this error. In the revised manuscript, we refer to all figure panels. Since Fig 3 is now broken into two figures (Fig 3 and 4), the panel lettering has changed in the revised manuscript.

      Are the SWRs counted only during interaction time or throughout the whole behavior session for each condition?

      The SWRs are counted throughout the whole behavior session for each condition. This is now stated in the revised manuscript.

      Figure 3t shows a shift in the preferred gamma phase within theta cycles as a result of abGC projections to CA2 ablation with CNO, especially during Mother CNO condition. I think this result is worth mentioning in the text.

      We now mention this finding in the revised manuscript.

      Figure 3u in the legend mention "scale bars = 200um", what does this refer to?

      The scale bar refers to that shown in Figure 3b, which is now indicated in the legend.

      What exactly is calculated as SWR average integral? Is it a cumulative rate? Please clarify.

      The integral measure provides information regarding the average total power of SWR events. It sums z-scored amplitude values from beginning to the end of each SWR envelope, and then takes the average across all summed envelopes. SWR integral has been shown to influence SWR propagation (De Filippo and Schmitz, 2023). This is now described in the text.
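
      The calculation described above can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code. It assumes the SWR envelope has already been z-scored and that event boundaries are given as sample indices; the function and argument names are hypothetical, and any scaling by sampling interval is left out because the response does not specify it.

```python
def swr_average_integral(z_envelope, events):
    """Average SWR integral across detected events.

    z_envelope: sequence of z-scored envelope amplitudes, one per sample.
    events: list of (start, end) sample indices for each SWR, with `end`
            exclusive. How events are detected is outside this sketch.
    """
    if not events:
        return 0.0  # no events, so nothing to average

    # Sum the z-scored amplitude from the start to the end of each envelope.
    per_event_sums = [sum(z_envelope[start:end]) for start, end in events]

    # Average the summed envelopes across all events.
    return sum(per_event_sums) / len(per_event_sums)
```

      Because each event is summed before averaging, longer or higher-amplitude events contribute larger values, so the measure reflects total event power rather than event rate.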

      Alexander et al 2017, "CA2 neuronal activity controls hippocampal oscillations and social behavior", examined some of the CA2 effects in the hippocampal network after CNO silencing, and the authors should cite it.

      Alexander et al., 2018, which we believe is the relevant paper, is now cited in the revised manuscript.

      Strengths:

      Behavioral experiments after abGC projections to CA2 are compelling as they show clearly distinct behavioral readout.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Electrophysiological experiments are difficult to interpret without additional quantifications (single-cell responses during interactions etc.)

      We have addressed this concern by expanding the interpretation of our results.

      Reviewer #3:

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing, and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother, and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation, and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking in the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      In response to these comments, we provide raw interaction times in a new Figure (Fig. S1). We also provide more information about the experiments and figures in the revision. We explain the rationale for our behavioral interpretations and discuss proposed mechanisms for how abGCs regulate SWR and PAC.

      The manuscript is well-written with the appropriate references. The choice of the behavioral test is somewhat debatable, however. It is surprising that the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval would allow us to generalize the findings to social memories in general. As the manuscript stands, the authors can only conclude the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      We selected the direct social interaction test because it allows for more naturalistic social behaviors than measuring investigation times toward social stimuli located inside wire mesh containers. We also decided to focus our studies on the retrieval of mother memories because these are likely the first social memories to be formed. We emphasize that our results cannot be generalized to memories of other social stimuli but given studies on recent social memory formation and retrieval in adults that manipulate abGCs and CA2 separately, we feel that it is likely that this circuit is involved in these functions as well. However, we specify throughout the manuscript that our experiments can only tell us about mother memories. We have also changed the title to reflect this.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic and it is unclear how abGCs projection to CA2 can contribute to SWR and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how the abGCs to PV+ neurons to CA2 pyramidal neurons circuit can facilitate SWR and theta-gamma PAC.

      We have divided Figure 3 into two figures (Figures 3 and 4) and revised the electrophysiology section of the results section. In the revised paper, we now discuss how abGC projections to PV+ interneurons may facilitate SWR and PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 begs two questions: What are abGCs doing once they mature further, and more generally, what is the function of the DG to CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

      In response to these comments, we discuss possible answers to these interesting questions.

      Recommendations for the authors:

      Reviewer #1:

      Specifically, Figure 1 analyzes the synapses formed between abGCs and CA2 PNs (identified by PCP4 expression) and CA2 PV+ cells (identified by cre-dependent AAV-mCherry expression) in the PV-cre line. Panels c and d show the soma of a CA2 PN and the soma of a PV cell. Why was the soma analyzed? What relevance is there for this? It is my understanding that synapses form on dendrites; this would be much more relevant to show, in my opinion. Also, the methods for panels e and f state that the 3R-Tau+ intensity was analyzed only in stratum lucidum. (There was a normalization for the overall 3R-Tau intensity in SL of CA2 that was obtained by dividing by the 3R-Tau intensity of corpus callosum.) I don't understand then how a comparison of 3R-Tau intensity could have been done for CA2 PN soma. There are no CA2 PN soma in stratum lucidum. (This is fairly clearly shown in Figure 1aiii, with the PCP4 staining showing the soma in the somatic layer, not in stratum lucidum.) What is being analyzed here?

      If the 3R-Tau intensity for dendrites is higher for PV cell dendrites, an example image of dendrites would be very helpful. How was the CA2 PV cell dendrite delimited from the CA2 PN dendrites at 40x magnification for the 3R-Tau intensity? Why were pre-synaptic puncta not examined? Is it possible to determine the post-synaptic target with these methods? This result could be particularly interesting, but I find it very difficult to understand the quantification or the justification behind it. To truly know if a cell is getting a connection, the best method would be to perform whole-cell patch clamp recordings of the post-synaptic target cells and use optogenetics of the abGCs. I understand that perhaps this may be beyond the scope of the paper, but it is a severe limitation for these results.

      We have eliminated the cell body measures from Figure 1 and focus instead on the dendrite measures, which we agree are more relevant. We now provide high magnification example images of pyramidal cell (PCP4+) and PV+ interneuron (GFP+) dendrites in Figure 1. We thank the reviewer for pointing out the error about the stratum lucidum as some of the dendrites analyzed are located in the pyramidal cell layer. In addition, neither PCP4 nor GFP label the full extent of dendrites emanating from CA2 pyramidal cells or PV+ interneurons respectively. We mention this in the revised manuscript because abGC projections to more distal dendrites might show a different pattern than that which was observed for proximal dendrites. We also provide more details about how the dendrites were delimited for the analysis, and mention that these results cannot definitively inform us about whether functional synaptic connections have been formed.

      Cannulation over CA2 is potentially not specific to CA2 terminals. It would be optimal if the authors had some histology demonstrating specific cannula placement, as these surgeries are really tough to get perfectly centered over CA2. Even if it is perfectly centered, how much would the CNO diffuse into CA3? I think that given the methodology, the authors really need to consider that the behavioral results are not only a result of blocking abGC terminals in CA2 alone. Would it really change much if the abGC terminals are also silenced in CA3a/b as well? The McHugh lab has shown that area CA3 is also playing a role in social memory (Chiang, M.-C., Huang, A. J. Y., Wintzer, M. E., Ohshima, T. & McHugh, T. J. A role for CA3 in social recognition memory. Behav Brain Res 354, 2018). It may be that both areas CA2 and CA3 are important for the phenomenon being demonstrated in Figure 2. I think the impact of the study is just as interesting, as this examination of early social memories is very interesting and nicely done. In fact, areas CA2 and CA3 may be acting together (please see Stöber, T. M., Lehr, A. B., Hafting, T., Kumar, A. & Fyhn, M. Selective neuromodulation and mutual inhibition within the CA3-CA2 system can prioritize sequences for replay. Hippocampus 30, 1228-1238, 2020).

      We agree that it is possible that CNO infusions targeted at the CA2 would also influence CA3a/b and have revised the paper to include this possible interpretation. We also cite the suggested paper on CA3 involvement in social memory (Chiang et al., 2018) and the paper on CA2-CA3 interactions (Stöber et al, 2020).

      Figure 3 is packed with information, but not communicated in a reasonable way. Much more information and a description of the experimental protocol need to be presented. Furthermore, why are there no example traces for the SWRs recorded? There should be more analysis than just a difference score and frequency. How is j, k, and l analyzed and interpreted? Why no example traces there? Also, the n's seem way too small for Figure 3mr. Are there only 32 or three animals used for some of these conditions? This is insufficient in my opinion to conclude much for a 5-minute interaction.

      In response to this concern, we have divided Figure 3 into 2 figures – Figure 3 and Figure 4. In Figure 3, we provide example traces for SWRs, with additional SWR data presented in Figures S3 and S4, including data to complement the difference score data in Figure 3. In Figure 4, we include traces of phase amplitude coupling. We also provide more information in the methods about how the phase amplitude coupling data were analyzed. For Figure 4, we used methods described by Tort et al., 2010 to produce a modulation index, which is a measure of the intensity of coupling between theta phase and gamma amplitude. This method additionally allows for visualization of how gamma amplitude is modified across individual theta phase cycles. Regarding the question about n sizes in the 10-12 week abGC group (Fig. 3), the numbers are lower than in the 4-6 week abGC group because by 6 weeks after the first set of recordings, the electrodes in some of the mice were no longer usable. The n sizes for this specific study are 4-5 per group for Nestin-cre mice; 7-8 for Nestin-cre:Gi. This is now clarified in the figure legend.
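
      The modulation index mentioned above can be illustrated with a short sketch. This is not the authors' analysis code: it assumes that the instantaneous theta phase and gamma amplitude have already been extracted from the recording, that 18 phase bins are used, and that the index is the normalized divergence of the binned amplitude distribution from a uniform one, as the method is commonly described. The function name and the synthetic signals are hypothetical.

```python
import math

def modulation_index(phases, amplitudes, n_bins=18):
    """Illustrative phase-amplitude modulation index (0 = no coupling).

    phases: theta phase per sample, in radians.
    amplitudes: gamma amplitude envelope per sample (non-negative).
    n_bins: number of phase bins (18 is an assumption, i.e. 20 degrees each).
    """
    if len(phases) != len(amplitudes):
        raise ValueError("phases and amplitudes must have the same length")
    width = 2 * math.pi / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ph, amp in zip(phases, amplitudes):
        # Wrap the phase into [0, 2*pi) and find its bin.
        wrapped = (ph + math.pi) % (2 * math.pi)
        k = min(int(wrapped / width), n_bins - 1)
        sums[k] += amp
        counts[k] += 1
    if 0 in counts:
        raise ValueError("every phase bin needs at least one sample")
    # Mean amplitude per phase bin, normalized to a probability-like distribution.
    mean_amp = [s / c for s, c in zip(sums, counts)]
    total = sum(mean_amp)
    p = [m / total for m in mean_amp]
    # Divergence from the uniform distribution, scaled to lie between 0 and 1.
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)

# Synthetic check: a flat envelope versus one that peaks at a fixed theta phase.
phases = [-math.pi + 2 * math.pi * i / 3600 for i in range(3600)]
flat = [1.0] * len(phases)
coupled = [1 + 0.8 * math.cos(ph) for ph in phases]
mi_flat = modulation_index(phases, flat)
mi_coupled = modulation_index(phases, coupled)
```

      The per-bin mean amplitudes computed inside the function are also what would be plotted to visualize how gamma amplitude varies across the theta cycle.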

      The discussion section of this paper does not put these results into a broader context with the field. There are other studies examining abGCs and their roles in novelty and memory formation (the work from Juan Song's lab, for example). These should be properly mentioned and discussed.

      In response, we have added discussion on the roles of abGCs in nonsocial novelty and memory formation and have cited papers from the Song lab.

      In the figure legend for Figure 2, there is no specific explanation for panel h. Perhaps the label is missing in the legend.

      We thank the reviewer for noting this error and now include a description in the revised manuscript.

      Reviewer #2:

      Adding more quantifications (single cells, isolating data during interaction versus non-interaction times) would help readers understand the results better. In the absence of this, adding a clearer rationale (even if only through the presentation of hypotheses) at the transitions between the different results sections would make the study easier to read.

      In response to this comment, we have added transition sentences between results sections to clarify the rationale and make the manuscript easier to understand.

      Reviewer #3:

      Line 110: "Hippocampal phase-amplitude coupling (PAC) and generation of sharp wave-ripples (SWRs) have been linked to novel experience, memory consolidation, and retrieval (Colgin, 2015; Fernandez Ruiz et al., 2019; Meier et al., 2020; Joo and Frank, 2018; Vivekananda et al., 2021). The DG is known to influence hippocampal theta-gamma coupling and SWRs (Bott et al., 2016; Meier et al., 2020), yet no studies have examined the influence of abGCs on these oscillatory patterns." This information comes too early in the result section and is somewhat confusing.

      In response to this comment, we have moved this information and provided a better description.

      Line 118: "we found that mice with normal levels of abGCs can discriminate between their own mother and a novel mother." Be more descriptive of the results (present the raw interaction times with the statistical test to compare them), this is the conclusion.

      In response to this comment, we provide the raw interaction times in a new Figure (Fig. S1) and describe the results in more detail.

      Line 121: "These effects were not due to changes in physical activity". Be more specific. Did you subject the mice to a specific test? If not, how did you calculate locomotion? The data presented in the supplementary figure 1a only states the % locomotion.

      Locomotion was manually scored whenever an animal moved in the testing apparatus. Speed was not recorded. Total locomotion was divided by trial duration to create a % locomotion measure. We have added these details to the methods.

      Line 124: "Coinciding with the recovery of adult neurogenesis, GFAP-TK animals regained the ability to discriminate between their mother and a novel mother". Explain how the difference in interaction time can be interpreted as the ability to discriminate. You could also compute the discrimination index used by several other laboratories (difference of interaction normalized by the total interaction time).

      In response to this comment, we describe how the difference in interaction time can be interpreted as the ability to discriminate between novel and familiar mice.
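
      The discrimination index the reviewer suggests (difference in interaction time normalized by total interaction time) can be written out as a brief sketch. The function name and the example times are hypothetical, and the sign convention (positive values mean more investigation of the novel mother) is an assumption made for illustration.

```python
def discrimination_index(novel_s, familiar_s):
    """Difference in interaction time divided by total interaction time.

    novel_s: seconds spent investigating the novel mother.
    familiar_s: seconds spent investigating the animal's own mother.
    Returns a value in [-1, 1]; 0 means no preference.
    """
    total = novel_s + familiar_s
    if total <= 0:
        raise ValueError("total interaction time must be positive")
    return (novel_s - familiar_s) / total

# Hypothetical animal: 60 s with the novel mother, 40 s with its own mother.
example_index = discrimination_index(60.0, 40.0)
```

      Normalizing by the total makes animals with different overall investigation levels comparable, which the raw difference score does not.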

      Line 133: "Targeted CNO infusion in Nestin-Cre:Gi mice enabled the inhibition of GiDREADD+ abGC axon terminals present in CA2." Provide data or references to support this claim. Injection of a dye of comparable size to CNO would help. Otherwise, mention that nearby CA3a could be affected as well.

      We cannot rule out that nearby CA3a was affected by our cannula infusions of CNO into CA2. Furthermore, since dyes likely diffuse at different rates than CNO, we believe that a dye injection would not eliminate this concern completely. Therefore, we have revised the paper to acknowledge the likelihood that the CNO infusion affected parts of CA3 in addition to CA2. We also changed the title to focus more on the CA2 electrophysiological recordings, which we know were obtained only from the CA2.

      Line 150: "When reintroduced to the now familiar adult mouse 6 hours later, after the effects of CNO had largely worn off". Provide data or references supporting this claim.

      In response, we cite articles showing that the behavioral effects of CNO-induced DREADD activation have returned to baseline 6 hrs later.

      Line 165: "We found that SWR production is increased during social interaction, with more SWRs produced during novel mouse investigation, presumably during encoding social memories, than during familiar mouse investigation, presumably during retrieval of developmental social memories". How does this compare to the results in Oliva et al, Nature 2021?

      The Oliva et al 2021 paper recorded CA2 SWRs during home cage and during post-social stimulus exposure periods of sleep. The timing of the study does not coincide with the measures we made, but we cite the paper.

      Line 168: "Inhibition of abGCs in the presence of a social stimulus". How does silencing abGC impact CA2 pyramidal neurons' firing rate?

      The direct answer to this question is unknown because we did not measure single units, but based on studies done in the CA3, it is likely that firing rate in CA2 would increase.

      Line 203: "abGCs possess a time-sensitive ability to support retrieval of developmental social memories." Can you speculate on the function of the cells later on?

      In the revised paper, we speculate about the function of abGCs after they mature and no longer support retrieval of developmental social memories.

      Line 229: "GFAP-TK mice were group housed by genotype". Why not house them with CD1 littermates?

      We housed these mice according to genotype to avoid having mice with different levels of abGCs (GFAP-TK + VGCV and CD1 + VGCV) living together in social groups. We did this to avoid potential differences that might emerge in social behavior.

      Line 237: "Adult TK, Nestin-cre, and Nestin-cre:Gi offspring underwent a social interaction test in which they directly interacted with the mother". Specify how long was the social interaction time.

      In the revised manuscript, we specify that mice interacted with each social stimulus for 5 minutes.

      Line 240: "After a 1-hour delay spent in the home cage". Were the mice single-housed or with their littermates during this delay?

      In the revised manuscript, we indicate that mice were put back into the home cage with their cagemates during the 1 hr delay period.

      Line 241: "The order of stimulus exposure was counterbalanced in all tests." Can you show some data to confirm that the order of presentation did not impair the interaction? Have you considered using your own version of the classical 3-chamber test in order to assess directly the preference for one or the other female mouse?

      Our data suggest that the order of testing is not responsible for the observed results. Across all experimental groups without an abGC manipulation (i.e., all direct social interaction assays excluding VGCV+ GFAP-TK trials and CNO+ Nestin-cre:Gi trials), ~84.4% of animals demonstrate a social preference for the novel mother over the mother (CD1 + GFAP-TK VGCV- cohort: 28/33; CD1 VGCV+ cohort: 17/17; CD1 and TK recovery cohort: 24/31; Nestin-cre and Nestin-cre:Gi 4-6-week-old abGC cohort: 77/95; 10-12-week-old abGC cohort: 49/55; Total = 195/231 mice with an investigation preference for the novel mother). If stimulus presentation order were to bias social investigation preference toward the first stimulus presented, we would expect the percentage of animals demonstrating a social preference for each stimulus to be around 50%, as roughly half the animals were first exposed to the mother with the other half first exposed to the novel mother. The social novelty preference percentage reported above is comparable to percentages we observe in our lab's novel to familiar social interaction experiments, in which all animals are first exposed to a novel conspecific. We have yet to conduct experiments testing adults using the modified 3-chamber assay described in Laham et al., 2021.
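
      The pooled percentage above can be checked directly from the cohort counts given in the response. The sketch below also adds an exact one-sided binomial tail against the 50% expectation; that test is an illustration added here, not an analysis the authors report, and it assumes that animals are independent and that pooling across cohorts is reasonable. The helper name is hypothetical.

```python
from math import comb

# (mice preferring the novel mother, mice tested), as listed in the response.
cohorts = {
    "CD1 + GFAP-TK VGCV-": (28, 33),
    "CD1 VGCV+": (17, 17),
    "CD1 and TK recovery": (24, 31),
    "Nestin-cre and Nestin-cre:Gi, 4-6-week-old abGCs": (77, 95),
    "10-12-week-old abGCs": (49, 55),
}

preferring = sum(k for k, _ in cohorts.values())
tested = sum(n for _, n in cohorts.values())
percent_novel = 100 * preferring / tested

def binomial_upper_tail(k, n, p=0.5):
    """Probability of observing k or more successes in n trials at rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of a preference count this high if preference were a coin flip.
p_value = binomial_upper_tail(preferring, tested)
```

      Under these assumptions the pooled counts reproduce the 195/231 total, and the tail probability gives a sense of how far the observed preference sits from the 50% expected under an order effect.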

      Statistics: The statistical tests used throughout the paper are appropriate but their description is too cursory. Please provide F values and specify the name of the tests used in the figure legends before giving the exact p values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors were attempting to determine the extent to which CIH altered swallowing motor function; specifically, the timing and probability of the activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that there are a variety of motor patterns that emerge in CIH mice; this is apparently different from the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      (1) The data presented are largely descriptive in terms of the effect of PiCo activation on the probability of swallowing and the pattern of motor activation changes following CIH. Comparisons made between experimental data acquired currently and those obtained in a previous cohort of animals (possibly years before) are extremely problematic, with the potential confounding influence of changing environments, genetics, and litter effects. The statistical analyses (i.e. comparing CIH with normoxic) appear insufficiently robust. Exactly how the data were compared is not described.

      Yes, we agree the data are descriptive in terms of characterizing the effect of CIH on PiCo activation. However, we would like to emphasize that the data are also mechanistic because they characterize the effects of specific optogenetic manipulation of PiCo neurons in mice exposed to CIH.

      Thank you for this comment and for pointing out our misleading description in the paper. This manuscript is meant to independently characterize the effects of CIH on the response to PiCo stimulation. We are not making direct comparisons with the previously published manuscript, in which mice were exposed to room air. No statistical analysis was performed between the previously published control data and the current CIH data, since we are making only an observational comparison, not a direct one.

      To make this clearer, and to address the reviewer's concern, we have removed the room air data from Figures 1E, 2C and 3A. However, we believe it is important to keep the data from mice exposed to room air in Figure 2B since we did not include this information in the previously published manuscript. It is important to point out that all mice exposed to CIH have some form of submental activity during laryngeal activation in response to PiCo stimulation. This is not the case when mice are exposed to room air only. In this figure, only descriptive analyses are presented. We adjusted our wording throughout the text, particularly in the discussion, to eliminate any impression that we are making direct comparisons between the two studies. The following sentence has been added to the discussion: “While we do not intend to make direct quantitative comparisons between the previously published PiCo-triggered swallows in control mice exposed to room air (Huff et al 2023) and the data presented here for mice exposed to CIH, we believe it is important to compare the conclusions made in these two studies.” This was the motivation for using the eLife Advance format, since the present study demonstrates that PiCo affects swallow patterning, which was not observed in the control data.

      (2) There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. For example, does CIH alter PiCo directly, or some other component of the circuit (NTS)? Techniques that silence or activate projections to/from PiCo should be used to interrogate this. This is required to further delineate and define the swallowing circuit, which remains enigmatic.

      We agree with the reviewer that our study raises many more questions than we are able to answer at the moment. This however applies to most scientific studies. Even though swallowing has been studied for many decades, the underlying circuitry remains largely enigmatic. We will continue to investigate the role of PiCo and its interaction with the NTS, in healthy and diseased states. These investigations require many different techniques, and approaches, some of which are still in development. For example, we are currently conducting experiments that silence portions of the NTS related to swallow and PiCo: ChAT/Vglut2 neurons using novel unpublished viral approaches. However, these are separate and ongoing studies beyond the scope of the current one.

      To address the reviewer’s comment, we have added the following to the limitations section: “In addition, this preparation does not allow for recording of PiCo neurons to evaluate the direct effects of CIH in PiCo neuronal activity”. The following has also been added to the discussion: “Rather, our data reveal CIH disrupts the swallow motor sequence which is likely due to changes in the interaction between PiCo and the SPG, presumably located in the cNTS. While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow motor patterning itself. Here we show for the first time that CIH leads to disturbances in the generation of the swallow motor pattern that is activated by stimulating PiCo. This suggests that PiCo is not only important for coordinating swallow and breathing, but also modulating swallow motor patterning. Further studies are necessary to directly evaluate the presumed interactions between PiCo and the cNTS.”

      (3) The functional significance of the altered (non-classic) patterns is unclear.

      As in our original study, the preparation used to stimulate PiCo does not allow us to simultaneously characterize the functional significance of swallowing. Therefore, we have included this as a limitation in the limitations section: “In this preparation we are unable to directly determine the functionality of the variable swallow motor pattern seen after CIH. Different experimental techniques, such as videofluoroscopy would need to be used to directly evaluate functional significance. This technique is beyond the scope of this study and not possible to perform in this preparation. We acknowledge this limits our ability to make direct comparisons between dysphagic swallows in OSA patients.”

      Reviewer #1 (Recommendations For The Authors):

      (1) A more rigorous experimental approach is required. Littermates should be separated and exposed to either room air or CIH at the same (or close to the same) time.

      As stated above, we did not directly compare mice exposed to room air with mice exposed to CIH. Hence, we believe this is not necessary, and it would have meant repeating all the experiments already published in the original eLife paper.

      (2) Robust statistical analyses are required to determine whether the effects of CIH on the pattern/probability of motor activation are required.

      Since the control and CIH groups were not compared in this study, statistical hypothesis testing is not appropriate or applicable.

      (3) Use a combination of retrograde, Cre- AAVs and Cre-dependent approaches to interrogate the circuitry to/from PiCo that forms the swallowing network. This is what is needed to push this area forward, in my view.

      Thank you for this suggestion; we will consider it as we plan future experiments. Indeed, we are in the process of developing novel approaches. However, in this context we would like to emphasize that further network investigations are exponentially more complicated given that we need to use a Flpo/Cre approach to specifically characterize the glutamatergic-cholinergic PiCo neurons. Most other laboratories that have studied PiCo have avoided this experimental complication and used only a “cre-dependent” approach. This approach is much simpler, but the data are much less specific and the conclusions sometimes misleading. For example, stimulating cholinergic neurons in the PiCo area will also activate Nucleus ambiguus neurons, and stimulating glutamatergic neurons will also activate glutamatergic neurons that are not necessarily the glutamatergic/cholinergic neurons that we use to define PiCo specifically. Readers who are unfamiliar with these different approaches often miss this important difference. Hence, stimulating the cholinergic-glutamatergic neurons in PiCo is much more specific than stimulating other areas, e.g. preBötzinger complex neurons. There are no markers that allow stimulation of only preBötzinger complex neurons or neurons in the parafacial Nucleus. Unfortunately, this difference is often overlooked.

      (4) It should be made more clear how each of the "non-classic" swallowing patterns could cause dysfunction - especially to the reader who is not completely familiar with the neural control of swallowing.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation; however, since our approach does not allow us to use any tools to measure or evaluate functional activity, it would be inappropriate to make suggestions of this type without any data to back up our conclusion. This is why we have not speculated on the functional implications. We have added the following to the discussion section of this manuscript: “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      Minor:

      The Results should be written in a way that better conveys the neurophysiological effects of the manipulations. As it stands, it reads like a statistical report on how activation of each neuronal phenotype is statistically different from each other. As such it is difficult to read and understand the salient findings.

      Thank you for this insight. We have adjusted the language in the results section.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the role of a medullary region, named Postinspiratory Complex (PiCo), in the mediation of swallow/laryngeal behaviours, their coordination with breathing, and the possible impact on the reflex exerted by chronic intermittent hypoxia (CIH). This region is characterized by the presence of glutamatergic/cholinergic interneurons. Thus, experiments have been performed in single allelic and intersectional allelic recombinase transgenic mice to specifically excite cholinergic/glutamatergic neurons using optogenetic techniques, while recording from relevant muscles involved in swallowing and laryngeal activation. The data indicate that in anaesthetized transgenic mice exposed to CIH, the optogenetic activation of PiCo neurons triggers swallow activity characterized by variable motor patterns. In addition, these animals show an increased probability of triggering a swallow when stimulation is applied during the first part of the respiratory cycle. They conclude that the PiCo region may be involved in the occurrence of swallow and other laryngeal behaviours. These data interestingly improve the ongoing discussion on neural pathways involved in swallow-breathing coordination, with specific attention to factors leading to disruption that may contribute to dysphagia under some pathological conditions.

      The Authors' conclusions are partially justified by their data. However, it should be acknowledged that the impact of the study is to a certain extent limited by the lack of knowledge on the source of excitatory inputs to PiCo during swallowing under physiological conditions, i.e. during water-evoked swallowing. Also the connectivity between this region and the swallowing CPG, a structure not well defined, or other brain regions involved in the reflex is not known.

      We thank the reviewer for the comments and for noting the strengths of the paper. However, with regard to the “lack of knowledge”, we would like to emphasize that PiCo was first described in 2016, while e.g. the preBötzinger complex was described in 1991. Thus, it is not fair to assume the same level of anatomical and physiological understanding for PiCo that we have become accustomed to for the preBötzinger complex. We are fairly confident that 25 years from now, our knowledge of the inputs and outputs of PiCo will be much less limited than it currently is.

      Strengths:

      Major strengths of the manuscript:

      • The methodological approach is refined and well-suited for the experimental question. The in vivo mouse preparation developed for this study takes advantage of selective optogenetic stimulation of specific cell types with the simultaneous EMG recordings from upper airway muscles involved in respiration and swallowing to assess their motor patterns. The animal model and the chronic intermittent hypoxia protocol have already been published in previous papers (Huff et al. 2022, 2023).

      • The choice of the topic. Swallow disruption may contribute to the dysphagia under some pathological conditions, such as obstructive sleep apnea. Investigations aimed at exploring and clarifying neural structures involved in this behaviour as well as the connectivity underpinning muscle coordination are needed.

      • This study fits in with previous works. This work is a logical extension of previous studies from this group on swallowing-breathing coordination with further advances using a mouse model for obstructive sleep apnea.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      Major weaknesses of the manuscript:

      • The Authors should be more cautious in concluding that the PiCo is critical for the generation of swallowing itself. It remains to demonstrate that PiCo is necessary for swallowing and laryngeal function in a more physiological situation, i.e. swallow of a bolus of water or food. It should be interesting to investigate the effects of silencing PiCo cholinergic/glutamatergic neurons on normal swallowing. In this perspective, the title should be slightly modified to avoid "swallow pattern generation" (e.g. Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production).

      Thank you for pointing out that this manuscript suggests PiCo is necessary for swallow generation. We agree that further interventions that specifically silence PiCo ChAT/Vglut2 neurons will be necessary to investigate this claim; we have begun to evaluate this for a future study by developing a novel, as yet unpublished approach. We have altered language throughout the text to limit the perception that PiCo is the swallow pattern generator. We have also changed the title to: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

      • The duration of swallows evoked by optogenetic stimulation of PiCo is considerably shorter in comparison with the duration of swallows evoked by a physiological stimulus (water). This makes it hard to compare the timing and the pattern of motor response in CIH-exposed mice. In Figure 1, the trace time scale should be the same for water-triggered and PiCo-triggered swallows. In addition, it is not clear if exposure to CIH alters the ongoing respiratory activity. Is the respiratory rhythm altered by hypoxia? If a disturbed or irregular pattern of breathing is already present in CIH-exposed mice, could this alteration interfere with the swallowing behaviour?

      Thank you. We have changed the time scale so that all representative traces are on the same time scale.

      We explained in the original paper (Huff et al. 2023) that the significant decrease in PiCo-evoked swallow duration compared to water-evoked swallows is likely due to the absence of oral/upper airway feedback. We are not comparing the effects of CIH on the swallow motor pattern between water-evoked and PiCo-evoked swallows. Rather, we are only characterizing the effects of CIH on the swallow motor pattern in PiCo-evoked swallows. The purpose of Figure 1A is to show that the rostrocaudal submental-laryngeal sequence in water-evoked swallows is preserved in “canonical” PiCo-evoked swallows, as shown in the original study. While we did not measure the effects of CIH on breathing and the respiratory pattern in this study, it has been established by others that CIH causes respiratory muscle weakness, impaired motor control of the upper airway, and variable respiratory rhythm and rhythm generation. However, when characterizing the timing of swallows in relation to inspiration (Figure 1 Figure Supplement 1) and the reset of the respiratory rhythm (Figure 3 Figure Supplement 1), and observationally comparing these results with mice exposed to room air (Huff et al. 2023), we do not observe any obvious differences in swallow-breathing coordination. A separate study in wild-type mice characterizing water-evoked swallowing after CIH would be better suited to understanding the physiological changes of swallowing after CIH. We would like to point out that Huff et al. 2022 showed that altering respiratory rate/pattern via activation of various preBötzinger Complex neurons does not change swallow behavior, except in the case of Dbx1 PreBötC neuron activation, which was independent of CIH. Increasing or decreasing respiratory rate via activation of PreBötC Vgat and SST neurons did not change the swallow pattern; rather, it changed the timing of when swallows occurred. It has been reported before by others that swallowing has hierarchical control over breathing and can shut breathing down. We believe that swallowing behavior is independent of the respiratory pattern, and that alterations in breathing pattern do not necessarily affect the swallow motor pattern but could affect swallow timing.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Lines 37-41 "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly the generation of swallow motor pattern was significantly disturbed."

      It should be better:

      "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly triggers variable swallow motor patterns".

      Thank you, this has been changed

      Lines 41-43 "This suggests, glutamatergic-cholinergic neurons in PiCo are not only critical for the gating of postinspiratory and swallow activity but also play important roles in the generation of swallow motor pattern." I suggest removing any language claiming PiCo is swallow gating and changing "generation" to "modulation":

      "This suggests that glutamatergic-cholinergic neurons in PiCo are not only critical in regulating swallow-breathing coordination but also play important roles in the modulation of swallow motor pattern."

      Thank you, this has been changed

      Introduction:

      Line 88-90: Actually, in Huff et al. 2023 it is said "PiCo acts as an interface between the swallow pattern generator and the preBötzinger complex to coordinate swallow and breathing". Please, change accordingly. Please, remove Toor et al., 2019 since their conclusions are quite different.

      Line 100-101: Please, change the sentence according to the comments reported above.

      Thank you, this has been changed

      Results:

      Lines 104-105: Did you mean: "We confirmed that optogenetic stimulation of PiCo neurons in ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH triggers swallow and laryngeal activation similar to the control mice exposed to room air (Huff et al., 2023)." Otherwise, the sentence is not clear.

      Thank you, this has been changed

      Lines 129-130: This finding is not surprising since similar results have been reported in Huff et al. 2023.

      Thank you, we wanted to confirm that CIH did not alter this characteristic, which it did not. We believe that it is important to include this as it is a criterion for characterizing laryngeal activation.

      Lines 219: The number of water swallows is considerably lower than stimulation-evoked swallows. Why?

      We inject water into the mouth three times. Typically, there is one swallow in response to each water injection. PiCo is stimulated 25 times at each duration. If we were to stimulate swallow with water as many times as with optogenetic stimulation, there would be an adaptive response to the water stimulation and the mouse would stop responding. This does not seem to be the case with PiCo stimulation. The simple answer is that there are many more PiCo stimulations than water stimulations.

      Lines 228-232: "PiCo-triggered swallows are characterized by a significant decrease in duration compared to swallows evoked by water in ChATcre:Ai32 mice (265 ± 132ms vs 144 ± 101ms; paired t-test: p= 0.0001, t= 5.21, df= 8), Vglut2cre:Ai32 mice (308 ± 184ms vs 125 ± 44ms; paired t-test: p= 0.0003, t= 6.46, df= 7), and ChATcre:Vglut2FlpO:ChR2 mice (230 ± 67ms vs 130 ± 35ms; paired t-test: p= 0.0005, t= 5.62, df= 8) exposed to CIH (Table S1).".

      Thank you, this has been changed

      Line 252 and 254: remove SEM.

      Thank you, this has been changed

      Discussion

      Line 267: ...(Figure 1Bi), while 28% of PiCo-triggered swallows...

      Thank you, this has been changed

      Lines 283-290: "Thus, CIH does not alter PiCo's ability to coordinate the timing for swallowing and breathing. Rather, our data reveals that CIH disrupts the swallow motor sequence likely due to changes in the interaction between PiCo and the SPG, presumably the cNTS.

      While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow pattern generation itself. Thus, here we show for the first time that CIH resulted in the instability of the swallow motor pattern activated by stimulating PiCo, suggesting PiCo plays a role in its modulation.".

      Thank you, this has been changed

      Could the observed effects be due to a non-specific effect of hypoxia on neuronal excitability? In addition, it should be considered that PiCo-triggered swallows lack the behavioural setting of water-evoked swallows and do not activate the sensory component of the SPG to the same extent as the water-evoked swallows.

      Yes, this is very possible. We stated in our first manuscript that the decrease in PiCo-triggered swallow duration, as compared to water-triggered swallow duration, is likely because oral sensory components are not being activated to the same extent (Huff et al. 2023). Since we do not directly measure neuronal excitability, it is not known (in this study) whether CIH causes changes in the excitability of swallow-related areas. However, others have shown increased excitability and activity of Vglut2 neurons after CIH exposure (Kline et al 2007, 2010), and we have shown, for example, changes in the excitability of preBötC neurons (Garcia et al. 2016, 2017).

      Lines 293-300: The sentence is not clear. Is there any evidence indicating that glutamatergic neurons are differently affected by hypoxia than cholinergic neurons?

      Thank you, these sentences have been changed to increase clarity. The section now reads: “There was no statistical difference in the probability of triggering a swallow during optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 neurons in mice exposed to room air (Huff et al 2023). However, when exposed to CIH, ChATcre:Ai32 and Vglut2cre:Ai32 mice have a lower probability of triggering a swallow -- in some mice swallow was never triggered via PiCo activation, while water-triggered swallows remained -- compared to the ChATcre:Vglut2FlpO:ChR2 mice. While it is possible that portions of the presumed SPG remain less affected by CIH, which could offset these instabilities to produce functional swallows, our data suggest that PiCo targets microcircuits within the SPG that are highly affected by CIH. The NTS is a primary first site for upper airway and swallow-related sensory termination in the brainstem (Jean, 1984). CIH induces changes to the cardio-respiratory Vglut2 neurons, resulting in an increase in cNTS neuronal activity (Kline, 2010; Kline et al., 2007), as well as changes to preBötzinger neurons (Garcia et al., 2017; Garcia et al., 2016) and ChAT neurons in the basal forebrain (Tang et al., 2020). It is reasonable to suggest that CIH has differential effects on neurons that only express ChATcre and Vglut2cre versus the PiCo-specific interneurons that co-express ChATcre and Vglut2FlpO, emphasizing the importance of targeting and manipulating these PiCo-specific interneurons.”

      Lines 372-374: "Here we show that PiCo, a neuronal network which is critical for the generation of postinspiratory activity (Anderson et al. 2016) and implicated in the coordination of swallowing and breathing (Huff et al., 2023), is severely affected by CIH.".

      Thank you, this has been changed.

      Methods

      Line 398: Did you mean Slc17a6-IRES2-FlpO-D?

      Thank you, this has been changed.

      Line 399: were.

      Thank you, this has been changed.

      Line 403: ... expressing both ChAT and Vglut2 and will be reported as ChATcre:Vglut2FlpO.

      Thank you, this has been changed.

      Line 437: Mice of the ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 lines were kept in collective cages with food and water ad libitum placed inside custom-built chambers.

      Thank you, this has been changed.

      Line 479: (Figure 6a in Huff et al., 2022).

      Line 497: What does Fig 7 refer to?

      This should say Figure 1 - figure supplement 2. This has been changed.

      Lines 501-506: "First, swallow was stimulated by injecting 0.1cc of water into the mouth using a 1.0 cc syringe connected to a polyethylene tube. Second, 25 pulses of each 40ms, 80ms, 120ms, 160ms and 200ms continuous TTL laser stimulation at PiCo was repeated, at random, throughout the respiratory cycle. The lasers were each set to 0.75mW and triggered using Spike2 software (Cambridge Electronic Design, Cambridge, UK). These stimulation protocols were performed in all ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2." .

      Thank you, this has been changed.

      Line 526 and 540: (Fig.6 in Huff et al., 2022) and (Fig.6d in Huff et al., 2022).

      Thank you, this has been fixed

      Line 594: Figure 5 doesn't exist. Please, change the sentence.

      Thank you, this has been fixed

      Line 595 and 609: The reference Kirkcaldie et al. 2012 refers to the neocortex and doesn't seem appropriate. Please, quote the atlas of Paxinos and Franklin.

      Thank you, this has been changed.

      Reference:

      Please, correct throughout the text editing of references by removing e.g J.M. or A. or David D. and so on. Only surnames should be mentioned.

      Thank you, this has been changed.

      Figures:

      Figure 1. A and B as well as the purple arrow are lacking. In addition, optogenetic stimulation is applied during different periods of inspiratory activity and this could impact the swallow motor pattern. In Bv, Non-LAR seems very similar to LAR. In panel E, please add the number of animals.

      Thank you, this has been fixed.

      We used the same optogenetic protocols as in the original paper (Huff et al. 2023) and did not observe any changes to the swallow motor pattern in relation to the time PiCo was stimulated. The only phase-dependent response seen in both control and CIH mice is that when PiCo is stimulated during inspiration and a swallow is triggered, inspiration is inhibited. Therefore, we do not believe variability in swallow motor pattern is dependent on the phase of breathing in which PiCo is stimulated.

      Biv LAR has a pause in EMG activity before the swallow begins (red arrow pointing to the pause). Bv Non-LAR does not have this pause; rather, the two behaviors converge (red arrow). For something to be considered an LAR the pause must be present, which is why we separated these two motor patterns.

      Figure 1 - Figure Supplement 1. Why do the Authors call the lines "histograms"?

      Thank you, this has been fixed. This is a line graph of swallow frequency in relation to inspiration.

      Tables:

      In tables, data are provided as means and standard deviation. Please, specify this in the Method section.

      Thank you, the following is listed in the methods section: “All data are expressed as mean ± standard deviation (SD), unless otherwise noted.”

      Reviewer #3 (Public Review):

      In the present study, the authors investigated the effects of CIH on the swallowing and breathing responses to PICO stimulation. Their conclusion is that glutamatergic-cholinergic neurons from PICO are not only critical for the gating of post-inspiratory and swallow activity, but also play important roles in the generation of swallow motor patterns. There are several aspects that deserve the authors' attention and comments, mainly related to the study's conclusions.

      • The authors refer to PICO as the generator of post-inspiratory rhythm. However, evidence points to this region as a modulator of post-inspiratory activity rather than a rhythmogenic site (Toor et al., 2019 - 10.1523/JNEUROSCI.0502-19.2019; Oliveira et al., 2021 - 10.1016/j.neuroscience.2021.09.015). For example, sustained activation of PICO for 10 s barely affected the vagus or laryngeal post-inspiratory activity (Huff et al., 2023 - 10.7554/eLife.86103).

      Yes, we did refer to PiCo as the postinspiratory rhythm generator as defined by Anderson et al. 2016. We base this statement on the following criteria and experiments: In Anderson et al. 2016, we demonstrate that PiCo can be isolated in vitro, that PiCo neurons are activated in phase with postinspiration, and that they are inhibited during inspiration by preBötC neurons via GABAergic mechanisms and not glycinergic mechanisms. We also demonstrate that optogenetically stimulating cholinergic neurons in the PiCo area resets the inspiratory rhythm both in vivo and in vitro. We also show that PiCo, when isolated in transverse slices, is autorhythmic and that PiCo, like the preBötC in transverse slices, can generate respiratory rhythmic activity in vitro, independent of the preBötC. We also demonstrate that PiCo neurons are an order of magnitude more sensitive to opioids (DAMGO) than the preBötC and that local injections of DAMGO into the PiCo area in vivo abolish postinspiration and also abolish the phase delay of the respiratory rhythm. None of these specific rhythmogenic properties have been studied by the Toor study or the Oliveira et al study. Hence, we do not understand why the reviewer cites these studies as evidence for modulation as opposed to rhythmogenic properties. The fact that PiCo is rhythmogenic should not be considered as an “exclusive property”. Specifically, this does not mean that PiCo is not also “modulating” the swallow-breathing coordination, as we have demonstrated more specifically in the Huff et al study. In the same sentence we also referred to the PreBötzinger complex as the inspiratory rhythm generator as defined by Smith et al 1991, and it seems that the reviewer did not object to this reference. But we would like to point out that the same criteria were used to define the preBötzinger complex as we used for PiCo, except that PiCo neurons are better defined than preBötzinger complex neurons.
Dbx1 neurons are often used to characterize the PreBötC, but these neurons form a rostrocaudal and ventrodorsal column which also involves glial cells and transcends the preBötC. Glutamatergic neurons are everywhere, and so are Somatostatin or Neurokinin neurons. Moreover, the 1991 study was only performed in vitro, and did not include a histochemical analysis. We would also like to point out that the present manuscript is investigating the role of PiCo in swallow and laryngeal behaviors, and not specifically postinspiration. Thus, we are not entirely sure how this comment relates to this manuscript.

      • The optogenetic activation of glutamatergic and cholinergic neurons from PICO evoked submental and laryngeal responses, and CIH changed these motor responses. Therefore, the authors proposed that PICO is directly involved in swallow pattern generation and that CIH disrupts the connection between PICO and SPG (swallow pattern generator). However, the experiments of the present study did not provide evidence about connections between these two regions nor their possible disruption after CIH, or even whether PICO is part of SPG.

      We have edited the text to suggest PiCo modulates swallow motor sequence in addition to the coordination of swallow and breathing. We have also added that further experiments will be necessary to further investigate the connections between PiCo and SPG. But, unfortunately, compared to PiCo, the SPG is much less defined. As already stated above, it cannot be expected that a single study can address all possible open questions. Clearly, more work needs to be done outside of this study to answer all of these questions, which makes this an exciting area of research.

      • CIH affects several brainstem regions which might contribute to generating abnormal motor responses to PICO stimulation. For example, Bautista et al. (1995 - 10.1152/japplphysiol.01356.2011) documented that intermittent hypoxia induces changes in the activity of laryngeal motoneurons by neural plasticity mechanisms involving serotonin.

      Yes, we thank the reviewer for this comment and we agree that CIH affects multiple brainstem regions. We stated in the manuscript that we are measuring changes in two muscle complexes which spread among three motor neuron pools: hypoglossal nucleus, trigeminal nucleus, and nucleus ambiguus. We have added a discussion on laryngeal activity in the presence of acute bouts of extreme hypoxia, acute intermittent hypoxia, as well as chronic intermittent hypoxia.

      • To support the hypothesis that PICO is directly involved in swallow pattern generation the authors should perform the inhibition of Vglut2-ChAT neurons from PICO and then evoke swallow motor responses. If swallow is abolished when the neurons from this region are inhibited, it would indicate that PICO is crucial to generate this behavior.

      Thank you. We would like to clarify: “involvement” does not mean “necessary for”. Confusing the two has caused much debate in the field. Just as an example: We can argue at great length whether inhibition is necessary for respiratory rhythmogenesis in vivo, but I think there is no question that inhibition is involved in respiratory rhythmogenesis in vivo. But to avoid any confusion, we have changed the text to suggest PiCo is involved in the modulation of swallow motor sequence. We agree various additional inhibition experiments are necessary to determine whether PiCo is also a necessary component of the SPG, but this is not the question we have set out to address in this study. To specifically target PiCo we must not only inhibit Vglut2 neurons but neurons that express both ChAT and Vglut2. To our knowledge there are no inhibitory DREADD or opsin techniques for cre/FlpO to specifically target these neurons. As stated above, non-experts in the field do not appreciate this technical nuance. However, we have begun to develop novel techniques necessary to inhibit these specific neurons which will be published in the future.

      • In almost all the data presented, the authors observed different patterns of changes in the motor submental and laryngeal responses to PICO activation, including that animals submitted to CIH (6%) presented a "normal" motor response. However, the authors did not discuss the possible explanations and functional implications of this variability.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation. However, since we are not using any tools to measure or evaluate functional activity, it would be inappropriate to make suggestions of this type without any data to back up our conclusion. This is why we have not included any functional implications. We have added the following to the manuscript. “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      • In Figure 4, the authors need to present low magnification sections showing the PICO transfected neurons as well as the absence of transfection in the ventral respiratory column. The authors could also check the scale since the cAmb seems very small.

      Thank you. We have added different histology images to show a more comparable cAmb, as well as lower-magnification images to show the absence of transfection in the VRC.

      • Finally, the title does not reflect the study. The present study did not demonstrate that PICO is a swallow pattern generator.

      We have also changed the title to say: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

    1. Re-establishing the three orders

       I will now go back through these orders and show how the worldview I have espoused in this essay may be able to re-invigorate them.

       a. The nomological order

       In the worldview I’ve put forward in this essay, there is a different kind of nomological order. Here there is also an affinity, or deep continuity, between how the mind works and the structure of reality. As I argued in section 4, relevance realization, i.e., the process by which we become more behaviorally attuned to the world, is a particular manifestation of the general process by which the universe at large is continually being created and complexified. In a previous essay I showed that there is a great deal of overlap between relevance realization and the modern science of consciousness. I think Jordan Peterson was right when he said that “we are really reflective, including in our consciousness, of something about the structure of reality itself.” Or, as John Vervaeke and colleagues put it, there really are “fundamental principles by which knowledge and reality co-operate” (Vervaeke et al., 2017), and this constitutes a kind of nomological order.

       b. The narrative order

       The Christian-Aristotelian narrative order was participatory. We were participating in the process by which the kingdom of heaven would be built on earth. In the worldview I’ve put forward in this essay, there is no final “goal” towards which the universe is aiming. Rather, the process itself is the goal. This constitutes an infinite game rather than a finite game. Although we are not participating in a narrative that brings about some final state of utopia, we are capable of participating in a process that is of ultimate value, both for ourselves and for the world at large. Vervaeke and colleagues said that the narrative order:

       …provided an overarching story into which the minutia of the cosmos―individuals and their own stories―could fit and belong. Further, it introduced the idea that the agency of persons could intervene in the cycle of repetition and meaningfully impact the course of cosmic history.

       What I am arguing for is not far off from that. Our individual stories do fit into the overarching story of the cosmos (which is, as Azarian suggested, a never-ending story of continual self-organization and complexification). Our actions, every decision we make, can therefore meaningfully impact the course of cosmic history. That constitutes a kind of narrative order.

       c. The normative order

       The normative order consisted of a connection between ontology and values. In the worldview I have put forward in this essay, there is also a connection between ontology and values. In section 6 I argued that our participation in the process of complexification is biologically and psychologically optimal. This process therefore constitutes an ontological structure that simultaneously informs us about the nature of the good. Ontologically speaking, this process underlies reality as we know it. Normatively, our participation in this process is of ultimate value. This constitutes a kind of normative order.

       In sum, the worldview put forward in this essay may be able to re-invigorate the three orders, the loss of which precipitated the “meaning crisis” in Western culture.

      Summary

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a valuable finding on the possible use of vilazodone in the management of thrombocytopenia through regulating 5-HT1A receptor signaling. The evidence supporting the claims of the authors is solid, with the combined use of computational methods and biochemical assays. The work will be of broad interest to scientists working in the field of thrombocytopenia.

      Public Review:

      Reviewer #1 (Public Review):

      Summary:

      This is well-performed research with solid results and thorough controls. The authors did a good job of finding the relationship between the 5-HT1A receptor and megakaryocytopoiesis, which demonstrated the potential of vilazodone in the management of thrombocytopenia. The paper emphasizes the regulatory mechanism of 5-HT1A receptor signaling on hematopoietic lineages, which could further advance the field of thrombocytopenia for therapeutic purposes.

      Strengths:

      This is comprehensive and detailed research using multiple methods and model systems to determine the pharmacological effects and molecular mechanisms of vilazodone. The authors conducted in vitro experiments using HEL and Meg-01 cells and in vivo experiments using Zebrafish and Kunming-irradiated mice. The experiments and bioinformatics analysis have been performed with a high degree of technical proficiency. The authors demonstrated how vilazodone binds to 5-HTR1A and regulates the SRC/MAPK pathway, which is inhibited by particular 5-HTR1A inhibitors. The authors determined this to be the mechanistic underpinning for the effects of vilazodone in promoting megakaryocyte differentiation and thrombopoiesis.

      Weaknesses:

      (1) Which database are the drug test sets and training sets for the creation of drug screening models obtained from? What criteria are used to grade the results?

      Response: Thank you for your thoughtful comment. The database was built by our laboratory. First, we collected 39 small molecule compounds that can promote MK differentiation or platelet formation and 691 small molecule compounds that have no obvious effect on MK differentiation or platelet formation to build the database. The Molecular Descriptors of 2 active and 15 inactive small molecule compounds were then randomly picked as the Validation set, and the data of the remaining 713 small molecule compounds were used as the Training set. With regard to the activity evaluation criteria, the prediction score for each molecule was between 0 and 1, and the model decision was made with a threshold of 0.5. A molecule with a score above the 0.5 threshold was identified as a megakaryopoiesis inducer (1).

      Reference:

      (1) Mo Q, Zhang T, Wu J, et al. Identification of thrombopoiesis inducer based on a hybrid deep neural network model. Thromb Res. 2023;226:36-50. doi:10.1016/j.thromres.2023.04.011
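
      The split and decision rule described in the response to comment (1) can be sketched as follows. This is only an illustration under stated assumptions: the function name `is_inducer`, the constant names, and the example scores are hypothetical and are not taken from the authors' model or from Mo et al.; only the compound counts and the 0.5 threshold come from the response.

```python
# Illustrative sketch only: names and structure are assumptions, not the authors' code.

THRESHOLD = 0.5  # decision threshold stated in the response

# Compound counts stated in the response.
N_ACTIVE, N_INACTIVE = 39, 691          # collected active / inactive compounds
N_VAL_ACTIVE, N_VAL_INACTIVE = 2, 15    # compounds randomly picked for validation

n_total = N_ACTIVE + N_INACTIVE
n_validation = N_VAL_ACTIVE + N_VAL_INACTIVE
n_training = n_total - n_validation     # compounds left for training


def is_inducer(score: float, threshold: float = THRESHOLD) -> bool:
    """Label a compound as an inducer when its score is strictly above the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    return score > threshold


if __name__ == "__main__":
    print(n_total, n_validation, n_training)
    # Hypothetical scores, for illustration only.
    for name, score in [("compound_A", 0.82), ("compound_B", 0.31)]:
        print(name, is_inducer(score))
```

      The counts are internally consistent: 39 + 691 = 730 compounds in total, and removing the 17 validation compounds leaves the 713 training compounds quoted in the response. A score of exactly 0.5 is treated here as inactive, since the response says "above the 0.5 threshold"; that reading is an assumption.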

      (2) What is the base of each group in Figure 3b for the survival screening of zebrafish? The positivity rate of GFP-labeled platelets is too low, as indicated by the quantity of eGFP+ cells. What gating technique was used in Figure 3e?

      Response: We are deeply grateful for the insightful feedback you have provided regarding Figure 3 and the assessment of the zebrafish model. We used 50 zebrafish embryos per group to evaluate VLZ toxicity, and we think this is a suitable and fair baseline. Our gating procedure is clearly depicted in the resulting diagram. Since our goal was to evaluate the fluorescence intensity quantitatively, we isolated the entire zebrafish cell. Since the amount of eGFP+ cells found in various zebrafish tissues in other literature is likewise quite low, and we are unsure of the typical eGFP+ threshold for zebrafish (1, 2), we think this finding is fair given that all groups in the experiment were processed in parallel.

      Reference:

      (1) Yang L, Wu L, Meng P, et al. Generation of a thrombopoietin-deficient thrombocytopenia model in zebrafish. J Thromb Haemost. 2022; 20(8): 1900-1909. doi:10.1111/jth.15772

      (2) Fallatah W, De Silva IW, Verbeck GF, Jagadeeswaran P. Generation of transgenic zebrafish with 2 populations of RFP- and GFP-labeled thrombocytes: analysis of their lipids. Blood Adv. 2019;3(9):1406-1415. doi:10.1182/bloodadvances.2018023960

      (3) In Figure 4C, the MPV values of each group of mice did not show significant downregulation or upregulation. The possible reasons for this should be explained.

      Response: Thank you for your thoughtful comment. Megakaryocytes build pseudopodia, which form extensions that release proplatelets into the bone marrow sinusoids. Proplatelets convert into barbell-shaped proplatelets to form platelets in an integrin αIIbβ3 mediated process (1-2). Platelet size is established by microtubule and actin-myosin-spectrin cortical forces which determine platelet size during the vascular formation of barbell proplatelets (3). Conversion is regulated by the diameter and thickness of the peripheral microtubule coil. Proplatelets can also be formed from proplatelets in the circulation (4). Megakaryocyte ploidy correlates with platelet volume following a direct nonlinear relationship to mean platelet volumes (5). Usually there is an equilibrium between platelet generation and clearance from the circulation (normal turnover) controlled by thrombopoietin. When healthy humans receive thrombopoietin, their platelet size decreases (6). Proplatelet formation is dynamic and influenced by platelet turnover (7), which increases upon increased platelet consumption and/or sequestration. In our study, the MPV values of each group of mice did not show significant downregulation or upregulation. From our point of view, there are several possible reasons for these results.

      (1) Radiation damage in mice may decrease the platelet count while at the same time stimulating the bone marrow to release young and larger platelets, thus keeping the MPV relatively stable.

      (2) After radiation injury, bone marrow cells were suppressed, resulting in a decrease in the number of platelets produced, but MPV remained unchanged, possibly because the direct effects of radiation on the bone marrow caused thrombocytopenia but did not necessarily alter the average platelet size.

      Reference:

      (1) Thon JN, Italiano JE. Platelet formation. Semin Hematol. 2010(3):220-226. doi: 10.1053/j.seminhematol.2010.03.005.

      (2) Larson MK, Watson SP. Regulation of proplatelet formation and platelet release by integrin alpha IIb beta3. Blood. 2006(5):1509-1514. doi: 10.1182/blood-2005-11-011957.

      (3) Thon JN, Macleod H, Begonja AJ, et al., Microtubule and cortical forces determine platelet size during vascular platelet production. Nat. Commun. 2012(3):852. doi: 10.1038/ncomms1838.

      (4) Machlus KR, Thon JN, Italiano JE Jr. Interpreting the developmental dance of the megakaryocyte: a review of the cellular and molecular processes mediating platelet formation. Br. J. Haematol. 2014(2):227-36. doi: 10.1111/bjh.12758.

      (5) Bessman JD. The relation of megakaryocyte ploidy to platelet volume. Am. J. Hematol. 1984(2):161-170. doi: 10.1002/ajh.2830160208.

      (6) Harker LA, Roskos LK, Marzec UM, et al., Effects of megakaryocyte growth and development factor on platelet production, platelet life span, and platelet function in healthy human volunteers. Blood. 2000(8):2514-2522. doi: 10.1182/blood.V95.8.2514.

      (7) Kowata S, Isogai S, Murai K, et al., Platelet demand modulates the type of intravascular protrusion of megakaryocytes in bone marrow. Thromb. Haemost. 2014(4):743-756. doi: 10.1160/TH14-02-0123.

      (4) The PPI diagram and the KEGG diagram in Figure 6 both provide a possible mechanism pathway for the anti-thrombocytopenia effect of vilazodone. How can the authors analyze the differences in their results?

      Response: We appreciate your valuable comments. PPI (Protein-Protein Interaction) refers to the interaction between proteins. Inside cells, proteins interact with each other to perform various biological functions, influencing cell signaling, metabolic pathways, cell cycle, and more. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database that integrates information on genomes, chemicals, and biological systems. In pharmacoinformatics, KEGG pathways are often used to understand the molecular mechanisms of specific diseases or biological processes. KEGG contains the interrelationships between genes, proteins, and metabolites, helping to reveal key nodes in biological processes. PPI information can be integrated with data from KEGG pathways, such as metabolic and signaling pathways, to gain a more comprehensive understanding of the role of protein-protein interactions in cellular processes and biological functions. For example, by analyzing nodes in the PPI network, proteins associated with a specific disease can be identified, and further examination of these proteins' locations in KEGG pathways can reveal molecular mechanisms underlying the onset and development of the disease. However, this method also has some limitations:

      Uncertainty (1): The construction of protein-protein interaction networks and drug interaction networks involves many assumptions and speculations. The edges of these networks may be based on experimental data but can also rely on bioinformatics predictions. Therefore, the accuracy of predictions is limited by the quality and reliability of the data used during network construction.

      Insufficient data (2): Despite the availability of a large amount of bioinformatics data for network construction, interactions between some proteins and drugs may still lack sufficient experimental data. This data insufficiency can result in inaccuracies in network predictions.

      Dynamics and temporal-spatial changes (3): The dynamics and temporal-spatial changes in biological systems are crucial for drug effects. Pharmacoinformatics may struggle to capture these changes because it often relies on static network representations, overlooking the temporal and dynamic nature of biological systems.

      Reference:

      (1) Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics. 2020(1):442. doi: 10.1186/s12859-020-03773-2.

      (2) Zhang S, Zhao H, Ng MK. Functional module analysis for gene coexpression networks with network integration. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015(5):1146-1160. doi: 10.1109/TCBB.2015.2396073.

      (3) Cinaglia P, Cannataro M. A method based on temporal embedding for the pairwise alignment of dynamic networks. Entropy (Basel). 2023(4):665. doi: 10.3390/e25040665.

      (5) 5-HTR1A protein expression is measured only in the Meg-01 cells assay. Similar quantitation through western blot is not shown in other cell models.

      Response: Your insightful criticism and recommendation to use different cell models in order to obtain a more accurate depiction of 5-HTR1A protein expression are greatly appreciated. We completely concur that using this strategy would greatly increase the validity of our research. However, establishing a primary megakaryocyte model requires specialized expertise and technical resources, which unfortunately are not readily available to us within the given timeframe. Nevertheless, we acknowledge the limitations of Meg-01 cells, which may exhibit distinct properties compared to true megakaryocytes. To mitigate this concern, we have ensured robust experimental design and rigorous data analysis to interpret our findings within the context of these model cell lines. We believe our results still provide valuable insights into megakaryocyte differentiation and address an important biological question.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to understand the mechanism of how a drug candidate, VLZ, works on a receptor, 5-HTR1A, by activating the SRC/MAPK pathway to promote the formation of platelets.

      Strengths:

      The authors used both computational and experimental methods. This definitely saves time and funds to find a useful drug candidate and its therapeutic marker in the subfield of platelets reduction in cancer patients. The authors achieved the aim of explaining the mechanism of VLZ in improving thrombocytopenia by using two cell lines and two animal models.

      Weaknesses:

      Only two cell lines, HEL and Meg-01 cells, were evaluated in this study. However, using more cell lines is really depending on the workflow and the grant situations of the current research team.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. We fully agree that CD34+ hematopoietic stem/progenitor cells or primary megakaryocytes would provide a more accurate representation of in vitro megakaryopoiesis compared to HEL and Meg-01 cells, which possess limited potential for this process. We acknowledge that our current study did not include experiments with these preferred cell models. This is because our laboratory is still actively developing the technical expertise and resources required for establishing and maintaining primary megakaryocyte and CD34+ cell cultures. Despite the limitations of the current study, we believe the results using HEL and Meg-01 cells provide valuable preliminary insights into the potential effects of VLZ on megakaryocyte differentiation. We are actively working to overcome these limitations and plan to incorporate these more advanced models in our future investigations.

      Reviewer #1 (Recommendations For The Authors):

      I think the authors can enhance the mechanism study by developing more reliable models and methodologies. The connection to clinical research should be strengthened at the same time.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. Despite the limitations, we are committed to expanding our research in the future by incorporating your suggestion and establishing a primary megakaryocyte model to further validate our findings and strengthen our conclusions. At the same time, we wholeheartedly concur with your suggestion to combine clinical research. Unfortunately, VLZ is not a first-line treatment for depression in China, and getting blood samples from the matching number of patients for analysis is a challenge. To give additional experimental support for the medication, we have attempted to improve the data in vivo as much as feasible, including by implementing the intervention in normal mice. Our findings should also contribute to the theoretical underpinnings of this medication and aid in its practical application.

      Reviewer #2 (Recommendations For The Authors):

      Issues the authors need to address:

      Figure 7: Why the band intensity of GAPDH in b or e is much greater than that in f, g, or h?

      Response: Thank you for your careful observation and insightful comment regarding Figure 7. Because the protein concentration differs between batches of samples, a larger loading volume sometimes increases the GAPDH band intensity. Other factors that may influence the GAPDH band intensity include the instrument's contrast adjustment during exposure and the use of gels with different numbers of wells for electrophoresis. In addition, the original results of all three replicates of each western blot will be provided in the supplementary materials.

      Finally, we sincerely thank you for the opportunity to further revise our manuscript. Your valuable and scientific comments have greatly improved it!

    1. Book Summary:

       PART 1: FUNDAMENTAL TECHNIQUES IN HANDLING PEOPLE

       Principle 1: Don't Criticise, Condemn or Complain
       Criticism is futile, it makes the other person strive to justify himself
       Criticism doesn't correct a situation
       When you give a person criticism, they will never make lasting changes in the things you criticised them for
       Don't criticise anyone; "they are just what we would be in similar circumstances"
       📝Action Step: Ponder and journal on all the instances when you criticised someone on something they valued or were making progress in (e.g. studies, business, sport). Journal on why you said that, really get to the roots of your beliefs. Go and message the person you criticised and tell them you're sorry. Next time don't criticise ANYONE.
       "Don't complain about the snow on your neighbour's roof, when your own is unclean"
       🤔Action Step: Think of all the times when you complained in the last week or so. Write it down/type it out, then write next to each complaint what an alternative to the complaint could be. Next time NEVER complain.
       "I will speak ill of no man ... and speak all the good I know of everybody"

       Principle 2: Give Honest And Sincere Appreciation
       Humans all want to have the feeling of importance in society
       Andrew Carnegie praised his associates publicly and privately to handle them better
       "Don't be afraid of enemies that attack you, be afraid of friends that flatter you"
       If someone makes a mistake, don't condemn them, appreciate their good points, and reward them through praise
       🗣️Action Step: The next time you see someone making progress or working really hard, go and give them a compliment (give them honest and sincere appreciation) - Go to AG wins and comment on a win -> DO THIS RN OR YOU'RE A JEFFREY
       "Every man I meet is superior to me in some way, in that way I learn of him"

       Principle 3: Arouse In The Other Person An Eager Want
       The only way to influence others is to talk about what they want and show them how to get it
       💡Action Step: The next time you come across a situation where you have to make someone do something under your responsibility/leadership, ponder for a second, "How can I make this person want to do it?", really get into their shoes - journal/ponder on it, then apply it to the person in real life; or, if you sell a product, ask yourself, "How can I make this person want to buy it?", use the feedback and apply it
       "If there's a secret to success, it's the ability to get into the POV of the other person and see things

       PART 2: 6 WAYS TO MAKE PEOPLE LIKE YOU

       Principle 1: Be Genuinely Interested In The Other Person
       You can make more friends in 2 months by becoming interested in others, than you can in 2 years by being interested in yourself
       Make yourself do things for others, things that require time, thoughtfulness/unselfishness
       😢Action Step: Whenever you see someone that is in need of help in their life, or is struggling, go and give them advice. Be genuinely interested in helping them improve rather than helping yourself -> Do this in AG right NOW.
       "We are interested in others when they are interested in us"

       Principle 2: Smile
       What one wears on one's face is far more important than the clothes on one's back
       Happiness doesn't depend on outer conditions, it depends on inner conditions
       😀Action Step: Start SMILING RIGHT NOW, Literally, Just put a smirk on your face and wear it for the rest of the day (see how people respond to it)
       "There is nothing good or bad, it is thinking that makes it so"

       Principle 3: Remember That A Person's Name To That Person Is The Sweetest Sound
       🤝Action Step: Whenever you meet someone new, find out their complete name and associate it with an image in your head
       Your name is more important to you than 1000 names of others

       Principle 4: Be A Good Listener, Encourage Others To Talk
       Listening is one of the highest compliments we can pay to anybody
       Good conversationalist = Good Listener (be attentive)
       To be interesting, be interested
       🗣️Action Step: The next time you socialise with someone, make them do 80% of the talking, ask them open-ended questions, and let them freely answer (follow the 80/20 principle)

       Principle 5: Talk In Terms Of The Other Person's Interest
       💡Action Step: When talking to someone else, talk about something that they're interested in (e.g. self-improvement, sports), then let the conversation freely flow on that topic, pick their brain on that topic, ask them questions

       Principle 6: Make The Other Person Feel Appreciated And Important
       Always make the other person feel appreciated and important
       Use phrases like, "I'm sorry to trouble you", "Would you be so kind as to ____", "Would you mind"
       🤷‍♂️Action Step: The next time you have to call someone, or tell someone to move, use one of the phrases above

       PART 3: HOW TO WIN PEOPLE TO YOUR WAY OF THINKING

       Principle 1: The Only Way To Get The Best Out Of An Argument Is To Avoid It, You Can't Win
       Why argue?
       "A man convinced against his will, is of the same opinion still"
       "Hatred is never ended by hatred, but by love"
       😠Action Step: The next time you're talking to someone and you notice them starting to escalate into an argument, end it right there by showing love (e.g. give them a compliment, express gratitude)

       Principle 2: Show Respect For The Other's Opinion, Never Say "You're Wrong"
       If you're going to prove something, don't let anyone know it
       "Be wiser than other people if you can, but do not tell them so"
       If someone says something wrong say, "I thought otherwise", "I may be wrong ____"
       Telling someone directly that they're wrong can cause a lot of damage
       💬Action Step: When you're in a discussion with someone (let's say one of your JEFFREY friends at school says junk food is fine), instead of saying "you're wrong", use one of the phrases above, and repeat it in a much friendlier tone

       Principle 3: If You Are Wrong, Admit It Quickly And Emphatically
       Admit quickly that the other person is right and you are wrong in a friendly tone
       You need to have courage to have the ability to criticise yourself
       🤨Action Step: The next time you find yourself having made a mistake in front of others, admit it straight away in a friendly manner. Make sure you don't cause damage to others while doing so.

       Principle 4: Begin In A Friendly Way
       "A drop of honey catches more flies than a gallon of gall"
       Always begin the conversation in a friendly manner and friendly tone
       💭Action Step: The next time you have a conversation with someone, start the conversation with a positive vibe, and friendly tone.

       Principle 5: Get The Other Person Saying "Yes" "Yes" Immediately
       Don't start a convo with things you differ on, start with things you agree on
       At all costs, keep the person from saying "no" at the start
       It is much more profitable to see things from the other person's viewpoint and make them say "yes"
       🙌Action Step: After bringing the positive vibe to the conversation, start talking about things you agree on with the other person, and ask them questions which deliberately provoke a "yes" response. Brainstorm a little on this in your brain before approaching the person.

       Principle 6: Let The Other Person Do A Great Deal Of The Talking
       Encourage them to talk, if you disagree, hold silent, listen with an open mind
       "If you want enemies, excel your friends; if you want friends, let your friends excel you" - keep quiet about your accomplishments, don't talk about them, unless somebody asks
       🏆Action Step: Follow the 80/20 rule when talking in convo, only talk about the other person, their interests, don't show off in the conversation to look cool (e.g. saying you earn $10k/m online), keep quiet, remain humble in the conversation

       Principle 7: Let The Other Person Feel The Idea Is Theirs
       Making someone feel that the idea is theirs is like giving them a compliment
       💡Action Step: The next time you come up with a great idea and you implement it, and it gives you reasonable success, thank the friend that helped you generate the idea (e.g. tag someone in AG because they helped you start a profitable business)

       Principle 8: Try Honestly To See Things From The Other Person's POV
       People may be totally wrong, but don't condemn them, try to understand them, their situation
       🧐Action Step: The next time you're in a conversation, and someone has said something that is completely wrong, and you thought to yourself "why did he/she say that!" - empathise with their situation and see things from their POV (e.g. say to yourself, "I would've done the same if I was in that situation")

       Principle 9: Be Sympathetic With The Other Person's POV
       3/4 of the people you meet crave sympathy, go give it to them
       Put yourself in the shoes of the other person at the start of a conversation, or deal
       😊Action Step: Another tip to just keep at the back of your head is to see things from the other person's POV, have sympathy for their situation. Really put yourself in the other person's shoes, make yourself feel that you're the other person, see things from a new REALITY.

       Principle 10: Appeal To The Nobler Motives
       Always choose a nobler motive when you assume something about others
       Be the kind of leader who appeals to what really matters and, even when the feedback is tough, reminds people why they're really there

       Principle 11: Dramatise Your Ideas
       Truth isn't enough, the truth has to be made vivid, interesting, dramatic
       🕺Action Steps:
       💡Make your ideas more obvious, interesting, and vivid to people
       Use drama and showmanship to capture attention and imagination to make your ideas more impressive
       When presenting an idea, make it more exciting than it really is

       Principle 12: Throw Down A Challenge
       "The way to get things done is to throw down a competition"
       🥵Action Step: When you're doing something that many others are doing (e.g. participating in a challenge), ask someone participating and throw down a challenge to them (e.g. whoever finishes the challenge first wins)

       PART 4: BE A LEADER - HOW TO CHANGE PEOPLE WITHOUT GIVING OFFENCE

       Principle 1: Begin With Praise And Honest Appreciation
       Appreciate the person first before bringing up your problem for resolution
       🗣️Action Steps: e.g. if someone did a random act of kindness for you
       Tell the person that you appreciate the act
       Tell them how it made you feel good
       Congratulate and tell them that it was beyond expectations

       Principle 2: Call Attention To People's Mistakes Indirectly
       When indirectly criticising someone, never use the word "but", use "and" instead
       This technique works well for sensitive people who resent criticism
       💭Action Step: Praise a quality, and also the quality that you want to see improve in someone else (e.g. if someone doesn't keep his house clean, say, "I appreciate the effort you put in to make the house clean")

       Principle 3: Talk About Your Own Mistakes Before Criticising The Other Person
       Talk about your own shortcomings, before judging someone (e.g. asking them to improve)
       😆Action Step: If again you want to see a direct improvement in someone, before telling them, talk about your own mistakes in the area you want the other person to improve in, tell them a joke about you, a story about the mistakes you made

       Principle 4: Ask Questions Instead Of Giving Direct Orders
       Always give people the opportunity to do things by themselves through questions
       Resentment caused by a brash order may last a long time
       😤Action Step: When you need something done by someone else, don't give them a direct order. Give the person an opportunity to do things by asking questions (questions must be relevant to the task that you need done)

       Principle 5: Let The Other Person Save Face
       Finding faults in the other person will make them resent you
       ❌Action Step: Instead of directly pointing out the faults in the other person, let them save face and find their own mistakes (or point it out indirectly)

       Principle 6: Praise The Slightest Improvement, And Praise Every Improvement
       Faults start to disappear after you give praise
       😊Action Step: When you see someone making progress, or you see growth, praise them on their hard work, and praise the improvement

       Principle 7: Give The Other Person A Fine Reputation To Live Up To
       💡Action Step: If you want to improve a person in a certain area, act as though that trait was already one of his or her outstanding characteristics (e.g. make it seem as if they already have that trait)

       Principle 8: Use Encouragement, Make The Fault Seem Easy To Correct
       Let the other person know that you have faith in their ability to perform a task
       💪🏿Action Step: When you see a fault, and they're trying their best to fix it, let them know that you have full faith in them

       Principle 9: Make The Other Person Happy About Doing The Thing You Suggest
       Give the other person some reward for doing what you want, and take away a little for something they do not do
       Rules for making the other person happy about the thing you suggest:
       Be sincere, do not promise anything you can't deliver
       Know exactly what it is you want the other person to do
       Be empathetic, ask yourself what it is the other person really wants
       Consider the benefits the person will receive from doing what you suggest
       Match those benefits to the other person's wants
       When you make your request, put it in a form that will convey to the other person the idea that he personally will benefit from

      how to win friends and influence people summary

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer_01

      Major comments:

      1. The authors cite that acetylated and tyrosinated microtubules have different spatial and compartmental distribution in dendrites and axons and investigate the distribution in the AIS of nonAcD cells and AcD cells, as well as the stem dendrites. However, they just show one example of two different cells (Figure 2D and E) without any statistical analysis. Either, they should remove this part or provide a thorough quantification.

      Reply: The spatial and compartmentalized distribution of stable and dynamic MTs in the dendrites and axons of nonAcD neurons has been extensively studied and reviewed (see Kapitein & Hoogenraad, 2011; Katrukha et al., 2021; Tas et al., 2017 for reference). However, the organization of the MT cytoskeleton in AcD neurons is still unknown. Here, we provide the very first evidence on the distribution of tyrosinated and acetylated MTs in AcD neurons, as well as data on MT orientations. We agree with the reviewer that to make our results on the spatial organization of these post-translational modifications in AcD neurons more complete, we need to provide a more thorough quantification analysis.

      To achieve this, we plan to perform immunostainings on DIV10 neurons using antibodies against tyrosinated (tyr) and acetylated (ac-) tubulin to label dynamic and stable MTs, respectively. Subsequently, we will conduct high-resolution 3D confocal imaging and measure fluorescence intensity to illustrate the abundance and staining patterns of tyr- and ac-MTs in the axons and dendrites of AcD neurons. Since the spatial distribution of tyr- and ac-MTs is distinguishable with confocal microscopy, we will retain STED examples in the figures but conduct new analyses on confocal imaging data. We will measure the total fluorescence intensity of tyr- and ac-MTs in different compartments of AcD neurons and normalize it to the size of the measured area. We will then compare the normalized intensity values between the axons and dendrites of AcD neurons to examine whether there is a specific distribution pattern of stable and dynamic MTs. We will analyse at least 3 independent primary culture preparations with a minimum of 30 cells. Using the same dataset, we will also quantify the percentage of AcD neurons with ac-MTs specifically elongating into the axon compared to the AcD.
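
      The normalisation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis pipeline: the function names, the example measurements, and the choice to summarise each cell by an axon-to-dendrite ratio are all hypothetical.

```python
from statistics import mean


def normalized_intensity(total_intensity: float, area_um2: float) -> float:
    """Total fluorescence intensity divided by the size of the measured area."""
    if area_um2 <= 0:
        raise ValueError("area must be positive")
    return total_intensity / area_um2


def axon_dendrite_ratio(axon: tuple[float, float], dendrite: tuple[float, float]) -> float:
    """Ratio of area-normalised intensity in the axon to that in a dendrite.

    Each argument is a (total_intensity, area_um2) pair for one compartment.
    """
    return normalized_intensity(*axon) / normalized_intensity(*dendrite)


# Hypothetical measurements for three cells: (total intensity, area in um^2).
cells = [
    {"axon": (1200.0, 10.0), "dendrite": (900.0, 15.0)},
    {"axon": (800.0, 8.0), "dendrite": (1000.0, 20.0)},
    {"axon": (1500.0, 12.0), "dendrite": (1250.0, 25.0)},
]

# One ratio per cell; a value above 1 means the signal is denser in the axon.
ratios = [axon_dendrite_ratio(c["axon"], c["dendrite"]) for c in cells]
mean_ratio = mean(ratios)
```

      Working with a per-cell ratio keeps the comparison paired within each neuron; comparing the two groups of normalised values directly would be an equally reasonable choice.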

      The authors use EGFP-Rab3A vesicle to investigate anterograde transport at the axon and dendrites. They find a slightly faster transport of these vesicles at the AIS of AcD cells and conclude the axonal cargos in general are transported faster across the AIS in AcD cells. In my opinion, this generalization based on one type of vesicle is too farfetched.

      Reply: The Rab3A protein is associated with pre-synaptic vesicles that are transported by KIF1A and KIF1Bβ, members of the kinesin-3 family, towards pre-synaptic boutons (see Guedes-Dias & Holzbaur, 2019; Niwa et al., 2008 for reference). Since KIF1A and KIF1Bβ are common motor proteins that mediate MT-based transport of different types of vesicles (e.g., synaptic vesicles and dense-core vesicles, see Carabalona et al., 2016; Helmer & Vallee, 2023 for reference), we reasoned that Rab3A should be a representative marker for an axonal cargo. However, this indeed does not rule out the possibility that the faster trafficking we observed is specific to presynaptic vesicles, as different types of vesicles tend to recruit different modulators that could lead to different trafficking features.

      To address this question, we will perform a live-imaging experiment including two additional organelle marker proteins, Neuropeptide Y (NPY) and Lysosome-associated membrane protein 1 (Lamp1). NPY is transported into the axon via KIF1A and KIF1Bβ-mediated dense-core vesicles (see Helmer & Vallee, 2023; Lipka et al., 2016 for reference). Lamp1 is associated with lysosomes and a range of endocytic organelles that recruit both kinesin-1 and kinesin-3, and are transported into both axons and dendrites (as reviewed in Cabukusta & Neefjes, 2018). By introducing two additional types of vesicles, we should be able to answer whether AcD neurons, in general, tend to transport cargoes into the axon faster than nonAcD neurons.

      Minor comments:

      In the introduction, the authors describe how synaptic inputs are received at the dendrites and propagated to the soma in the form of membrane depolarizations. They should add 'excitatory' to synaptic inputs or also describe the impact of inhibitory synaptic inputs at the dendrites.

      In my opinion, Figure 2 could be presented in a slightly better way. The lower part of panel A better fits to panel B, which is next to the upper part of panel A. I understand that the authors systematically present their data first for nonAcD cells and then for AcD cells. However, in this special case it is a little bit more difficult to read the current figure in that order. The results displayed in Figure 4 are presented in a slightly confusing order. The authors jump from 4D to 4G, then to 4I and 4E, 4H, 4F. Similarly, 4M and N are addressed before 4O and P to finally get to 4K and L. It would be beneficial to present and address the data in a stringent way.

      Reply: Thank you for the suggestions on how to improve the data representation in the figures. We will change Figures 2 and 4 and make adjustments in the text upon revision since we also plan to include additional data.

      Reviewer_02

      Major comments:

      1. The authors suggest that there is reduced Na+ channel density at AcD AIS compared to other AIS arising from the cell body. This is not convincing. Immunostaining for Na+ channels is notoriously difficult and sensitive to fixation since the epitopes of the anti-Pan Nav antibodies are highly sensitive to fixation. In addition, this is based on immunofluorescence intensity quantification. Since the mechanism of localization is through binding to AnkG, the authors should also measure other AIS proteins like AnkG, b4 spectrin, and Nfasc. Do these change? If all uniformly change I would be much more inclined to accept the conclusion. If they do not change, it still doesn't rule out the concern about fixation conditions and slight differences in the cultures. The authors indicate there is about a 40% reduction in fluorescence intensity. That is quite large. This big difference should also be confirmed in brain sections.

      Reply: The potential fixation issue and antibody sensitivity on Na+ channel staining are indeed valid considerations, and we are aware of them. However, it should be noted that we used pan-Na+ channel antibodies that were previously characterised and widely used in literature (see Solé et al., 2019; Yang et al., 2020 for references). Furthermore, our samples underwent the same fixation and staining protocol, and comparable numbers of AcD and nonAcD neurons were imaged from the same preparation and coverslip for each experiment. Imaging settings were also kept constant. Any loss of Na+ channel staining at the AIS due to fixation should affect both neuron types and therefore our conclusion is justified. Nevertheless, the reviewer's point regarding other AIS components is valid and will be investigated further in the revised manuscript.

      Following the reviewer's suggestion to further strengthen our conclusion, we will measure the intensity of AnkG, βIV-spectrin, and neurofascin in DIV21 AcD and nonAcD neurons. We will compare a minimum of 3 independent cultures, each containing at least 10 cells of each type per culture.

      We agree with the reviewer that confirming the observed differences in Na+ channel staining using brain slices would be beneficial. However, conducting such experiments presents several challenges. Firstly, one approach could involve immunostaining with antibodies against the AIS marker AnkG, in combination with the somatodendritic marker MAP2 and pan-Nav. However, this method lacks the advantage of clearly identifying neuronal morphology as seen in dissociated cultures, making the outcome unclear and difficult to analyse and interpret. Alternatively, the use of Thy1-GFP rats, where a subset of neurons is labelled with GFP, could allow for morphological studies. Unfortunately, we do not have access to this rat line, and the process of importing it, obtaining permits, and establishing a colony is beyond the timeframe for manuscript revision. Additionally, while pan-Nav antibodies have shown reliability in dissociated cultures, their efficacy in tissue staining is less certain. We could provide example images upon request. Secondly, endogenous labelling of Na+ channels is another option, but remains a significant challenge. Recent developments in endogenous labelling, such as the CRISPR/Cas9-based method using pORANGE by Fréal et al. (Fréal et al., 2023), and the generation of Scn1a-GFP transgenic mice by Yamagata et al. (Yamagata et al., 2023), offer potential solutions. However, the labelling efficiency of pORANGE is uncertain, and both methods are time-consuming and cannot be completed within the three-month revision period.

      As an alternative, we propose emphasising that our results are based on in vitro experiments and discussing the advantages and limitations of this approach in the discussion section.

      The analysis of inhibitory synapse differences at the AIS are also not compelling - this is a limitation of the culture system. The authors have no control over the density of inhibitory neurons in the culture well. This interaction is not intrinsic to the AcD neuron, but rather a feature of neuron-neuron interactions which should only be modelled in the animal.

      Reply: The reviewer is correct in pointing out that establishing inhibitory synapses at the AIS is not an intrinsic feature of AcD neurons; it depends on the network and should be modelled in animals. We will include this limitation of the cell culture model in the discussion section of the revised manuscript. We also understand the reviewer's concern that the lower number of inhibitory synapses at the AIS of AcD neurons might be due to an uneven density of inhibitory neurons between cultures. Nonetheless, assuming that the number of inhibitory neurons is constant between preparations, it is an interesting observation that AcD neurons form fewer inhibitory synapses at the AIS. This may be related to the features of the AIS and its morphology and should be further investigated.

      To make our study more comprehensive and also address the reviewer's concern regarding the presence of inhibitory neurons, we will perform immunostainings in dissociated cultures (40,000 cells per 18 mm coverslip, the same as in the experiments with synapse quantification) with antibodies against pCaMKIIa, an excitatory neuron marker, and GAD1, a marker for inhibitory neurons. Then, we will quantify the density of inhibitory neurons in the culture. We will perform measurements from 3-6 independent cultures by analysing large fields of view in different areas of a coverslip (20-30 neurons per area) to determine if the density of inhibitory neurons varies between cultures as well as preparations. Furthermore, as also requested by reviewer 4, we will perform new immunostainings where pre- and post-synaptic markers (VGAT and Gephyrin) will be included in the same sample together with the AIS (AnkG or Neurofascin) and dendritic marker (MAP2). Synapses that contain pre- and post-synaptic components will be analysed and included in the revised version of the manuscript.
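
      One way to check whether the density of inhibitory neurons varies between cultures is sketched below. It is a hypothetical illustration rather than the planned analysis: the counts are invented, each field of view is assumed to be recorded as a (GAD1-positive, total neurons) pair, and the coefficient of variation is an assumed choice of summary statistic.

```python
from statistics import mean, stdev


def inhibitory_fraction(fields: list[tuple[int, int]]) -> float:
    """Pooled fraction of GAD1-positive neurons over all fields of one culture.

    Each field is a (gad1_positive, total_neurons) pair.
    """
    positive = sum(p for p, _ in fields)
    total = sum(t for _, t in fields)
    if total <= 0:
        raise ValueError("no neurons counted")
    return positive / total


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    return stdev(values) / mean(values)


# Hypothetical counts: two fields of view for each of three cultures.
cultures = {
    "culture_1": [(2, 20), (3, 30)],
    "culture_2": [(4, 20), (2, 30)],
    "culture_3": [(3, 25), (1, 25)],
}

fractions = {name: inhibitory_fraction(f) for name, f in cultures.items()}
cv = coefficient_of_variation(list(fractions.values()))
```

      A small coefficient of variation would suggest a similar share of inhibitory neurons across cultures; what counts as small is a judgement left to the analysis.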

      Finally, the major limitation of this study is that it is performed in vitro. Surprisingly, the authors actually argue this is a feature of their system. While it is true some of the questions can be addressed perfectly well in vitro, many cannot. In the first paragraph of the results the authors state an advantage of their system is that there are no microenvironments to influence the development of the AcDs. I'm afraid I view this as a drawback. The authors suggest this is an opportunity to examine intrinsic mechanisms of development - true, but it also foregoes the opportunity to determine if the outcomes are different from what occurs in vivo. To this point, the authors report that only 15-20% of the population of hippocampal neurons in culture are AcD neurons. But in their introduction they cite other literature indicating 50% of hippocampal neurons in vivo are AcD neurons - this suggests that the environment of the hippocampus in vivo influences whether a neuron becomes an AcD neuron or not.

      Reply: The reviewer is right in pointing out that the in vivo environment could indeed affect AcD neuron development, and we also find this to be a very interesting topic to investigate in the future. Even more intriguingly, as shown in a preprint by Lehmann et al. (doi: https://doi.org/10.1101/2023.07.31.551236), network activity stimulates neurons to acquire AcD morphology. While it is true that the impact of the microenvironment on AcD neuron development cannot be studied in dissociated cultures, our in vitro data undoubtedly support the conclusion that hippocampal neurons can intrinsically develop AcD morphology independent of the in vivo environment. As also mentioned in the next point, our statement "...their development must be driven by genetically encoded factors rather than specific..." might sound too definitive and therefore appear to rule out possible effects of the microenvironment. We will revise this part. Although it is highly desirable to move cell biological studies from neuronal cell cultures to tissue, it is still very challenging to perform many of the experiments in this study in slices or living animals due to a lack of appropriate technologies and tools. We are convinced that many basic biological questions can and should be studied in simplified culture models; because they are truly fundamental, they should also be reproducible in these models.

      To address the reviewer's question regarding the percentage difference between our data and the previous study by Thome et al. (2014), several factors should be considered. First, as noted by the reviewer, our results were obtained from an in vitro system, which is not directly comparable to the in vivo model system used in Thome et al.'s study (Thome et al., 2014). Second, the neurons quantified in our developmental experiments were at DIV5 and DIV7. This difference in age could contribute to the percentage difference, as Thome et al. analyzed neurons from P28-35 adult animals, where 50% of the AcD neuron population was observed, specifically in the CA1 region. Third, it is important to note that in other hippocampal regions, the percentage of AcD neurons is lower (approximately 20-30%). Since our hippocampal primary cultures contain neurons from all hippocampal regions, this may have averaged out our quantification of AcD neuron percentage. Additionally, in the study by Benavides-Piccione et al. (Benavides-Piccione et al., 2020), they reported 20% AcD neurons in the CA1 region of hippocampi isolated from 8-week-old mouse pups, a number similar to what we observed in vitro. Interestingly, Thome et al. reported that in P8 pups, the AcD neuron population in the hippocampal CA1 region is 30%. This number increased to 50% in adult animals at the age of P28-35, suggesting there is perhaps an age-dependent increase of the AcD neuron population. This could be an additional reason why we only saw 15-20% of AcD neurons in our in vitro system, regardless of the in vivo environment.

      In the revised version, we will clarify these points in the introduction and discussion sections. Additionally, we will quantify the proportion of AcD neurons in mature DIV21 dissociated hippocampal cultures and compare it to DIV7 cultures to assess whether there is an increase in the AcD population over time. We believe that this experiment, combined with the explanations provided above, will sufficiently address the reviewer's question. However, it is important to acknowledge that the establishment of neuronal networks in vitro differs from that in vivo. Therefore, there may be potential differences in the outcomes.

      I appreciated the balanced discussion of whether this is a stochastic or genetically programmed process. This could have been emphasized earlier in the results since the authors invoke the concept that "...their development must be driven by genetically encoded factors rather than specific...". The authors have not shown this and cannot show it in this system. Indeed, as stated in point 4 above, I think their data argue against a simple genetic program.

      Reply: As suggested by the reviewer and noted in point 4, we will revise the section on AcD neuron development in our manuscript to emphasize that hippocampal neurons may adopt AcD morphology through genetic or stochastic mechanisms. While we acknowledge that environmental and activity factors may also influence this process, particularly in mature neurons, our study focuses on developing neurons where genetic and stochastic factors are likely to be predominant. This conclusion is supported by the observation that neurons develop into AcD morphology in vitro, where environmental and activity patterns do not mimic those of in vivo systems.

      Indeed, our current manuscript does not explore genetic factors involved in AcD neuron development. To address this question, one approach could be to label AIS markers endogenously in dissociated cultures using the PORANGE method (see Willems et al., 2020 for reference) or utilize AnkG-GFP transgenic mice (Fréal et al., 2023; Thome et al., 2023) along with a volume marker like mRuby or GFP. This would allow for the identification of AcD and nonAcD neurons in vivo and in vitro, followed by single-cell transcriptomics analysis to uncover potential genetic factors. Subsequently, candidate genes could be manipulated to demonstrate their essential role in AcD neuron development. However, such experiments require significant time and resources beyond the scope of our current revision timeframe. Nonetheless, this question presents an exciting direction for future research.

      Reviewer 3

      Major comments:

      1. The authors classify neurons into axon-carrying dendrite (AcD) and non-AcD neurons by measuring the stem dendrite length (> 3 µm). I could not find the validity for this cut-off. The non-AcD neurons in Fig. 6B appear more AcD to this reviewer, and, in addition, other researchers have proposed a third category of 'shared root' neurons (doi: 10.7554/eLife.76101). For purposes of reproducibility and transparency, please provide first a comprehensive overview of the entire population of morphologies (i.e. all cells in control conditions). The distances from the soma could be plotted in histogram (etc.) and authors may want to think about independent supporting evidence for the cut-off to classify AcD and non-AcD neurons.

      Reply: Concerning the validity of AcD neuron classification, we did measure the length of the stem dendrite, as shown in Figure S4G, with an average distance of around 10 µm. However, we admit that this information is presented relatively late in the manuscript. To address the reviewer's criticism, in the revised version, we will include a supplementary figure displaying a gallery of representative images of both AcD and nonAcD neurons analyzed in our study (please refer to Hodapp et al., 2022; Fig S1 C&D; Fig S3 as an example). Given the sample size of AcD and nonAcD neurons in our study, including all images would result in a very large figure (for example, Figure 1: DIV5: 83 AcD neurons out of 427 cells, DIV7: 47 AcD neurons out of 387 cells). We will only show representative examples of AcD neurons in the gallery. Additionally, as suggested, we will plot the length of the stem dendrite (or axon distance) of AcD neurons as a histogram to demonstrate that the AcD neurons included in our study indeed have a stem dendrite longer than 3 µm.

      To further validate the classification method used, we will measure the diameter of the stem dendrite in all analyzed AcD neurons and then compare the distance between the soma and the start of the axon in each analyzed AcD neuron to the diameter of its stem dendrite. As described by Hodapp et al. (Hodapp et al., 2022; Fig S1A), AcD neurons are expected to have a stem dendrite that is longer than its diameter.
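
      The two criteria described above (a stem dendrite longer than the 3 µm cut-off, and longer than its own diameter) can be written out as a minimal sketch. The `Neuron` record, its field names, and the example values are hypothetical and are not taken from the manuscript; only the 3 µm cut-off comes from the text.

```python
from dataclasses import dataclass

STEM_LENGTH_CUTOFF_UM = 3.0  # cut-off quoted in the text (> 3 µm)


@dataclass
class Neuron:
    # Hypothetical record; field names are illustrative only.
    stem_length_um: float    # distance from soma to axon origin along the stem dendrite
    stem_diameter_um: float  # diameter of the stem dendrite


def classify(neuron: Neuron) -> str:
    """Label a cell 'AcD' if its stem dendrite exceeds the 3 µm cut-off."""
    return "AcD" if neuron.stem_length_um > STEM_LENGTH_CUTOFF_UM else "nonAcD"


def longer_than_diameter(neuron: Neuron) -> bool:
    """Secondary check: is the stem dendrite longer than its own diameter?"""
    return neuron.stem_length_um > neuron.stem_diameter_um


# Made-up measurements, for illustration only.
cells = [Neuron(10.2, 2.1), Neuron(1.4, 1.8), Neuron(3.5, 4.0)]
labels = [classify(c) for c in cells]

# Cells that pass the length cut-off but fail the diameter check
# would be the ones worth inspecting by eye.
flagged = [c for c in cells if classify(c) == "AcD" and not longer_than_diameter(c)]
print(labels, len(flagged))
```

      Keeping the two checks separate makes it easy to report how many cells satisfy one criterion but not the other, which is the kind of borderline case the reviewer asks about.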

      We have considered having independent evidence to support the classification of nonAcD and AcD neurons. However, the method used by Thome et al. and Wahle et al. for AcD and nonAcD neuron classification is well established and widely accepted (see Thome et al., 2014; Wahle et al., 2022 for references). Similar standards were also employed by Benavides-Piccione et al. (Benavides-Piccione et al., 2020). Introducing independent evidence could potentially raise further doubts, so we have chosen to maintain consistency with previous studies.

      As for the "shared root" neurons described by Wahle et al., we did not analyze this category separately and included them in the nonAcD subtype. Nonetheless, it is an interesting direction to explore in the future. For completeness, we will discuss this point in the revised manuscript.

      Related to point #1 the primary hippocampal neuron system is excellent for cell biological questions but comes with the drawback of imaginative morphologies including neurons with multiple axons and AISs. It is not mentioned here but literature indicates up to 20% of neurons have two axons (e.g. doi: 10.1007/s12264-017-0169-3, 10.1083/jcb.200707042). How did the authors classify the double axon cells? Since the main hypothesis is the existence of an intrinsic program for AcD neurons (p. 5 top), the two axons from one neuron should develop similarly. The authors can easily test this with the data.

      Reply: We appreciate the reviewer's comment regarding the choice of the model system for this type of study. Indeed, as they pointed out, in primary cultures, some neurons develop more than one axon. Since we did not find any supporting evidence from the literature reporting that hippocampal neurons have multiple axons in vivo, we only analyzed neurons with one axon for both AcD and nonAcD neurons. We will clarify this in the methods section of the revised manuscript.

      Some interpretations about function are not correct and the authors should reconsider these. A role of cisternal organelles on neuronal excitability remains to be demonstrated (and see doi.org/10.1002/cne.21445 showing there is none). In addition, the statement that lower fluorescence intensity of Pan-Nav1 is indicating reduced excitability is flawed. Antibody staining does not scale linearly with voltage-gated sodium channel density and since the AIS of AcD neurons is further from the soma it is most likely smaller in diameter which may account for apparent fluorescent differences. For biophysical reasons (for details I refer to 10.3389/fncel.2019.00570, 10.1016/j.conb.2018.02.016 and 10.7554/eLife.53432) smaller diameter axons will be easier to depolarize by depolarizing voltage-gated channels or excitatory synapses. Finally, in AcD neurons the AIS distance from the soma poses all sorts of interesting cable properties with the soma and the local dendritic membrane and the electrotonic properties alone suffice to make these neurons more excitable.

      Reply: The reviewer brings up very valid and important points that we will address in the revised manuscript. First, we will rephrase and adjust our interpretations regarding the functions of the cisternal organelle in the AIS. As also mentioned by reviewer #2, we are aware that antibody staining does not properly reflect Na+ channel density. As discussed above, we will also measure other AIS proteins that anchor Na+ channels to see if there are any correlations in fluorescence intensity between them and Nav1. We agree with the reviewer that the AIS of AcD neurons could have a smaller diameter, resulting in fewer Na+ channels. Indirect evidence is already available in the study of Benavides-Piccione et al., showing a smaller axon diameter in AcD neurons compared to nonAcD neurons in both human and mouse brain sections (Figure S4). To test this in our model system, we propose to measure the AIS diameter in AcD neurons. If this is indeed the case, we will indicate it in our revised manuscript and edit the section on Na+ channels.

      Exploring the biophysical properties of the AIS and axons of AcD neurons is indeed a highly interesting direction to pursue and is a project in its own right. It would necessitate the use of computational modeling approaches, which require considerable time and resources that are not feasible within the timeframe of this revision.

      Comparing AcD and non-AcD neurons for AIS plasticity is an excellent idea but the present statistical design is not suitable for answering this question. The authors should directly compare non-AcD and AcD neurons within a two-way ANOVA design, asking the question whether the independent variable axon type is significantly different and interacts with plasticity.

      Related points: 'AIS distance' in Figure 7 seems to refer to something else than distance from soma (Figure 1). Please clarify. What were the absolute distances from the soma for the AcD neurons and was this dependent on treatment?

      Reply: We appreciate the reviewer's comment; in the revised version, we will perform the analysis using a two-way ANOVA.
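
      As an illustration of the design the reviewer requests, the sketch below computes the F ratios of a balanced two-way ANOVA (neuron type × treatment, with interaction) from sums of squares. The function name, factor labels, and all data values are hypothetical; it is not the analysis pipeline of the study, and in practice a statistics package would also supply the p-values. Given the concern about inflated power, the entries would presumably be per-culture means rather than single cells.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for a balanced two-factor design with interaction.

    `cells` maps (level_a, level_b) -> list of measurements; every
    combination must be present and hold the same number of values.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")
    na, nb = len(levels_a), len(levels_b)

    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    mean_ab = {key: mean(values) for key, values in cells.items()}

    # Sums of squares for main effects, interaction, and residual error.
    ss_a = nb * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = na * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((m - grand) ** 2 for m in mean_ab.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - mean_ab[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = na - 1, nb - 1
    df_ab, df_err = df_a * df_b, na * nb * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_a": (ss_a / df_a) / ms_err,
        "F_b": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Made-up values standing in for an AIS measurement, for illustration only.
data = {
    ("nonAcD", "control"): [1.0, 2.0, 3.0],
    ("nonAcD", "KCl"): [4.0, 5.0, 6.0],
    ("AcD", "control"): [1.0, 2.0, 3.0],
    ("AcD", "KCl"): [1.0, 2.0, 3.0],
}
result = two_way_anova_balanced(data)
print(result)
```

      The interaction term is the one that answers the reviewer's question: a large `F_interaction` would indicate that the treatment effect differs between AcD and nonAcD neurons.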

      Regarding the terminology and definitions used in our manuscript, the "AIS distance" refers to the measurement between the start of the AIS and the axon initiating point, as depicted in Figure S4 of the manuscript. We adopted this parameter from the previous study by Grubb et al. (Grubb & Burrone, 2010), ensuring consistency in our investigation of AIS plasticity. For AcD neurons, where the axon branches out from the dendrite, we defined the AIS distance as the length between the start of the AIS and the border of the stem dendrite, as illustrated in Figure S4B.

      In Figure 1, the term "distance from soma" represents the length of the stem dendrite and is used for AcD and nonAcD neuron classification. As shown in Figure S4G, the absolute distance from the soma for AcD neurons is approximately 10 µm and remains consistent across treatments. We will explain these points more clearly in the revised manuscript.

      Minor comments:

      1. At p. 7 is stated that "The percentage of none-AcD forming collaterals at DIV1 is much lower than for AcD neurons" but statistical support is lacking. The conclusion in the next line is that "AcD neurons follow consensus development". That is puzzling given the difference just mentioned before. Please clarify.

      Reply: We will provide statistical support for comparing collateral formation between nonAcD and AcD neurons at DIV1.

      Regarding the second point concerning consensus development, we were referring to the general developmental sequence of AcD neurons, as described by Dotti et al. (see Dotti et al., 1988 for reference), where neurons typically first establish an axon and then dendrites. This sequence is not necessarily related to collateral formation, which indeed differs between nonAcD and AcD neurons. The ability to form collaterals may come from local differences in microtubule (MT) and actin dynamics at AcD neuron precursor axons, but it does not alter the fact that AcD neurons initially establish an axon and subsequently dendrites. We will clarify it in the revised manuscript.

      A study not cited in this manuscript showed distinct dendritic morphologies (doi: 10.1073/pnas.1607548113) and AcD interneurons are different for their axonal arborization (doi: 10.1242/dev.202305). Differences in growth of branch arborization could hint to subtypes. Are the AcD and non-AcD neurons different in their adult morphology? A detailed account of the axonal and dendritic trees would strengthen the data.

      Reply: Thank you for pointing this out. We will include this citation. In the study by Hodapp et al., it was shown that AcD and nonAcD neurons exhibit similar dendritic morphology and do not differ in spine density, number of dendritic branches, and total dendritic length. However, in hippocampal AcD neurons, the AcD occupies 35% of the total basal dendrite length, which is larger than basal dendrites in nonAcD neurons, suggesting that AcD neurons do possess specific features in their dendritic trees.

      Regarding the axons of AcD neurons, there is currently no detailed study available, and it would be more appropriate to investigate neuronal connectivity through tracing studies in animals rather than in primary cultures. Therefore, this question falls outside the scope of the current manuscript.

      Some key references are not included here, and a number of these are mentioned above. In the context of the detailed MT and Rab3A vesicle and cargo transport studies, please acknowledge some of the pioneering work of Alan Peters revealing the ultrastructure of axons emerging from dendrites. See Figs. 5-7 in Peters, Proskauer and Kaiserman-Abramof IR., J Cell Biol 39:604 (1968). What is the identity of the neurons? It makes a difference if the cells are interneurons or pyramidal neurons, CA1 or CA3-like. For plasticity experiments the authors use cells as independent measurements, but this is inflating the power. How many cultures were used?

      Reply: Thank you for pointing this out; we will include the suggested references in the revised manuscript. In our study, we focused on excitatory neurons from the hippocampus. We distinguished neuron types morphologically or with the inhibitory neuron marker GAD1. Identifying CA1, CA2, CA3, and DG subtypes in dissociated culture is more challenging, and this would be an interesting avenue to explore in an in vivo system. Here, we focused on fundamental cell biology aspects related to the AIS structure and its trafficking barrier function, which should be similar in all these neuron types. While there may be subtype-specific differences in AIS plasticity, investigating this is beyond the scope of our manuscript.

      For the plasticity experiments, we used a total of 3 independent cultures, from which we collected a comparable number of neurons. In response to the reviewer's concern, we will also plot the mean of each culture to illustrate the variability of our data points.

      Reviewer 4

      Major comments:

      1. A general limitation of this study is the low N for some critical experiments. In several experiments, individual cells become an N, therefore boosting the power of the analysis when in reality, due to the known heterogeneity of AIS length, position, and general cell morphology in vitro, the aim should be to compare means across animals / preparations, each consisting of a comparable number of individual cells. This is especially important for the analyses of COs, axo-axonic synapses and channel expression at the AIS.

      Reply: We would like to mention that this is a cell biological study where neurons are grown in dissociated cultures. To prepare one such culture, we typically use hippocampi from 6-8 E18 rat embryos, which are then mixed in one suspension before plating. The cells are then plated on coverslips in a 12-well plate format. When referring to replicates, for all experiments except for the longitudinal study of 5-day-long time-lapse imaging of developmental sequences (Figure 1), we used between 3 and 6 independent preparations. From each preparation, we took a comparable number of cells derived from 4-6 different coverslips. For each experiment, we measured more than a hundred cells, which is standard practice in the field. To address the issue with individual measurements, in the revised manuscript, we will additionally plot the means of each independent preparation.

      Such critical parameters as e.g. synaptic innervation at the AIS are investigated in a way that does not support the clear statements given, e.g. "The AIS of AcD neurons receives fewer inhibitory inputs" (Highlights statement) or "AcD neurons have less inhibitory synapses at the AIS" (header of Fig. 6). The overall number of analyzed cells is low (3 and 4 preparations, respectively and approximately 50 cells for each marker). The combination of a pre- and postsynaptic marker for inhibitory / excitatory neurons is a solid decision, but the analysis is not done based on the close approximation of these markers, in 3D, along an AIS, but rather in maxIPs and without any regard of whether pre-and postsynaptic markers are actually close to each other or not. The expression of these markers alone just points towards the epitopes being expressed, but are they localized to each other in such a manner that they could form bona fide synapses? The methods are not totally clear on the image depth (tile scans with 5 µm in z will not provide the detail of information to resolve synapses, so how did the authors address the subcellular analysis here and for the CO and VGSCs?). And generally, were Nyquist conditions taken into consideration throughout the study? This can be clarified in text and does not require additional experiments.

      Reply: The overall number of cells for quantifying inhibitory synapses along the AIS was approximately 80 cells for each synaptic marker. To clarify this, we will indicate the number of cells in the figure legend of our revised manuscript and will additionally plot mean values across independent preparations.

      In the current manuscript, our main goal was to provide an initial quantitative measurement of AIS features in AcD neurons to see if they differ from nonAcD neurons. Hence, maxIPs are sufficient for this purpose as they summarize the 3D information. To make our study more comprehensive, following the reviewer's suggestion, we will conduct additional experiments to co-label pre- and post-inhibitory synapses at the AIS with VGAT and gephyrin, respectively. Then, we will image samples in 3D to measure the density as well as the distance between pre- and post-synapses at the AIS of AcD neurons and compare them to nonAcD neurons.

      The Nyquist condition was taken into consideration throughout the study. The pixel size of our data collection was 0.081 µm for the laser scanning microscope, as indicated in our methods section. Given the optical setup of our microscope and the fluorophores used to label target proteins (information available in the methods section of our manuscript), the acceptable Nyquist lateral sampling size (or pixel size, in other words) for confocal images is between 0.083 and 0.093 µm, and 0.2 µm in the z-plane. In our data collection for laser scanning confocal images, the z-step size was 0.5 µm (see methods section of our manuscript), which does undersample the data axially. However, this should not significantly affect our analysis based on maxIPs. The new stainings with matched pre- and post-synaptic markers will be imaged with a smaller z-step (0.2 µm) and then reconstructed in 3D.
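
      The sampling figures quoted above can be checked with simple arithmetic. The sketch below only compares the stated acquisition steps with the stated Nyquist limits; it does not derive those limits from the optics, and the function and constant names are illustrative.

```python
# Values quoted in the response above (all in µm).
PIXEL_SIZE = 0.081
Z_STEP_CURRENT = 0.5
Z_STEP_PLANNED = 0.2
NYQUIST_LATERAL = 0.083  # lower (stricter) end of the quoted 0.083-0.093 µm range
NYQUIST_AXIAL = 0.2


def sampling_factor(step_um: float, nyquist_um: float) -> float:
    """Ratio of the acquisition step to the Nyquist step; above 1 means undersampled."""
    return step_um / nyquist_um


lateral = sampling_factor(PIXEL_SIZE, NYQUIST_LATERAL)
axial_current = sampling_factor(Z_STEP_CURRENT, NYQUIST_AXIAL)
axial_planned = sampling_factor(Z_STEP_PLANNED, NYQUIST_AXIAL)

for name, factor in [
    ("lateral", lateral),
    ("axial (current)", axial_current),
    ("axial (planned)", axial_planned),
]:
    print(f"{name}: {factor:.2f}x Nyquist step, undersampled: {factor > 1}")
```

      With these numbers, the lateral sampling stays within the quoted limit, the current z-step exceeds the axial limit by a factor of 2.5, and the planned 0.2 µm z-step just meets it.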

      The chapter on AIS plasticity is certainly an interesting addition to the study, but is a bit superficial, yet reaches strong conclusions ("More importantly, it further indicates that the AIS of AcD neurons is insensitive to activity changes"). This is based on un-physiological concentrations of KCl, and certainly not on network manipulation that truly tests synaptic activity. It also comes back to the 1st point above. A suggestion would be to edit the conclusion.

      Reply: KCl treatment globally depolarizes the membrane potential of neurons, leading to an increase in intracellular calcium via voltage-sensitive calcium channels as well as NMDA and AMPA receptors (Rienecker et al., 2020). This protocol has been used in several initial studies describing the plasticity of the AIS (see Evans et al., 2013, 2017; Grubb & Burrone, 2010; Jamann et al., 2021; Muir & Kittler, 2014; Wefelmeyer et al., 2015 for references). Moreover, as shown by Evans et al. and Grubb et al. (see Evans et al., 2013; Grubb & Burrone, 2010 for references), AIS plasticity is not abolished by TTX, which blocks Na+ channels, but is prevented by L-type calcium channel blockers. This suggests that the occurrence of AIS plasticity is independent of action potentials but more sensitive to calcium-related pathways downstream of membrane potential depolarization and post-synaptic activation. Hence, we believe our results are indicative of how the AIS would react when calcium signaling pathways are altered by activity levels. To address the reviewer's concern, we will focus our conclusion more on membrane potential depolarization and calcium signalling and edit our statements.

      As discussed above in response to reviewer #3, the quantification of AIS plasticity includes 3 independent preparations, comprising approximately 200 neurons in total. To prevent inflation of statistical power in the analysis, we will also plot the means and standard error of the mean (SEM) for each independent experiment and assess whether any differences persist.
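
      A minimal sketch of the per-preparation summary described above: cells are first averaged within each independent preparation, and the mean and SEM are then computed across preparations, so that N is the number of preparations rather than the number of cells. The preparation labels and values are made up for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical (preparation_id, measurement) pairs; values are made up.
measurements = [
    ("prep1", 4.0), ("prep1", 6.0),
    ("prep2", 5.0), ("prep2", 9.0),
    ("prep3", 8.0), ("prep3", 10.0),
]


def per_preparation_means(rows):
    """Average all cells that belong to the same preparation."""
    grouped = defaultdict(list)
    for prep, value in rows:
        grouped[prep].append(value)
    return {prep: mean(values) for prep, values in grouped.items()}


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(N))."""
    values = list(values)
    return mean(values), stdev(values) / sqrt(len(values))


prep_means = per_preparation_means(measurements)
overall_mean, sem = mean_and_sem(prep_means.values())
print(prep_means, overall_mean, round(sem, 3))
```

      Plotting `prep_means` on top of the single-cell points shows directly whether an effect is carried by all preparations or by one of them.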

      The rationale behind looking at the cisternal organelle (CO) in this study is outlined in the Introduction, where the authors state that "...... and is responsible for calcium handling". What is "calcium-handling" and where is the evidence cited? Furthermore, in the Results, they state that "...both compounds (VGSCs and COs) are critical for the AIS to regulate neuronal excitability". While this is the case for VGSCs, there is no conclusive evidence in the literature whether or not the CO is "critical" for neuronal excitability. In fact, a number of neurons have no CO in the AIS (as much as 50% of all AIS in mouse primary visual cortex for example do not express synpo at the AIS at all, Schlüter et al., 2017). The CO can therefore not be as critical for AP initiation as the authors state. Furthermore, the authors state that "AIS plasticity in excitatory neurons is triggered by calcium signaling". While certainly shown and adequately cited here, other factors (independent of calcium) can also play a role, therefore this statement is a bit absolute and should be edited accordingly.

      Reply: Thank you for the constructive editorial suggestions. Regarding the first question on calcium handling, we were referring to Ca2+ storage and release mechanisms. Benedeczky et al. already showed the existence of SERCA-type Ca2+ pumps at the membrane of the cisternal organelle (CO), demonstrating the involvement of the CO in Ca2+ sequestering/storage at the AIS (Benedeczky et al., 1994). Although the evidence is indirect, Sánchez-Ponce et al. showed that IP3R, which promotes Ca2+ release from internal stores, is present at the AIS and partially colocalizes with synaptopodin (Sánchez-Ponce et al., 2011). The same is true of the Ca2+-binding protein annexin 6. Together, this evidence indicates a putative role of the CO in regulating Ca2+ dynamics (storage/release) at the AIS. Since Ca2+ levels have a significant impact on action potential generation and timing at the AIS (see Bender & Trussell, 2009; Yu et al., 2010 for references), and therefore should be strictly regulated, it is likely that the CO at the AIS is important for regulating neuronal excitability by controlling Ca2+ dynamics. However, as mentioned by the reviewer, there is no conclusive evidence showing the relationship between the CO and the regulation of neuronal excitability. We will edit our statement accordingly.

      In contrast to the findings of Schlüter et al. (Schlüter et al., 2019), which were obtained in the mouse primary visual cortex, Sánchez-Ponce et al. showed that nearly 90% of hippocampal neurons contain synaptopodin, the CO marker protein, at the AIS. Furthermore, Schlüter et al. also demonstrated that in the other 50% of neurons containing COs at the AIS, the COs change size during visual deprivation, and their presence correlates with AIS length changes as well as eye-opening. These observations do suggest that COs are related to neuronal activity. However, this correlation and the formation of COs may be specific to neuronal subtypes or require certain triggers. This is another interesting direction to explore, and we will include it in the discussion of the revised manuscript.

      Regarding the last point on Ca2+ and AIS plasticity, we were not excluding other factors that could potentially participate in AIS plasticity and will also discuss it in the revised version.

      The Introduction ends with the rationale of the study, namely that the authors seek to ....."provide a detailed characterization of the AIS, including its structural and functional properties....". Structure is investigated, but function is limited to the barrier function of the AIS. Since the authors provide no electrophysiology that would really dissect AIS function, I suggest to rephrase this part and focus on transport.

      Reply: As suggested, we will certainly emphasize the cargo barrier function of the AIS in AcD neurons in our introduction. But we would like to keep the term "AIS function", because it has already been nicely demonstrated electrophysiologically by previous studies that the plasticity effect of the AIS is very important for maintaining cellular homeostasis.

      The Discussion is more a list of future plans than a context to current data. The authors could move some of the new questions they identify into an "outlook" section at the end? Also, again have a critical look at the literature that is cited and which statements are accurate.

      For example, the 2nd phrase in the Discussion states that it was shown that AcD neurons have a "role in memory consolidation", referenced to Hodapp et al., 2022. However, that paper does not provide direct evidence of such a role for AcD neurons. The statement "Collectively, our data provide new insights into the development of AcD neurons and demonstrate that there are differences in AIS functionality between AcD and nonAcD neurons", is not correct. AIS function was not investigated outside of the axonal barrier, and here, the AcD and nonAcD cells do not differ. Also, although the Discussion is geared towards excitatory / glutamatergic neurons, it has been shown by others that interneurons show an even stronger trend to exhibit AcD morphology (work by the Wahle lab and others). This is not clear from the current text (also compare "...AcD neurons being a different subtype if pyramidal neuron").

      Further original publications should be included in the paragraph highlighting patch-clamp recordings (see above). In the same context, the statement "...showed that rapid AID plasticity occurs mainly in hippocampal dentate gyrus cells but not in principal excitatory neurons" is not accurate (see Kim, Kuba, Jamann and others). Generally, the Introduction and Discussion would benefit from a very clear distinction between studies done in vitro versus those done ex vivo or in vivo. This needs to be stated in the Abstract as well.

      Methods: For the imaging of synapses, the CO and VGSCs, it is not clear to me from the methods whether Nyquist conditions were applied to produce data that can support the quantification of nanoscale structures. Basing the analysis and interpretation of channel expression on fluorescence intensity profiles is problematic (variance in staining quality from samples to sample, lack of an internal standard). This should be noted in the text. In the text, the first two references given for "Induction of plasticity" do not reference the correct papers.

      Reply: Thank you for the valuable suggestions; we will incorporate them into the revised version of the manuscript. The structure will undoubtedly benefit from these improvements. We will also re-examine our interpretation of the literature and our citations during the revision.

      Regarding methods, as stated in response to the second point raised by this reviewer, we ensured that the Nyquist condition was adhered to throughout the study. The pixel size, z-step size, and optical setup of the microscopes used were already indicated in our methods section. With respect to Na+ channel staining, we were indeed aware of the potential issues posed by the experimental setup, and we will explicitly mention this in our revised manuscript. Additionally, we plan to measure other AIS scaffolding and membrane proteins that anchor Na+ channels to assess for potential changes, which could indirectly support our Na+ channel staining results.

      Finally, the text is lacking a discussion of limitations of the study, especially from a methodological point of view. In the Abstract/Summary already, the authors could point out that this is a pure in vitro study. Interestingly, to this day, AIS relocation during plasticity events has only been shown in cell culture systems, and not in vivo. Therefore, this needs to be put into context here - the chosen system is great for the type of imaging approach presented here, but may look at a type of AIS plasticity that is not seen in vivo.

      Reply: These are very good points. We will include the limitations of the study in the discussion. Indeed, due to technical and methodological challenges, the relocation of the AIS has not yet been demonstrated using animal models. However, in the study by Wefelmeyer et al. (Wefelmeyer et al., 2015), a similar relocation of the AIS resulting from chronic stimulation was observed in hippocampal organotypic slices, and it was accompanied by reduced excitability of neurons. Furthermore, in the same study, neurons with axons/AIS originating from basal dendrites were also mentioned. However, the measurement of chronic AIS plasticity in their study was not performed based on different classes of neuron types. Hence, our work complements their results. Given that the network connectivity of organotypic slices is much closer to real physiological conditions, it is likely that similar plastic adaptations could occur in vivo.

      __Minor comments__

      1. How does intrinsic neuronal activity play into developmental programs in vitro? Electrical activity in maturing neurons is a major part of how networks are shaped, and cells differentiate. This is not genetically encoded per se, but has been shown to be a major driving force of neuronal development in vivo. Is this reflected in the culture setting in any way? And have the authors considered testing early changes in activity patterns in their cultures to see whether AcDs and nonAcDs develop in similar percentages? To clarify, I am not asking for additional experiments.

      Reply: It is indeed a valid point that activity can influence neuronal morphology. Lehmann et al. (pre-print, doi: https://doi.org/10.1101/2023.07.31.551236) have recently demonstrated that increased network activity leads to more excitatory principal neurons adopting AcD morphology. However, our developmental data were collected from DIV0 to DIV5, an age at which dissociated neurons do not yet form functional excitatory synapses. Therefore, it is highly unlikely that network activity plays a role in shaping AcD neuron development during this early stage.

      The authors may want to add a bit of a technical discussion on the choice of KCl and TTX as triggers for plasticity, especially at the non-physiological concentrations offered here and elsewhere (15 mM KCl).

      Reply: We appreciate the reviewer for pointing this out. We will add this in our revised manuscript.

      Some key statements would benefit from citing the appropriate original literature (some examples would be the original work by Kole, Bender and Brette on the role of the AIS in AP initiation; original work by D'Este and Letterier on the dendritic and axonal scaffold using nanoscopy; work by Kim, Kuba and Jamann on AIS plasticity in vitro and in vivo that is critical for a more informed discussion of AIS plasticity here, and others)

      Reply: These are very good points, we will make suggested edits in the revised version.

      In the Introduction, the authors word their text explicitly for excitatory neurons. However, AIS plasticity has also been observed in interneurons (work by the Grubb lab for example), and axo-axonic synapses are in fact not all inhibitory - this is an important factor to consider given the embryonic state of the culture material. Does the DIV maturation reflect how axo-axonic synapses "switch" from excitatory to inhibitory in vivo (also see work of the Burrone lab)? Can the conclusions from the paper really be drawn based on this type of system?

      Reply: AIS plasticity has indeed also been observed in inhibitory interneurons (see Chand et al., 2015 for reference), which show opposite phenotypes compared to excitatory neurons. Also related to major comment #5, we did take the potential influence of AcD interneurons on the outcome of the AIS plasticity experiment into consideration. Therefore, we also did a control experiment where inhibitory interneurons were labelled with GAD1 after chronic KCl treatment and these neurons were excluded from the analysis. Consistently, we obtained the same result: excitatory AcD neurons do not undergo chronic AIS plasticity. We will include these data in our revised manuscript. Further, in our current manuscript, we decided to focus on excitatory AcD neurons not only because they are the major functional unit in neuronal circuits, but also because the majority of the electrophysiological features were studied in excitatory AcD neurons. But we agree with the reviewer that AcD interneurons are definitely an interesting subject for follow-up research in the future.

      As mentioned by the reviewer, Pan-Vazquez et al. (Pan-Vazquez et al., 2020) nicely showed that axo-axonic synapses made by GABAergic Chandelier cells (ChCs) depolarise neurons in brain slices obtained from P12-18 animals. But this effect is reversed in slices obtained from older animals (>>P40). Of note, their results were based on cortical neurons rather than hippocampal neurons, hence cell type specificity should be considered. More importantly, a previous study reported that this conversion or switch of GABAergic interneurons from excitatory to inhibitory occurs on hippocampal neurons in P12-13 animals (Leinekugel et al., 1995). In dissociated hippocampal neurons from E18 rat embryos, this switch of GABAergic interneurons takes place on DIV9-11 and is completed on DIV19, which should correspond to a neuronal developmental stage comparable to P12-13 in vivo (see Ganguly et al., 2001 for reference). Therefore, the conclusion could be drawn in an in vitro system, but it certainly needs to be validated in an in vivo system.

      The authors state that "less COs account for higher intrinsic excitability". Why is that the case?

      Reply: According to Yu et al. and Bender et al., Ca2+ transient at the AIS regulates the generation of action potentials (APs). For instance, reducing Ca2+ transient at the AIS by blocking Ca2+ channels with either mibefradil (a T-type Ca2+ channel antagonist) or Ni2+ (which blocks R- and T-type channels) decreased the number of spikelets evoked by EPSP-like current injection and delayed the timing of spike generation (please see Bender & Trussell, 2009 for details). Therefore, we speculate that Ca2+ transients are less affected when there are fewer cisternal organelles (COs) at the AIS, which could have a more direct impact on AP initiation. However, this is just our hypothesis, and there is indeed no direct evidence showing that COs regulate Ca2+ dynamics. We will discuss this in the revised manuscript.

      Last but not least, some very recent studies on AcD biology (Stevens, Thome, Lehmann, Wahle) are available online, including on preprint servers, and may provide additional support for the current study.

      Reply: We will check these pre-prints and include relevant information into the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The author should evaluate the possibility of naturally occurring arrhythmia due to the geometry of the tissues, by using voltage or calcium dye.

      Answer: We thank the reviewer for this suggestion. We have performed new experiments using a voltage-sensitive fluorescent dye (i.e. FluoVolt) with data reported in the new Figure 4 + new results section “arrhythmia analysis”. Briefly, we found that our ring-shaped tissues are compatible with live fluorescence imaging. We were then able to show that our cardiac tissues beat regularly, without naturally occurring arrhythmias or extra beats. We could not detect any re-entrant waves in our tissues in the conditions offered by the speed of our camera. A specific paragraph has also been added to the discussion.

      (2) There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. We are currently expanding to other cell lines with improvement in survival (see https://insight.jci.org/articles/view/161356). We confirm that the rings do not detach. The pillar was specifically designed to avoid this (See figure 1B).

      As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Answer: We propose a system and a measurement method that do not need calibration. Contraction amplitude is expressed as a ratio between the contracted / relaxed areas (See figure 3 A). There is thus no influence of the distance of the camera lens.
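
      The scale independence described in this answer can be illustrated with a minimal sketch. It assumes that the ring area is available as a polygon outline in pixel coordinates; the function names, the toy outlines and the direction of the ratio (contracted over relaxed) are assumptions made for illustration and are not taken from the authors' analysis code.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as (x, y) vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def contraction_ratio(contracted, relaxed):
    """Dimensionless ratio of contracted to relaxed area (hypothetical definition)."""
    return polygon_area(contracted) / polygon_area(relaxed)


def rescale(points, factor):
    """Mimic a change in magnification or camera distance (uniform scaling)."""
    return [(x * factor, y * factor) for x, y in points]


# Hypothetical outlines in pixels: a relaxed square and a smaller contracted one.
relaxed = [(0, 0), (10, 0), (10, 10), (0, 10)]
contracted = [(1, 1), (9, 1), (9, 9), (1, 9)]

ratio = contraction_ratio(contracted, relaxed)
ratio_zoomed = contraction_ratio(rescale(contracted, 2.5), rescale(relaxed, 2.5))
print(ratio, ratio_zoomed)
```

      Both areas scale by the same factor when the magnification changes, so the factor cancels in the ratio. The sketch covers uniform scaling only and does not address lens or perspective distortion.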

      In order to evaluate the consistency of the mechanical properties of the hydrogel, we reproduced the experiment pictured in Figure1-Supplement 1, and measured the Young’s Modulus of three different gel solutions on different days. In the three experiments performed, we found values of 10.0-12.2 kPa, resulting in a final average value of 11.2 (+/- 0.6) kPa, consistent with the value reported in the article. We are therefore confident that the mechanical properties are consistent across and within wells. More extensive mechanical characterization of the molded gels would require access to an Atomic Force Microscope (AFM), and is being considered for future work.

      The author should address the longevity and reproducibility issues, by working on the calibration of camera lens position/distance to tissues and further optimizing the seeding conditions with hydrogels such as collagen or fibrin, and/or making sure the PEG gels have high reproducibility and consistency.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. This platform (including the design, approach and choice of polymers) allows a fast and reproducible formation of a large number of cardiac tissues (up to 21 per well in a 96-well format, meaning a potential total of about 2,000 tissues) with a limited number of cells.

      (3) The evaluation of the arrhythmia should be more extensively explained and demonstrated.

      Answer : See answer to comment 1

      (4) The results of isoproterenol should be checked as non-paced tissues should have increased beating frequency with increasing dosages. Dofetilide does not typically have a negative inotropic effect on the tissues. Please check on the cell viability before and after dosing

      Answer : We agree with this reviewer on the principle. However, we have repeated the experiments and we confirm our results, i.e. increasing concentrations of isoproterenol induced a trend towards increase in the contraction force and significantly increased contraction and relaxation speeds without change in the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that this increase in contraction and relaxation speeds induced by isoproterenol is translated, on average in our study, into an increase in contractile force rather than in an increase in contraction frequency. This may depend on the cell line used, and is very well illustrated in a recent paper from Mannhardt and colleagues (Stem cell reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all show an increase in contraction and relaxation speeds after isoproterenol administration, but this is translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines show an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without any maturation medium or mechanical or electrical training. We agree that above a concentration of 10nM, dofetilide shows cardiotoxicity in our tissues as tissues completely stop beating.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the general comments in the public review, I have the following specific suggestions to the authors, that would help improve the manuscript.

      (1) Please describe the protocol for preparation of cardiac rings (shown in Figure 1C) in more detail. In particular, please describe how the tissues were transferred from the mold into the 96-well plate and how are they positioned and characterized during the study.

      Answer: There is no transfer of the tissues as they directly form in the well, that is pre-equipped with the molded PEG gel (See Figure 1B and methods section). The in situ analysis is a strong asset of this platform.

      (2) Please clarify the timepoints in this study. The overall schematic in Figure 1 C shows that the rings were formed on day 22 and then studied for 14 days, while Figure 2B shows data over 20 days following seeding, and Figure 3 shows data 14 days after seeding. It appears that these were separate studies (optimization of myocyte/fibroblast ratio followed by the main study).

      Answer: Figure 1C shows the timeline including the cardiomyocyte differentiation. hiPSC-CMs are indeed seeded in the wells 22 days after starting the differentiation, which represents Day 0 for tissue formation. We apologize for the confusion.

      (3) Please explain if the number of rings per well (Figure 2) was used as the only criterion for selecting the myocyte/fibroblast ratio, and if so, why. Were these rings also characterized for their structural and contractile properties?

      Answer: Figure 2 supplement 1 reports the contractility data for the different ratios tested, and shows no differences. The number of generated ring-shaped tissues was indeed the only criterion retained.

      (4) Please provide rationale for using the dermal rather than cardiac fibroblasts.

      Answer: We had previous experience generating EHTs using dermal fibroblasts which are easier to obtain commercially. Our approach could in theory also work using cardiac fibroblasts, which we have not tested in the present study.

      (5) Figure 2 panels C-E show an interesting segregation of cardiomyocytes into a thin cylindrical layer that does not appear to contain fibroblasts and a shorter and thicker cylinder containing fibroblasts mixed with occasional myocytes. Please specify at which time point this structure forms, and how does it change over time in culture? At which time point were the images taken? It would be helpful to include serial images taken over 1-14 days of study.

      Answer: We thank the reviewer for this interesting comment. We have performed additional immunostainings (reported in Figure 2 supplement 3) on tissues at Day 1 and Day 7 after seeding. The segregation appears in the first 7 days. It appears that 1 day after seeding the fibroblasts are not yet attached, although the cardiac fiber has already started to form. Seven days after seeding, fibroblasts are fully spread and attached, and the contractile ring is formed and well-aligned. Brightfield images are reported in Figure 1E.

      (6) In the cardiomyocyte region (Figure 2D) the cells staining for troponin seem to be only at the surfaces. The thickness of the layer is only about 30-40 µm, so one would assume that cell viability was not an issue. Please specify and discuss the composition of this region.

      Answer: We agree, but we think this is a technical issue: at the center of the tissue, tissue thickness limits laser penetration, whereas at the surface (inner or outer), the laser infiltrates easily between the tissue and the PEG. Moreover, we see on the zoomed view of the tissue in Figure 2 Supplement 2 that there is staining inside the cardiac fiber, which just appears less strong due to tissue thickness.

      (7) Please also discuss segregation in terms of possible causes and the implications of apparently very limited contact between the two cell types, i.e., how representative is this two-region morphology of native heart tissue. Also, it would be interesting to know how the segregation has changed with the change in myocyte/fibroblast ratio.

      Answer: We are not sure there is a very limited contact as the use of fibroblasts is critical to ensure the formation of tissues (i.e. no tissues can be formed if we avoid the use of fibroblasts). We agree that these ring-shaped cardiac tissues are not especially representative of a native heart tissue in terms of interactions between several cell types. They were developed as a surrogate for physiopathological and pharmacological experiments (see a recent application in https://insight.jci.org/articles/view/161356)

      (8) There is interest and demonstrated ability to culture engineered cardiac tissues over longer periods of time. Please comment what was the rationale for selecting 14-day culture and if the system allows longer culture durations.

      Answer: In line with this comment, we have studied the contractile parameters of our rings 28 days after seeding and compared to their contractile parameters at D14. We found a slight increase for all the parameters, which is significant for the maximum contraction speed. Nevertheless, the data is much more variable and the number of tissues is lower (29 for D14 against 17 for D28). Therefore, we demonstrated that long-term culture of our tissues is possible, however not yet optimized. Hence, the following physiological and pharmacological tests have been done at D14.

      (9) Figure 3 documents the development of contractile parameters over 14 days of culture. Would it be possible to replace the arbitrary units with the actual values? Also, would it be possible to include the corresponding images of the rings taken at the same time points, to show the associated changes in ring morphologies.

      Answer: Contraction amplitude is expressed as a ratio between the contracted / relaxed areas (See figure 3 A): it is a ratio, thus without unit. Corresponding images can be seen in Figure 1 E.

      (10) The measured contraction stress, strain, and the speeds of contraction and relaxation improve from day 1 to day 7 and then plateau (Figure 3, Supplemental Figure 3). Please discuss this result.

      Answer: The new immunostainings performed on tissues at Day 1 and Day 7 show the progressive alignment of the cardiomyocytes and the muscular fibers, with an almost complete organization at Day 7.

      (11) The beating frequency does not appear to markedly change over time, while Figure 3B shows strong statistical significance (***) throughout the 14-day period. Please check/confirm.

      Answer: We confirm this result.

      (12) Please comment on the lack of effect of isoproterenol on beating frequency.

      Answer: We agree with this reviewer on the principle. However, we have repeated the experiments and we confirm our results, i.e. increasing concentrations of isoproterenol induced a trend towards increase in the contraction force and significantly increased contraction and relaxation speeds without change in the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that this increase in contraction and relaxation speeds induced by isoproterenol is translated, on average in our study, into an increase in contractile force rather than in an increase in contraction frequency. This may depend on the cell line used, and is very well illustrated in a recent paper from Mannhardt and colleagues (Stem cell reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all show an increase in contraction and relaxation speeds after isoproterenol administration, but this is translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines show an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without any maturation medium or mechanical or electrical training.

      (13) Please compare the contractile function of cardiac tissues measured in this study with data reported for other iPSC-derived tissue models.

      Answer : A specific paragraph tackles this aspect in the discussion

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewers’ Public Comments

      We are grateful for the reviewers’ comments. We have modified the manuscript accordingly and detail our responses to their major comments below.

      (1) Reviewer 2 was concerned that transformation of continuous functional data into categorical form could reduce precision in estimating the genetic architecture.

      We agree that transforming continuous data into categories may reduce resolution, but it also improves accuracy when the continuous data are affected by measurement noise. In our dataset, many genotypes are at the lower bound of measurement, and the variation in measured fluorescence among these genotypes is largely or entirely caused by measurement noise. By transforming to categorical data, we dramatically reduced the effect of this noise on the estimation of genetic effects. We modified the results and discussion sections to address this point.

      (2) Reviewer 2 asked about generalizability of our findings.

      Because our paper is the first use of reference-free analysis of a 20-state combinatorial dataset, generalizability is at this point unknown. However, a recent manuscript from our group confirms the generality of the simplicity of genetic architecture: using reference-free methods to analyze 20 published combinatorial deep mutational scans, several of which involve 20-state libraries, we found that main and pairwise effects account for virtually all of the genetic variance across a wide variety of protein families and types of biochemical functions (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057). Concerning the facilitating effect of epistasis on the evolution of new functions, we speculate that this result is likely to be general: we have no reason to think that the underlying cause of this observation – epistasis brings genotypes with different functions closer in sequence space to each other and expands the total number of functional sequences – arises from some peculiarity of the mechanisms of steroid receptor DBD folding or DNA binding. However, we acknowledge that our data involve sequence variation at those sites in the protein that directly mediate specific protein-DNA contact; it is plausible that sites far from the “active site” may have weaker epistatic interactions and therefore have weaker effects on navigability of the landscape. We have addressed these issues in the discussion.

      (3) Reviewer 3 asked “in which situation would the authors expect that pairwise epistasis does not play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape?”

      The question addressed in our paper is not whether epistasis shapes steps, trajectories or connectedness in sequence space but how it does so and what its particular effects are on the evolution of new functions. The dominant view in the field has been that the primary role of epistasis is to block evolutionary paths. We show, however, that in multi-state sequence space, epistasis facilitates rather than impedes the evolution of new functions. It does this by increasing the number of functional genotypes and bringing genotypes with different functions closer together in sequence space. This finding was possible because of the difference in approach between our paper and prior work: most prior work considered only direct paths in a binary sequence space between two particular starting points – and typically only considering optimization of a single function – whereas we studied the evolution of new functions in a multi-state amino acid space, under empirically relevant epistasis informed by complete combinatorial experiments. The result is a clear demonstration that the net effect of real-world levels of epistasis on navigability of the multidimensional sequence landscape is to make the evolution of new functions easier, not harder.

      (4) Reviewer 3 asked for “an explanation of how much new biological results this paper delivers as compared with the paper in which the data were originally published.”

      Starr 2017 did not use their data to characterize the underlying genetic architecture of function by estimating main and epistatic effects of amino acid states and combinations; it also did not evaluate the importance of epistasis in generating functional variants, determining the transcription factor’s specificity, or shaping evolutionary navigability on the landscape.

      (5) Reviewer 3 requested an explanation of how the results would have been (potentially) different if a reference-based approach were used, and how reference-based analysis compares with other reference-free approaches to estimating epistasis.

      This topic has been covered in detail in a recent manuscript from our group (Park et al. Biorxiv 2023.09.02.556057). Briefly, reference-free approaches provide the most efficient explanation of an entire genotype-phenotype map, explaining the maximum amount of genetic variance and reducing sensitivity to experimental noise and missing genotypes compared to reference-based approaches. Reference-based approaches tend to infer much more epistasis, especially higher-order epistasis, because measurement error and local idiosyncrasy near the wild-type sequence propagate into spurious high-order terms. Reference-based analyses are appropriate for characterizing only the immediate sequence neighborhood of a particular “wild-type” protein of interest. Reference-free approaches are therefore best suited to understanding genotype-phenotype landscapes as a whole. We have clarified these issues in the revised discussion.

      (6) Reviewer 3 suggested that the comparison between the full and main-effects-only model should involve a re-estimation of main effects in the latter case.

      This is indeed what we did in our analysis. We have clarified the description in the results and methods sections to make this clear.

      (7) Reviewer 3 asked about the applicability of the approach to data beyond those analyzed in the present study and requirements to use it.

      Our approach could be used for any combinatorial DMS dataset in which the phenotypic data are categorical (or can be converted to categorical form). Complete sampling is not required: a virtue of reference-free analysis is that by averaging the estimated effects of states and combinations over all variants that contain them, reference-free analysis is highly robust to missing data (except at the highest possible order of epistasis, where only a single variant represents a high-order effect) as long as variant sampling is unbiased with respect to phenotype. All the required code is publicly available at the GitHub link provided in this manuscript. We have also described a general form of reference-free analysis for continuous data and applied it to 20 protein datasets in a recent publication (Park et al. Biorxiv 2023.09.02.556057).

      (8) Reviewer 3 suggested that the text could be shortened and made less dense.

      We agree and have done a careful edit to streamline the narrative.

      Response to Reviewers’ Non-Public Recommendations

      (1) Reviewer 1 noted that specific epistatic effects might in some cases produce global nonlinearities in the genotype-phenotype relationship. They then asked how our results might change if we did not impose a nonlinear transformation as part of the genotype-phenotype model. The reviewer’s underlying concern was that the non-specific transformation might capture high-order specific epistatic effects and thus reduce their importance.

      Because our data are categorical, we required a model that characterizes the effect of particular amino acid states and combinations on the probability that a variant is in a null, weak, or strong activation class. A logistic model is the classic approach to this kind of analysis. The model structure assumes that amino acid states and combinations have additive effects on the log-odds of being in one functional class versus the lower functional class(es); the only nonlinear transformation is that which arises mathematically when log-odds are transformed into probability through the logistic link function. Thinking through the reviewer’s comment, we have concluded that our model does not make any explicit transformation to account for nonlinearity in the relationship between the effects of specific sequence states/combinations and the measured phenotype (activation class). If additional global nonlinearities are present in the genotype-phenotype relationship – such as could be imposed by limited dynamic range in the production of the fluorescence phenotype or the assay used to measure it – it is possible that the sigmoid shape of the logistic link function may also accommodate these nonlinearities. We have noted this part in the revised manuscript.
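
      The model structure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a cumulative-logit form with two cutpoints, and the effect values, site names, cutpoints and function names are all hypothetical.

```python
import math


def logistic(x):
    """Logistic link: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def class_probabilities(score, cut_weak, cut_strong):
    """Probabilities of (null, weak, strong) for an additive genetic score.

    Assumes a cumulative-logit form: the log-odds of reaching at least a
    given class equal the score minus that class's cutpoint, with
    cut_weak <= cut_strong.
    """
    p_at_least_weak = logistic(score - cut_weak)
    p_strong = logistic(score - cut_strong)
    return {
        "null": 1.0 - p_at_least_weak,
        "weak": p_at_least_weak - p_strong,
        "strong": p_strong,
    }


# Hypothetical effects on the log-odds scale (not estimates from the study).
intercept = -2.0
main_effects = {("site1", "A"): 1.5, ("site2", "K"): 1.0}
pair_effects = {(("site1", "A"), ("site2", "K")): 2.0}

# Effects add on the log-odds scale; the only nonlinearity is the link.
score = intercept + sum(main_effects.values()) + sum(pair_effects.values())
probs = class_probabilities(score, cut_weak=0.0, cut_strong=2.0)
print(score, probs)
```

      In this sketch the main and pairwise terms combine additively into a single score, and the sigmoid enters only when that score is converted into class probabilities.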

      (2) Reviewer 1 observed that our model seems to prefer sets of several pairwise interactions among states across sites rather than fewer high-order interactions among those same states.

      This finding arises because the pattern of phenotypic variation across genotypes in our dataset is consistent with that which would be produced by pairwise interactions rather than by high-order interactions. In a reference-free framework, these patterns are distinct from each other: a group of second-order terms cannot fit the patterns produced by high-order epistasis, and high-order terms cannot fit the pattern produced by pairwise interactions. Similarly, main-effect terms cannot fit the pattern of phenotypes produced by a pairwise interaction, and a pairwise epistatic term cannot fit the pattern produced by main effects of states at two sites. For example, third-order terms are required when the genotypes possessing a particular triplet of states deviate from that expected given all the main and second-order effects of those states; this deviation cannot be explained by any combination of first- and second-order effects.

      We explain this point in detail in our recent manuscript (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057) and we summarize it here. Consider the simple example of two sites with two possible states (genotypes 00, 01, 10, and 11). If there are no main effects and no pairwise effects, this architecture will generate the same phenotype for all four variants – the global average (or zero-order effect). If there are pairwise effects but no main effects, this architecture will generate a set of phenotypes on which the average phenotype of genotypes with a 0 at the first site (00 and 01) equals the global average – as does the average of those with 0 at the second site (00 and 10). The epistatic effect causes the individual genotypes to deviate from the global average. This pattern can be fit only by a pairwise epistatic term, not by first-order terms. Conversely, if there are main effects but no pairwise effects, then the average phenotype of genotypes 00 and 01 will deviate from the global average (by an amount equal to the first-order effect), as will the average of 00 and 10: the phenotype of each genotype will be equal to the global average plus the sum of the relevant first-order effects for the states it contains. This pattern cannot be fit by second-order model terms. The same logic extends to higher orders: a cluster of second-order terms cannot explain variation generated by third-order epistasis, because third-order variation is by definition the deviation from the best second-order model.
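
      The two-site example can be worked through numerically. The sketch below is an illustration with invented phenotype values and a hypothetical function name; it assumes a complete map and computes each first-order effect as the mean phenotype of the genotypes carrying a state minus the global mean, and each pairwise effect as the remaining deviation.

```python
from itertools import product


def reference_free_effects(phenotypes):
    """Decompose a complete two-site map {(s1, s2): phenotype} into
    zero-, first- and second-order reference-free effects."""
    genotypes = list(phenotypes)
    mean = sum(phenotypes.values()) / len(genotypes)

    # First-order effect of a state: mean of its carriers minus global mean.
    first = {}
    for site in (0, 1):
        for state in {g[site] for g in genotypes}:
            carriers = [phenotypes[g] for g in genotypes if g[site] == state]
            first[(site, state)] = sum(carriers) / len(carriers) - mean

    # Pairwise effect: what remains after the mean and first-order effects.
    pairwise = {
        g: phenotypes[g] - mean - first[(0, g[0])] - first[(1, g[1])]
        for g in genotypes
    }
    return mean, first, pairwise


# Purely epistatic map: no main effects, only a pairwise interaction.
epistatic = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
# Purely additive map: state 1 adds 1.0 at site 0 and 2.0 at site 1.
additive = {(a, b): 1.0 * a + 2.0 * b for a, b in product((0, 1), repeat=2)}

mean_e, first_e, pair_e = reference_free_effects(epistatic)
mean_a, first_a, pair_a = reference_free_effects(additive)
print(first_e, pair_e)
print(first_a, pair_a)
```

      For the epistatic map every first-order effect is zero and the whole pattern sits in the pairwise terms; for the additive map every pairwise term is zero. Neither pattern can be absorbed by terms of the other order.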

      (3) Reviewer 1 suggested several places in the text where citations to prior work would be appropriate.

      We appreciate these suggestions and have modified the manuscript to refer to most of these works.

      (4) Reviewer 1 pointed to the paper of Gong et al eLife 2013 and asked whether it is known how robust the proteins in our study are to changes in conformation/stability compared to other proteins, and whether this might impact the likelihood of observing higher-order epistasis in this system.

      The DBDs that we study here are very stable, and previous work shows that mutations affect DNA specificity primarily by modifying the DBD’s affinity rather than its stability (McKeown et al., Cell 2014). Additionally, Gong et al.’s findings pertain to a globally nonlinear relationship between stability and function, which arises from the Boltzmann relationship between the energy of folding and occupancy of the folded state. Because our data are categorical – based on rank-order of measured phenotype rather than fluorescence as a continuous phenotype – the kind of global nonlinearity observed in Gong’s study is not expected to produce spurious estimates of epistasis in our work. We have modified the discussion to address this point.

      (5) Reviewer 1 asked a) why the epistatic models produce landscapes on which variants have fewer neighbors on average than main-effects only models and b) why the average distance from all ERE-specific nodes to all SRE-specific nodes is greater with epistasis (but the average distance from ERE to nearest SRE is lower with epistasis).

      In the main effects-only landscape, the functional genotypes are relatively similar to each other, because each must contain several of the states that contribute the most to a positive genetic score. Moreover, ERE-specific nodes are similar to each other, and SRE-specific nodes are similar to each other, because each must contain one or more of a relatively small number of specificity-determining states. When epistasis is added to the genetic architecture, two things happen: 1) more genotypes become functional because there are more combinations that can exceed the threshold score to produce a functional activator and 2) these additional functional variants are more different from each other – in general, and within the classes of ERE- or SRE-specific variants – because there are now more diverse combinations of states that can yield either phenotype. As a result, a broader span of sequence space is occupied, but ERE- and SRE-specific variants are more interspersed with each other. This means that the average distance between all pairs of nodes is greater, and this applies to all ERE-SRE pairs, as well. However, the interspersing means that the closest single SRE to any particular ERE is closer than it was without epistasis. We have added this explanation to the main text.

      (6) Reviewer 2 asked us to explain why average path length increases with pairwise epistasis as the strength of selection for specificity increases.

      This behavior occurs because of the existence of a local peak in the pairwise model. Genotypes on this peak had few connections to other genotypes, all of which were less SRE-specific. Thus, with strong selection, i.e. high population size, the simulations became stuck on the local peak, cycling among its genotypes many times before leaving, resulting in a large increase in the mean step number. As shown in the rest of the figure, when the longest set of paths is removed, there are still differences in the average number of steps with and without epistasis. This issue is described in the methods section.

      (7) Reviewers made several suggestions for clarity in the text and figures.

      We have modified the paper to address all of these comments.

      (8) Reviewer 3 stated that the code should be available.

      The code is available at https://github.com/JoeThorntonLab/DBD.GeneticArchitecture.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors were trying to understand the relationship between the development of large trunks and longirostrine mandibles in bunodont proboscideans of the Miocene, and how it reflects the variation in diet patterns.

      Strengths:

      The study is very well supported, written, and illustrated, with plenty of supplementary material. The findings are highly significant for the understanding of the diversification of bunodont proboscideans in Asia during the Miocene, as well as explaining the cranial/jaw disparity of fossil lineages. This work elucidates the diversification of paleobiological aspects of fossil proboscideans and their evolutionary response to open environments in the Neogene using several methods. The authors included all Asian bunodont proboscideans with long mandibles and I suggest that they should use the expression "bunodont proboscideans" instead of gomphotheres.

      Weaknesses:

      I believe that the only weakness is the lack of discussion comparing their results with the development of gigantism and long limbs in proboscideans from the same epoch.

      Thank you for your comprehensive review and positive feedback on our study regarding the co-evolution of feeding organs in bunodont proboscideans during the Miocene. We appreciate your suggestion, and have decided to use the term "bunodont elephantiforms" (for more explicit clarification, we use elephantiforms to exclude some early proboscideans, like Moeritherium, etc.) instead of "gomphotheres," and we will make this change in our revised manuscript. We also appreciate the potential weakness you mentioned regarding the lack of discussion comparing our results with the development of gigantism and long limbs in proboscideans from the same epoch. We agree with the reviewer’s suggestion, and we are aware that gigantism and long limbs are potential factors for trunk development. Gigantism resulted in the loss of flexibility in elephantiforms, and long limbs made it more challenging for them to reach the ground. A long trunk serves as compensation for these limitations. However, limb bones were rarely found in our material, especially in association with the skull.

      Reviewer #2 (Public Review):

      This study focuses on the eco-morphology, the feeding behaviors, and the co-evolution of feeding organs of longirostrine gomphotheres (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae) which are characterised by their distinctive mandible and mandible tusk morphologies. They also have different evolutionary stages of food acquisition organs which may have co-evolved with extremely elongated mandibular symphysis and tusks. Although these three longirostrine gomphothere families were widely distributed in Northern China in the Early-Middle Miocene, the relative abundances and the distribution of these groups were different through time as a result of the climatic changes and ecosystems.

      These three groups have different feeding behaviors indicated by different mandibular symphysis and tusk morphologies. Additionally, they have different evolutionary stages of trunks which are reflected by the narial region morphology. To be able to construct the feeding behavior and the relation between the mandible and the trunk of early elephantiformes, the authors examined the crania and mandibles of these three groups from the Early and Middle Miocene of northern China from three different museums and also made different analyses.

      The analyses made in the study are:

      (1) Finite Element (FE) analysis: They conducted two kinds of tests: the distal forces test, and the twig-cutting test. With the distal forces test, advantageous and disadvantageous mechanical performances under distal vertical and horizontal external forces of each group are established. With the twig-cutting test, a cylindrical twig model of orthotropic elastoplasticity was posed in three directions to the distal end of the mandibular tusk to calculate the sum of the equivalent plastic strain (SEPS). It is indicated that all three groups have different mandible specializations for cutting plants.

      (2) Phylogenetic reconstruction: These groups have different narial region morphology, and in connection with this, have different stages of trunk evolution. The phylogenetic tree shows the degree of specialization of the narial morphology. The evolutionary level of the narial region is correlated with that of the character combination related to horizontal cutting. In the trilophodont longirostrine gomphotheres, co-evolution between the narial region and horizontal cutting behaviour is strongly suggested.

      (3) Enamel isotopes analysis: The results of stable isotope analysis indicate an open environment with a diverse range of habitats and that the niches of these groups overlapped without obvious differentiation.

      The analysis shows that different eco-adaptations have led to the diverse mandibular morphology and open-land grazing has driven the development of trunk-specific functions and loss of the long mandible. This conclusion has been achieved with evidence on palaeoecological reconstruction, the reconstruction of feeding behaviors, and the examination of mandibular and narial region morphology from the detailed analysis during the study.

      All of the analyses are explained in detail in the supplementary files. The 3D models and movies in the supplementary files are detailed and understandable and explain the conclusion. The conclusions of the study are well supported by data.

      We appreciate your detailed and insightful review of our study. Your summary accurately captures the essence of our research, and we are pleased to note that multiple research methods were used to demonstrate our conclusions. Your recognition of the evidence-based conclusions from paleoecological, feeding behavior reconstruction, and morphological analyses reinforces the validity of our findings. Once again, we appreciate your time and thoughtful reviews.

      Reviewer #1 (Recommendations For The Authors):

      Thank you very much for the invitation to review this amazing manuscript. It is very well written and supported, and I have only minor suggestions to improve the text:

      (1) Some references are not in chronological sequence in the text, and this should be reviewed.

      We greatly appreciate the positive comments of the reviewer. We have revised the references in the manuscript as the reviewer suggested.

      (2) I suggest the use of the expression "bunodont proboscideans" instead of Gomphotheres because there is no agreement if Amebelodontidae and Choerolophodontidae are within Gomphotheriidae, as well as some brevirostrine bunodont proboscideans from South America. So I think it is ok to use "Gomphotheriidae", but not gomphotheres to refer to all bunodont proboscideans included in the study.

      The reviewer is correct. Using “gomphotheres” to refer to these three groups is inappropriate. We have replaced “gomphotheres” with "bunodont elephantiforms" throughout the entire manuscript. Here, we use “elephantiforms”, not “proboscideans”, to avoid confusion with some early proboscidean members like Moeritherium, etc.

      (3) I was expecting some discussion on the development of large trunks related to the gigantism in these bunodont proboscideans, regarding the huge skulls and the columnar limbs.

      We appreciate this suggestion, and we are aware that gigantism is a potential factor for trunk development. It is difficult to compare the three groups (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae) in terms of their weight and limb bone length, because in our material, limb bones were rarely found, especially those associated with cranial material. Nevertheless, at this stage, all elephantiforms had significantly enlarged cranial sizes and limb bone lengths compared to early members like Phiomia. Gigantism caused the loss of flexibility in elephantiforms, and even the long limbs made it more difficult for an elephantiform to reach the ground. A long trunk compensates for this evolutionary change. Exploring these aspects further is a part of our future work.

      (4) The reference to Alejandro et al should be replaced by Kramarz et al (and the correct surname of the authors). The name and surname of this reference need to be corrected. The correct names are Kramarz, A., Garrido, A., Bond, M. 2019. Please correct this in the text too.

      We thank the reviewer for catching this error. This reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      I believe your paper will lead to other studies on other Proboscidean groups on the evolution of the mandible and trunk. There are some corrections in the text:

      • In line 199 in the text in pdf, "Tassy, 1994" should be "Tassy, 1996".

      • In line 241, "studied" should be "studies"

      • In line 313, "," after the word "tool" should be "."

      We appreciate the reviewer for pointing these errors out and have revised these based on the suggestions.

      • In the References, you write "et al." in some references. You should write the names of all of the authors.

      • In the References: "Lister AM. 2013" and "Shoshani&Tassy" are not referenced in the text.

      • In the References: "Tassy P. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn." should be "Tassy P. 1994. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn. 112, 1-2, 101-117" and replaced before "Tassy P. 1996".

      We appreciate the reviewer’s suggestions and have revised these references.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      The authors provided experimental data in response to my comments/suggestions in the revision. Overall, most points were appropriate and satisfactory, but some issues remain.

      (1) It is not fully addressed how atypical survivors are generated independently of Rad52-mediated homologous recombination.

      The newly provided data indicate that the formation of atypical telomeres is independent of the Rad52 homologous recombination pathway.

      "The atypical telomeres clones exhibit non-uniform telomere pattern", but the TG-hybridized signals after XhoI digestion are clear and uniform.

      "Atypical telomere" clones may carry circular chromosomes embedded with short TG repeats, rather than linear chromosomes. In other words, atypical telomeres may differ from telomeres, the ends of chromosomes. Is atypical telomere formation dependent on NHEJ? Given that "two chromosomes underwent intra-chromosomal fusions" (Line 248), are atypical telomere clones detected frequently in SY13 cells containing two chromosomes?

      We thank the reviewer for these questions. Frankly, we have not been able to determine the chromosome structures in these so-called "atypical survivors". As we mentioned in the manuscript, there could be mixed telomere structures, e.g. TG tract amplification, intra-chromosome telomere fusion and inter-chromosome telomere fusion. Worse still, these 'atypical survivors' may not have maintained a stable genome, and their karyotype may have undergone stochastic changes during passages. To avoid misunderstanding, we have changed the term "atypical" to "uncharacterized" in the revised manuscript.

      We have previously shown that deletion of YKU70 does not affect MMEJ-mediated intra-chromosome fusion in single-chromosome SY14 cdc13Δ cells (Wu et al., 2020). In SY12 cells, double knockout of TLC1 and YKU resulted in synthetic lethality, and we were unable to continue our investigation. The synthetic lethality of the TLC1 and YKU70 double deletion was shown in Figure 7B of reviewed preprint version 1; this result was omitted from reviewed preprint version 2 in accordance with the reviewer's instructions.

      "Atypical” survivors could be detected in SY13 cells (Figure 1D), but the frequency of their formation in the SY13 strain appeared to be lower than in SY12. As one can imagine, SY13 contains two chromosomes and its survivors should have a higher frequency of intra-chromosome fusions.

      (2) From their data, it is possible that X and Y elements influence homologous recombination, type 1 and type 2 (type X), at telomeres. In particular, the presence of X and Y elements appears to be important for promoting type 1 recombination. In other words, although not essential, subtelomeres have some function in maintaining telomeres. I suggest that the authors include author response image 4 in the text. They could revise their conclusion and the paper title accordingly.

      Following this suggestion, we have included author response image 4 in the revised manuscript as Figure 2E, Figure 5D, Figure 6C and Figure 6E. Accordingly, we have changed the title to “Elimination of subtelomeric repeat sequences exerts little effect on telomere essential functions in Saccharomyces cerevisiae”.

      (3) Minor points: The newly added data indicate that X survivors are generated in a type 2-dependent manner. The authors could discuss how Y elements were eroded while retaining X elements (line 225, Figure 2A).

      We thank the reviewer for this suggestion. We have discussed it in the revised manuscript (p.13 line 244-245). When a telomere is deprotected, chromosome end resection takes place. Since SY12 has only one Y’-element, it is hard to find homologous sequences with which to repair the Y’-element in XVI-L. Once the X-element in XVI-L is exposed by further resection, homologous sequences for repair are easier to find. Thus, in Type X survivors the Y’-element was eroded while the X-element was retained.

      Reviewer #2

      I would like to congratulate the authors for their work and the efforts they put in improving the manuscript. The major criticism I had previously, ie testing the genetic requirements for the survivor subtypes, has been met. Below are a few minor comments that don't necessarily require a response.

      (1) I think the Author response image 6 could have been included in the manuscript. I understand that the authors don't want to overinterpret survivor subtype frequencies, but this figure would have suggested some implication of Rad51 in the emergence of survivors even in the absence of Y' elements. At this stage, however, it is up to the authors, and leaving this figure out is also fine in my opinion.

      According to the suggestion, the author response image 6 has been presented as Figure 6—figure supplement 7.

      (2) Chromosome circularization seems to rely on microhomologies. Previously, the authors proposed that SY14 circularization depended on SSA (Wu et al. 2020), but here, since circularization appears to be Rad52-independent, it is likely to be based on MMEJ rather than SSA (although there are contradictory results on Rad52's role in MMEJ in the literature).

      Yes, we mentioned it in the revised manuscript.

      (3) p. 28 lines 511-513: "The erosion sites and fusion sequences differed from those observed in SY12 tlc1Δ-C1 cells (Figure 2D), suggesting the stochastic nature of chromosomal circularization": I don't think they are necessarily stochastic, because the sequences beyond the telomeres are now modified, the available microhomologies have changed as well.

      We agree with your opinion. In different chromosomes, there tend to be some hotspots for chromosome fusion. For example, in Figure 6C and 6F the resection site in Chr1 and Chr2 was the same in SY12XYΔ+Y tlc1Δ-C1 and SY12XYΔ tlc1Δ-C1. So, we speculate that there are some hotspots for chromosome fusion, but which site the cell uses in any given chromosome fusion event is stochastic.

      (4) Typos and other errors:

      • p. 3 line 52: "subtelomerice" and "varies" are misspelled.

      • p. 5 line 78: "processes" should be "process".

      • Supp files are mislabelled (the numbers do not correspond to file name).

      • Supp file 2: how come SY12 has only one Y' element and SY13 has two?

      • p. 10 line 175: "emerging" should be "emergence".

      • p.15 line 276: "counter-selected" should be "being counter-selected" or "counterselection".

      • p. 29 line 523: "the formation of them" should be "their formation".

      • p. 37 line 653: "could have been an ideal tool": the sentence is grammatically incorrect. Writing "AND could have been an ideal tool" is enough to make it structurally correct.

      Thanks for pointing these errors out. We have corrected them in the revised manuscript. Regarding the question “how come SY12 has only one Y' element and SY13 has two?”, we are not sure at this time. We speculate that one of the Y’ elements might have been lost during genetic engineering of the chromosomes by the CRISPR–Cas9 system.

      Reviewer #3

      The authors included statistical analyses of the qPCR data (Fig 4B) as requested, but did not comment on the striking difference in expression of MPH3 and HSP32 in the SY12 strain compared to BY4742. An improvement of the manuscript is the inclusion of rad52 tlc1 strains in their analyses, demonstrating that the "atypical and circular survivors" arose independently of homologous recombination. In addition, by analyzing rad51 and rad50 mutant strain they could demonstrate that the "type X" survivors had similar molecular requirements to type II survivors. Overall, the revised submission improves the article.

      We thank the reviewer for the comments and suggestions. The SY12 strain (with three chromosomes) exhibited lower expression levels of both MPH3 and HSP32 compared to the parental strain BY4742 (with 16 chromosomes). We speculate that, with the reduced chromosome number, the silencing proteins are no longer titrated away by the telomeres that have been deleted. We have added these comments in the revised manuscript.

      Wu, Z.J., Liu, J.C., Man, X., Gu, X., Li, T.Y., Cai, C., He, M.H., Shao, Y., Lu, N., Xue, X., et al. (2020). Cdc13 is predominant over Stn1 and Ten1 in preventing chromosome end fusions. Elife 9.

    1. Reviewer #2 (Public Review):

      In the manuscript "Full-length direct RNA sequencing uncovers stress-granule dependent RNA decay upon cellular stress", Dar, Malla, and colleagues use direct RNA sequencing on nanopores to characterize the transcriptome after arsenite and oxidative stress. They observe a population of transcripts that are shortened during stress. The authors hypothesize that this shortening is mediated by the 5'-3' exonuclease XRN1, as XRN1 knockdown results in longer transcripts. Interestingly, the authors do not observe a polyA-tail shortening, which is typically thought to precede decapping and XRN1-mediated transcript decay. Finally, the authors use G3BP1 knockout cells to demonstrate that stress granule formation is required for the observed transcript shortening.

      The manuscript contains intriguing findings of interest to the mRNA decay community. That said, it appears that the authors at times overinterpret the data they get from a handful of direct RNA sequencing experiments. To bolster some of the statements additional experiments might be desirable.

      A selection of comments:

      (1) Considering that the authors compare the effects of stress, stress granule formation, and XRN1 loss on transcriptome profiles, it would be desirable to use a single-cell system (and validated in a few more). Most of the direct RNAseq is performed in HeLa cells, but the experiments showing that stress granule formation is required come from U2OS cells, while short RNAseq data showing loss of coverage on mRNA 5'ends is reanalyzed from HEK293 cells. It may be plausible that the same pathways operate in all those cells, but it is not rigorously demonstrated.

      (2) An interesting finding of the manuscript is that polyA tail shortening is not observed prior to transcript shortening. The authors would need to demonstrate that their approach is capable of detecting shortened polyA tails. Using polyA purified RNA to look at the status of polyA tail length may not be ideal (as avidity to oligodT beads may increase with polyA tail length and therefore the authors bias themselves to longer tails anyway). At the very least, the use of positive controls would be desirable; e.g. knockdown of CCR4/NOT.

      (3) The authors use a strategy of ligating an adapter to 5' phosphorylated RNA (presumably the breakdown fragments) to be able to distinguish true mRNA fragments from artifacts of abortive nanopore sequencing. This is a fantastic approach to curating a clean dataset. Unfortunately, the authors don't appear to go through with discarding fragments that are not adapter-ligated (presumably to increase the depth of analysis; they do offer Figure 1e that shows similar changes in transcript length for fragments with adapter, compared to Figure 1d). It would be good to know how many reads in total had the adapter. Furthermore, it would be good to know what percentage of reads without adapters are products of abortive sequencing. What percentage of reads had 5'OH ends (could be answered by ligating a different adapter to kinase-treated transcripts). More read curation would also be desirable when building the metagene analysis - why do the authors include every 3'end of sequenced reads (their RNA purification scheme requires a polyA tail, so non-polyadenylated fragments are recovered in a non-quantitative manner and should be discarded).

      (4) The authors should come to a clear conclusion about what "transcript shortening" means. Is it exonucleolytic shortening from the 5'end? They cannot say much about the 3'ends anyway (see above). Or are we talking about endonucleolytic cuts leaving 5'P that then can be attacked by XRN1 (again, what is the ratio of 5'P and 5'OH fragments; also, what is the ratio of shortened to full-length RNA)?

      (5) The authors should clearly explain how they think the transcript shortening comes about. They claim it does not need polyA shortening, but then do not explain where the XRN1 substrate comes from. Does their effect require decapping? Or endonucleolytic attacks?

      (6) XRN1 KD results in lengthened transcripts. That is not surprising as XRN1 is an exonuclease - and XRN1 does not merely rescue arsenite stress-mediated transcript shortening, but results in a dramatic transcript lengthening.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Comments

      Reviewer 1

      (1) Despite the well-established role of Netrin-1 and UNC5C axon guidance during embryonic commissural axons, it remains unclear which cell type(s) express Netrin-1 or UNC5C in the dopaminergic axons and their targets. For instance, the data in Figure 1F-G and Figure 2 are quite confusing. Does Netrin-1 or UNC5C express in all cell types or only dopamine-positive neurons in these two mouse models? It will also be important to provide quantitative assessments of UNC5C expression in dopaminergic axons at different ages.

      Netrin-1 is a secreted protein and in this manuscript we did not examine what cell types express Netrin-1. This question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where in the forebrain regions we examined Netrin-1+ cells are present. As per the reviewer’s request we include below images showing Netrin-1 protein and Netrin-1 mRNA expression in the forebrain. In Figure 1 below, we show a high magnification immunofluorescent image of a coronal forebrain section showing Netrin-1 protein expression.

      Author response image 1.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20µm

      In Figures 2 and 3 below we show low and high magnification images from an RNAscope experiment confirming that cells in the forebrain regions examined express Netrin-1 mRNA.

      Author response image 2.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, fmi: forceps minor of the corpus callosum, IL: Infralimbic Cortex, PrL: Prelimbic Cortex

      Author response image 3.

      A higher resolution image from the same sample as in Figure 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      Regarding UNC5c, this receptor homologue is expressed by dopamine neurons in the rodent ventral tegmental area (Daubaras et al., 2014; Manitt et al., 2010; Phillips et al., 2022). This does not preclude UNC5c expression in other cell types. UNC5c receptors are ubiquitously expressed in the brain throughout development, performing many different developmental functions (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). In this study we are interested in UNC5c expression by dopamine neurons, and particularly by their axons projecting to the nucleus accumbens. We therefore used immunofluorescent staining in the nucleus accumbens, showing UNC5 expression in TH+ axons. This work adds to the study by Manitt et al., 2010, which examined UNC5 expression in the VTA. Manitt et al. used Western blotting to demonstrate that UNC5 expression in VTA dopamine neurons increases during adolescence, as can be seen in the following figure:

      References:

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (2) Figure 1 used shRNA to knockdown Netrin-1 in the Septum and these mice were subjected to behavioral testing. These results, again, are not supported by any valid data that the knockdown approach actually worked in dopaminergic axons. It is also unclear whether knocking down Netrin-1 in the septum will re-route dopaminergic axons or lead to cell death in the dopaminergic neurons in the substantia nigra pars compacta?

      First, we want to clarify and emphasize that our knockdown approach was not designed to knock down Netrin-1 in dopamine neurons or their axons. Our goal was to knock down Netrin-1 expression in cells expressing this guidance cue gene in the dorsal peduncular cortex.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      We agree that our experiments do not address the fate of the dopamine axons that are misrouted away from the medial prefrontal cortex. This research is ongoing, and we have now added a note regarding this to our manuscript.

      Our current hypothesis, based on experiments being conducted as part of another line of research in the lab, is that these axons are rerouted to a different brain region, which they then ectopically innervate. In these experiments we are finding that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (3) Another issue with Figure 1J. It is unclear whether the viruses were injected into a WT mouse model or into a Cre-mouse model driven by a promoter specifically expressed in the dorsal peduncular cortex. The authors should provide evidence that Netrin-1 mRNA and proteins are indeed significantly reduced. The authors should address the anatomic results of the area of virus diffusion to confirm the virus specifically infected the cells in the dorsal peduncular cortex.

      All the virus knockdown experiments were conducted in wild-type mice. We have added this information to Figure 1k.

      The efficacy of the shRNA in knocking down Netrin-1 was demonstrated by Cuesta et al. (2020) both in vitro and in vivo, as we show in our response to the reviewer’s previous comment above.

      We also now provide anatomical images demonstrating the localization of the injection and area of virus diffusion in the mouse forebrain. In Author response image 4 below the area of virus diffusion is visible as green fluorescent signal.

      Author response image 4.

      Fluorescent microscopy image of a mouse forebrain demonstrating the localization of the injection of a virus to knock down Netrin-1. The location of the virus is in green, while cell nuclei are in blue (DAPI). Abbreviations: DP: dorsopeduncular cortex, IL: infralimbic cortex.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (4) The authors need to provide information regarding the efficiency and duration of knocking down. For instance, in Figure 1K, the mice were tested after 53 days post injection, can the virus activity in the brain last for such a long time?

      In our study we are interested in the role of Netrin-1 expression in the guidance of dopamine axons from the nucleus accumbens to the medial prefrontal cortex. The critical window for these axons leaving the nucleus accumbens and growing to the cortex is early adolescence (Reynolds et al., 2018b). This is why we injected the virus at the onset of adolescence, at postnatal day 21. As dopamine axons grow from the nucleus accumbens to the prefrontal cortex, they pass through the dorsal peduncular cortex. We disrupted Netrin-1 expression at this point along their route to determine whether it is the Netrin-1 present along their route that guides these axons to the prefrontal cortex. We hypothesized that the shRNA Netrin-1 virus would disrupt the growth of the dopamine axons, reducing the number of axons that reach the prefrontal cortex and therefore the number of axons that innervate this region in adulthood.

      We conducted our behavioural tests during adulthood, after the critical window during which dopamine axon growth occurs, so as to observe the enduring behavioural consequences of this misrouting. This experimental approach is designed for the shRNA Netrin-1 virus to be expressed in cells in the dorsopeduncular cortex when the dopamine axons are growing, during adolescence.

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      (5) In Figure 1N-Q, silencing Netrin-1 results in less DA axons targeting to infralimbic cortex, but why the Netrin-1 knocking down mice revealed the improved behavior?

      This is indeed an intriguing finding, and we have now added a mention of it to our manuscript. We have demonstrated that misrouting dopamine axons away from the medial prefrontal cortex during adolescence alters behaviour, but we do not currently know why this improves performance on the action impulsivity measure. One potential answer is that the dopamine axons are misrouted to a different brain region that is also involved in controlling impulsive behaviour, perhaps the dorsal striatum (Kim and Im, 2019) or the orbital prefrontal cortex (Jonker et al., 2015).

      We would also like to note that we are finding that other manipulations that appear to reroute dopamine axons to unintended targets can lead to reduced action impulsivity as measured using the Go/No-Go task. As we mentioned above, current experiments in the lab, which are part of a different line of research, are showing that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood, but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      (6) What is the effect of knocking down UNC5C on dopamine axons guidance to the cortex?

      We have found that mice that are heterozygous for a nonsense Unc5c mutation, and as a result have reduced levels of UNC5c protein, show reduced amphetamine-induced locomotion and stereotypy (Auger et al., 2013). In the same manuscript we show that this effect only emerges during adolescence, in concert with the growth of dopamine axons to the prefrontal cortex. This is indirect but strong evidence that UNC5c receptors are necessary for correct adolescent dopamine axon development.

      References

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (7) In Figures 2-4, the authors only showed the amount of DA axons and UNC5C in NAcc. However, it remains unclear whether these experiments also impact the projections of dopaminergic axons to other brain regions, critical for the behavioral phenotypes. What about other brain regions such as prefrontal cortex? Do the projection of DA axons and UNC5c level in cortex have similar pattern to those in NAcc?

      UNC5c receptors are expressed throughout development and are involved in many developmental processes (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). We cannot say whether the pattern we observe here is unique to the nucleus accumbens, but it is certainly not universal throughout the brain.

      The brain region we focus on in our manuscript, in addition to the nucleus accumbens, is the medial prefrontal cortex. Close and thorough examination of the prefrontal cortices of adult mice revealed practically no UNC5c expression by dopamine axons. However, we did observe very rare cases of dopamine axons expressing UNC5c. It is not clear whether these rare cases are present before or during adolescence.

      Below is a representative set of images of this observation, which is now also included as Supplementary Figure 4:

      Author response image 5.

      Expression of UNC5c protein in the medial prefrontal cortex of an adult male mouse. Low (A) and high (B) magnification images demonstrate that there is little UNC5c expression in dopamine axons in the medial prefrontal cortex. Here we identify dopamine axons by immunofluorescent staining for tyrosine hydroxylase (TH, see our response to comment #9 regarding the specificity of the TH antibody for dopamine axons in the prefrontal cortex). This figure is also included as Supplementary Figure 4 in the manuscript. Abbreviations: fmi: forceps minor of the corpus callosum, mPFC: medial prefrontal cortex.

      References:

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (8) Can overexpression of UNC5c or Netrin-1 in male winter hamsters mimic the observations in summer hamsters? Or overexpression of UNC5c in female summer hamsters to mimic the winter hamster? This would be helpful to confirm the causal role of UNC5C in guiding DA axons during adolescence.

      This is an excellent question. We are very interested in both increasing and decreasing UNC5c expression in hamster dopamine axons to see if we can directly manipulate summer hamsters into winter hamsters and vice versa. We are currently exploring virus-based approaches to design these experiments and are excited for results in this area.

      (9) The entire study relied on using tyrosine hydroxylase (TH) as a marker for dopaminergic axons. However, the expression of TH (either by IHC or IF) can be influenced by other environmental factors, that could alter the expression of TH at the cellular level.

      This is an excellent point that we now carefully address in our methods by adding the following:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      Furthermore, we are not aware of any other processes in the forebrain that are known to be immunopositive for TH under any environmental conditions.

      To reduce confusion, we have replaced the abbreviation for dopamine – DA – with TH in the relevant panels in Figures 1, 2, 3, and 4 to clarify exactly what is represented in these images. As can be seen in these images, fluorescent green labelling is present only in axons, which is to be expected of dopamine labelling in these forebrain regions.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (10) Are Netrin-1/UNC5C the only signal guiding dopamine axon during adolescence? Are there other neuronal circuits involved in this process?

      Our intention for this study was to examine the role of Netrin-1 and its receptor UNC5C specifically, but we do not suggest that they are the only molecules to play a role. The process of guiding growing dopamine axons during adolescence is likely complex and we expect other guidance mechanisms to also be involved. From our previous work we know that the Netrin-1 receptor DCC is critical in this process (Hoops and Flores, 2017; Reynolds et al., 2023). Several other molecules have been identified in Netrin-1/DCC signaling processes that control corpus callosum development and there is every possibility that the same or similar molecules may be important in guiding dopamine axons (Schlienger et al., 2023).

      References:

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      (11) Finally, despite the authors' claim that the dopaminergic axon projection is sensitive to the duration of daylight in the hamster, they never provided definitive evidence to support this hypothesis.

      By “definitive evidence” we think that the reviewer is requesting a single statistical model including measures from both the summer and winter groups. Such a model would provide a probability estimate of whether dopamine axon growth is sensitive to daylight duration. Therefore, we ran these models, one for male hamsters and one for female hamsters.

      In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).
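
      As a minimal sketch of what an age by daylength interaction test computes, the Python below derives the interaction F statistic for a balanced two-factor design. It is not the authors' analysis: the function name `interaction_f`, the balanced-design assumption, and the numbers in `example_cells` are all hypothetical, and the actual model is described only in the supplementary statistics file.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-factor design.

    `cells` maps (age, daylength) -> list of measurements.
    Assumes every cell holds the same number of observations.
    """
    ages = sorted({key[0] for key in cells})
    daylengths = sorted({key[1] for key in cells})
    n = len(next(iter(cells.values())))
    if any(len(values) != n for values in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(x for values in cells.values() for x in values)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    age_mean = {a: mean(cell_mean[(a, d)] for d in daylengths) for a in ages}
    day_mean = {d: mean(cell_mean[(a, d)] for a in ages) for d in daylengths}

    # Interaction: what is left of each cell mean after removing both main effects.
    ss_inter = n * sum(
        (cell_mean[(a, d)] - age_mean[a] - day_mean[d] + grand) ** 2
        for a in ages
        for d in daylengths
    )
    # Error: spread of the observations around their own cell mean.
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, values in cells.items() for x in values
    )
    df_inter = (len(ages) - 1) * (len(daylengths) - 1)
    df_error = len(ages) * len(daylengths) * (n - 1)
    return (ss_inter / df_inter) / (ss_error / df_error), df_inter, df_error


# Hypothetical innervation scores, invented purely for illustration.
example_cells = {
    ("adolescent", "summer"): [1, 2, 3],
    ("adolescent", "winter"): [1, 2, 3],
    ("adult", "summer"): [1, 2, 3],
    ("adult", "winter"): [5, 6, 7],
}
f_value, df1, df2 = interaction_f(example_cells)
```

      A large F here indicates that the effect of age differs between daylength conditions. Converting F to a p value requires the F distribution, which this sketch leaves to a statistics package.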

      Reviewer 3

      (1) Fig 1 A and B don't appear to be the same section level.

      The reviewer is correct that Fig 1B is anterior to Fig 1A. We have changed Figure 1A to match the section level of Figure 1B.

      (2) Fig 1C. It is not clear that these axons are crossing from the shell of the NAC.

      We have added a dashed line to Figure 1C to highlight the boundary of the nucleus accumbens, which hopefully emphasizes that there are fibres crossing the boundary. We also include here an enlarged image of this panel:

      Author response image 6.

      An enlarged image of Figure 1C in the manuscript. The nucleus accumbens (left of the dotted line) is densely packed with TH+ axons (in green). Some of these TH+ axons can be observed extending from the nucleus accumbens medially towards a region containing dorsally oriented TH+ fibres (white arrows).

      (3) Fig 1. Measuring width of the bundle is an odd way to measure DA axon numbers. First, the width could be changing during adulthood for various reasons including change in brain size. Second, I wouldn't consider these axons in a traditional bundle. Third, could DA axon counts be provided, rather than these proxy measures?

      With regards to potential changes in brain size, we agree that this could have potentially explained the increased width of the dopamine axon pathway. That is why it was important for us to use stereology to measure the density of dopamine axons within the pathway. If the width increased but no new axons grew along the pathway, we would have seen a decrease in axon density from adolescence to adulthood. Instead, our results show that the density of axons remained constant.

      We agree with the reviewer that the dopamine axons do not form a traditional “bundle”. Therefore, throughout the manuscript we now avoid using the term bundle.

      Although we cannot count every single axon, an accurate estimate of this number can be obtained using stereology, an unbiased method for efficiently quantifying large, irregularly distributed objects. We used stereology to count TH+ axons in an unbiased subset of the total area occupied by these axons. Unbiased stereology is the gold-standard technique for estimating populations of anatomical objects, such as axons, that are so numerous that it would be impractical or impossible to measure every single one. Here and elsewhere we generally provide results as densities and areas of occupancy (Reynolds et al., 2022). To avoid confusion, we now clarify that we are measuring the width of the area that dopamine axons occupy (rather than the dopamine axon “bundle”).
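
      The density argument in this response can be illustrated with a short Python sketch. The function names, the frame area, and the counts are hypothetical and are not taken from the study. The sketch only shows why a wider pathway containing no new axons would be expected to yield a lower density.

```python
def axon_density(counts, frame_area_um2):
    """Mean density (objects per square micron) from equally sized sampling frames."""
    return sum(counts) / (len(counts) * frame_area_um2)


def density_if_only_width_changed(density, old_width, new_width):
    """Density expected if the occupied area widens but no axons are added.

    Assumes the other dimension of the occupied area stays constant, so the
    same number of axons is spread over proportionally more area.
    """
    return density * old_width / new_width


# Hypothetical counts from four sampling frames of 100 square microns each.
adolescent_density = axon_density([4, 6, 5, 5], frame_area_um2=100.0)

# If the pathway widened from 200 to 250 microns through brain growth alone,
# the density would be expected to fall by the ratio of the widths.
expected_adult_density = density_if_only_width_changed(
    adolescent_density, old_width=200.0, new_width=250.0
)
```

      Under these assumptions, a measured adult density that matches the adolescent density, rather than the lower expected value, is what would point to new axons having grown into the wider area.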

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (4) TH in the cortex could also be of noradrenergic origin. This needs to be ruled out to score DA axons

      This is the same comment as Reviewer 1 #9. Please see our response below, which we have also added to our methods:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (5) Netrin staining should be provided with NeuN + DAPI; it's not clear these are all cell bodies. An in situ of Netrin would help as well.

      A similar comment was raised by Reviewer 1 in point #1. Please see below the immunofluorescent and RNAscope images showing expression of Netrin-1 protein and mRNA in the forebrain.

      Author response image 7.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      Author response image 8.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). RNAscope was used to generate this image. Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, IL: Infralimbic Cortex, PrL: Prelimbic Cortex, fmi: forceps minor of the corpus callosum

      Author response image 9.

      A higher resolution image from the same sample as in Figure 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      (6) The Netrin knockdown needs validation. How strong was the knockdown etc?

      This comment was also raised by Reviewer 1 #1.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (7) If the conclusion that knocking down Netrin in cortex decreases DA innervation of the IL, how can that be reconciled with Netrin-Unc repulsion.

      This is an intriguing question and one that we are in the planning stages of addressing with new experiments.

      Although we do not have a mechanistic answer for how a repulsive receptor helps guide these axons, we would like to note that previous indirect evidence from a study by our group also suggests that reducing UNC5c signaling in dopamine axons in adolescence increases dopamine innervation to the prefrontal cortex (Auger et al., 2013).

      References

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (8) The behavioral phenotype in Fig 1 is interesting, but it's not clear if it's related to DA axons/signaling. In general, no evidence in this paper is provided for the role of DA in the adolescent behaviors described.

      We agree with the reviewer that the behaviours we describe in adult mice are complex and are likely to involve several neurotransmitter systems. However, there is ample evidence for the role of dopamine signaling in cognitive control behaviours (Bari and Robbins, 2013; Eagle et al., 2008; Ott et al., 2023) and our published work has shown that alterations in the growth of dopamine axons to the prefrontal cortex leads to changes in impulse control as measured via the Go/No-Go task in adulthood (Reynolds et al., 2023, 2018a; Vassilev et al., 2021).

      The other adolescent behaviour we examined was risk-like taking behaviour in male and female hamsters (Figures 4 and 5), as a means of characterizing maturation in this behaviour over time. We decided not to use the Go/No-Go task because, as far as we know, it has never been employed in Siberian hamsters and it would be difficult to implement. Instead, we chose the light/dark box paradigm, which requires no training and is ideal for charting behavioural changes over short time periods. Indeed, risk-like taking behaviour in rodents and in humans changes from adolescence to adulthood, paralleling changes in prefrontal cortex development, including the gradual input of dopamine axons to this region.

      References:

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      (9) Fig2 - boxes should be drawn on the NAc diagram to indicate sampled regions. Some quantification of Unc5c would be useful. Also, some validation of the Unc5c antibody would be nice.

      The images presented were taken medial to the anterior commissure and we have edited Figure 2 to show this. However, we did not notice any intra-accumbens variation, including between the core and the shell. Therefore, the images are representative of what was observed throughout the entire nucleus accumbens.

      To quantify UNC5c in the accumbens we conducted a Western blot experiment in male mice at different ages. A one-way ANOVA analyzing band intensity (relative to the 15-day-old average band intensity) as the response variable and age as the predictor variable showed a significant effect of age (F=5.615, p=0.01). Posthoc analysis revealed that 15-day-old mice have less UNC5c in the nucleus accumbens compared to 21- and 35-day-old mice.

      Author response image 10.

      The graph depicts the results of a Western blot experiment of UNC5c protein levels in the nucleus accumbens of male mice at postnatal days 15, 21 or 35 and reveals a significant increase in protein levels at the onset of adolescence.

      Our methods for this Western blot were as follows: Samples were prepared as previously described (Torres-Berrío et al., 2017). Briefly, mice were sacrificed by live decapitation and brains were flash frozen in heptane on dry ice for 10 seconds. Frozen brains were mounted in a cryomicrotome and two 500 µm sections were collected for the nucleus accumbens, corresponding to plates 14 and 18 of the Paxinos mouse brain atlas. Two tissue core samples were collected per section, one for each side of the brain, using a 15-gauge tissue corer (Fine surgical tools Cat no. NC9128328) and ejected in a microtube on dry ice. The tissue samples were homogenized in 100 µl of standard radioimmunoprecipitation assay buffer using a handheld electric tissue homogenizer. The samples were clarified by centrifugation at 4 °C at 15,000 g for 30 minutes. Protein concentration was quantified using a bicinchoninic acid assay kit (Pierce BCA protein assay kit, Cat no. PI23225) and samples were denatured with standard Laemmli buffer for 5 minutes at 70 °C. 10 µg of protein per sample was loaded and run by SDS-PAGE gel electrophoresis in a Mini-PROTEAN system (Bio-Rad) on an 8% acrylamide gel by stacking for 30 minutes at 60 V and resolving for 1.5 hours at 130 V. The proteins were transferred to a nitrocellulose membrane for 1 hour at 100 V in standard transfer buffer on ice. The membranes were blocked using 5% bovine serum albumin dissolved in tris-buffered saline with Tween 20 and probed with primary (UNC5c, Abcam Cat. no ab302924) and HRP-conjugated secondary antibodies for 1 hour. α-tubulin was probed and used as loading control. The probed membranes were resolved using SuperSignal West Pico PLUS chemiluminescent substrate (ThermoFisher Cat no. 34579) in a ChemiDoc MP Imaging system (Bio-Rad). Band intensity was quantified using the ChemiDoc software and all ages were normalized to the P15 age group average.
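
      The normalization and one-way ANOVA described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' analysis script: the function names and the band intensities in `example_bands` are hypothetical, and only the F statistic is computed.

```python
from statistics import mean


def normalize_to_reference(groups, reference):
    """Express every band intensity relative to the mean of the reference group."""
    ref_mean = mean(groups[reference])
    return {age: [x / ref_mean for x in values] for age, values in groups.items()}


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with age group as the single predictor."""
    values = [x for group in groups.values() for x in group]
    grand = mean(values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical raw band intensities (arbitrary units), invented for illustration.
example_bands = {"P15": [1.0, 2.0, 3.0], "P21": [3.0, 4.0, 5.0], "P35": [5.0, 6.0, 7.0]}

relative = normalize_to_reference(example_bands, reference="P15")
f_value, df_between, df_within = one_way_anova_f(relative)
```

      Dividing every value by the same reference mean rescales the data without changing the F statistic, so the normalization affects how the graph reads rather than the outcome of the test.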

      Validation of the UNC5c antibody was performed in the lab of Dr. Liu, from whom it was kindly provided. Briefly, in the validation study the authors showed that the anti-UNC5C antibody can detect endogenous UNC5C expression and the level of UNC5C is dramatically reduced after UNC5C knockdown. The antibody can also detect the tagged-UNC5C protein in several cell lines, which was confirmed by a tag antibody (Purohit et al., 2012; Shao et al., 2017).

      References:

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      (10) "In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, and reduction in UNC5C expression appears to cause growth of mesolimbic dopamine axons to the prefrontal cortex".....This is confusing. Figure 2 shows a developmental increase in UNc5c not a decrease. So when is the "reduction in Unc5c expression" occurring?

      We apologize for the mistake in this sentence. We have corrected the relevant passage in our manuscript as follows:

      In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, particularly when mesolimbic and mesocortical dopamine projections segregate in the nucleus accumbens (Manitt et al., 2010; Reynolds et al., 2018a). In contrast, dopamine axons in the prefrontal cortex do not express UNC5c except in very rare cases (Supplementary Figure 4). In adult male mice with Unc5c haploinsufficiency, there appears to be ectopic growth of mesolimbic dopamine axons to the prefrontal cortex (Auger et al., 2013). This miswiring is associated with alterations in prefrontal cortex-dependent behaviours (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      (11) In Fig 3, a statistical comparison should be made between summer male and winter male, to justify the conclusions that the winter males have delayed DA innervation.

      This analysis was also suggested by Reviewer 1, #11. Here is our response:

      We analyzed the summer and winter data together in ANOVAs separately for males and females. In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).
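
      For readers unfamiliar with how an age by daylength interaction term is obtained, the following is a minimal sketch of the interaction F statistic in a two-way ANOVA. It assumes a balanced design (equal numbers of animals per cell) and uses made-up numbers; the letter does not specify the design or software used, so the function name `interaction_f` and the data are hypothetical and this is not a reproduction of the authors' analysis.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-way ANOVA.

    cells[i][j] holds the observations for level i of factor A (e.g. age)
    and level j of factor B (e.g. daylength). Returns (F, df_inter, df_error).
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])  # observations per cell; a balanced design is assumed
    cell_means = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    row_means = [mean(cell_means[i]) for i in range(a)]
    col_means = [mean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(row_means)
    # Interaction: what is left of each cell mean after removing both main effects.
    ss_inter = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    # Error: spread of individual observations around their own cell mean.
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for x in cells[i][j]
    )
    df_inter = (a - 1) * (b - 1)
    df_error = a * b * (n - 1)
    return (ss_inter / df_inter) / (ss_error / df_error), df_inter, df_error


# Hypothetical varicosity densities: rows are two ages, columns are summer and winter.
cells = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 4.0], [2.0, 4.0]],
]
f_value, df_inter, df_error = interaction_f(cells)
print(f_value, df_inter, df_error)
```

      A p-value then follows from the upper tail of the F distribution with (df_inter, df_error) degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available.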

      (12) Should axon length also be measured here (Fig 3)? It is not clear why the authors have switched to varicosity density. Also, a box should be drawn in the NAC cartoon to indicate the region that was sampled.

      It is untenable to quantify axon length in the prefrontal cortex as we cannot distinguish independent axons. Rather, they are “tangled”; they twist and turn in a multitude of directions as they make contact with various dendrites. Furthermore, they branch extensively. It would therefore be impossible to accurately quantify the number of axons. Using unbiased stereology to quantify varicosities is a valid, well-characterized and straightforward alternative (Reynolds et al., 2022).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (13) In Fig 3, Unc5c should be quantified to bolster the interesting finding that Unc5c expression dynamics are different between summer and winter hamsters. Unc5c mRNA experiments would also be important to see if similar changes are observed at the transcript level.

      We agree that it would be very interesting to see how UNC5c mRNA and protein levels change over time in summer and winter hamsters, both in males, as the reviewer suggests here, and in females. We are working on conducting these experiments in hamsters as part of a broader expansion of our research in this area. These experiments will require a lengthy amount of time and at this point we feel that they are beyond the scope of this manuscript.

      (14) Fig 4. The peak in exploratory behavior in winter females is counterintuitive and needs to be better discussed. In general, the light dark behavior seems quite variable.

      This is indeed a very interesting finding, which we have expanded upon in our manuscript as follows:

      When raised under a winter-mimicking daylength, hamsters of either sex show a protracted peak in risk taking. In males, it is delayed beyond 80 days old, but the delay is substantially less in females. This is a counterintuitive finding considering that dopamine development in winter females appears to be accelerated. Our interpretation of this finding is that the timing of the risk-taking peak in females may reflect a balance between different adolescent developmental processes. The fact that dopamine axon growth is accelerated does not imply that all adolescent maturational processes are accelerated. Some may be delayed, for example those that induce axon pruning in the cortex. The timing of the risk-taking peak in winter female hamsters may therefore reflect the amalgamation of developmental processes that are advanced with those that are delayed – producing a behavioural effect that is timed somewhere in the middle. Disentangling the effects of different developmental processes on behaviour will require further experiments in hamsters, including the direct manipulation of dopamine activity in the nucleus accumbens and prefrontal cortex.

      Full Reference List

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      Torres-Berrío A, Lopez JP, Bagot RC, Nouel D, Dal-Bo G, Cuesta S, Zhu L, Manitt C, Eng C, Cooper HM, Storch K-F, Turecki G, Nestler EJ, Flores C. 2017. DCC Confers Susceptibility to Depression-like Behaviors in Humans and Mice and Is Regulated by miR-218. Biological psychiatry 81:306–315. doi:10.1016/j.biopsych.2016.08.017

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      Private Comments

      Reviewer #1

      (12) The language should be improved. Some expression is confusing (line178-179). Also some spelling errors (eg. Figure 1M).

      We have removed the word “Already” to make the sentence in lines 178-179 clearer, however we cannot find a spelling error in Figure 1M or its caption. We have further edited the manuscript for clarity and flow.

      Reviewer #2

      (1) The authors claim to have revealed how the 'timing of adolescence is programmed in the brain'. While their findings certainly shed light on molecular, circuit and behavioral processes that are unique to adolescence, their claim may be an overstatement. I suggest they refine this statement to discuss more specifically the processes they observed in the brain and animal behavior, rather than adolescence itself.

      We agree with the reviewer and have revised the manuscript to specify that we are referring to the timing of specific developmental processes that occur in the adolescent brain, not adolescence overall.

      (2) Along the same lines, the authors should also include a more substantive discussion of how they selected their ages for investigation (for both mice and hamsters). For mice, their definition of adolescence (P21) is earlier than some (e.g. Spear L.P., Neurosci. and Beh. Reviews, 2000).

      There are certainly differences of opinion between researchers as to the precise definition of adolescence and the period it encompasses. Spear, 2000, provides one excellent discussion of the challenges related to identifying adolescence across species. This work gives specific ages only for rats, not mice (as we use here), and characterizes post-natal days 28-42 as being the conservative age range of “peak” adolescence (page 419, paragraph 1). Immediately thereafter the review states that the full adolescent period is longer than this, and it could encompass post-natal days 20-55 (page 419, paragraph 2).

      We have added the following statement to our methods:

      There is no universally accepted way to define the precise onset of adolescence. Therefore, there is no clear-cut boundary to define adolescent onset in rodents (Spear, 2000). Puberty can be more sharply defined, and puberty and adolescence overlap in time, but the terms are not interchangeable. Puberty is the onset of sexual maturation, while adolescence is a more diffuse period marked by the gradual transition from a juvenile state to independence. We, and others, suggest that adolescence in rodents spans from weaning (postnatal day 21) until adulthood, which we take to start on postnatal day 60 (Reynolds and Flores, 2021). We refer to “early adolescence” as the first two weeks postweaning (postnatal days 21-34). These ranges encompass discrete DA developmental periods (Kalsbeek et al., 1988; Manitt et al., 2011; Reynolds et al., 2018a), vulnerability to drug effects on DA circuitry (Hammerslag and Gulley, 2014; Reynolds et al., 2018a), and distinct behavioral characteristics (Adriani and Laviola, 2004; Makinodan et al., 2012; Schneider, 2013; Wheeler et al., 2013).

      References:

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette MP, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

      (3) Figure 1 - the conclusions hinge on the Netrin-1 staining, as shown in panel G, but the cells are difficult to see. It would be helpful to provide clearer, more zoomed images so readers can better assess the staining. Since Netrin-1 expression reduces dramatically after P4 and they had to use antigen retrieval to see signal, it would be helpful to show some images from additional brain regions and ages to see if expression levels follow predicted patterns. For instance, based on the allen brain atlas, it seems that around P21, there should be high levels of Netrin-1 in the cerebellum, but low levels in the cortex. These would be nice controls to demonstrate the specificity and sensitivity of the antibody in older tissue.

      We do not study the cerebellum and have never stained this region; doing so now would require generating additional tissue, and we are not sure it would add enough to the information provided to be worthwhile. Note that we have stained the forebrain for Netrin-1 previously, providing broad staining of many brain regions (Manitt et al., 2011).

      References:

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      (4) Figure 3 - Because mice tend to avoid brightly-lit spaces, the light/dark box is more commonly used as a measure of anxiety-like behavior than purely exploratory behavior (including in the paper they cited). It is important to address this possibility in their discussion of their findings. To bolster their conclusions about the coincidence of circuit and behavioral changes in adolescent hamsters, it would be useful to add an additional measure of exploratory behaviors (e.g. hole board).

      Regarding the light/dark box test, this is an excellent point. We prefer the term “risk taking” to “anxiety-like” and now use the former term in our manuscript. Furthermore, our interest in the behaviour is purely to chart the development of adolescent behaviour across our treatment groups, not to study a particular emotional state. Regardless of the specific emotion or emotions governing the light/dark box behaviour, it is an ideal test for charting adolescent shifts in behaviour as it is well-characterized in this respect, as we discuss in our manuscript.

      (5) Supplementary Figure 4,5 The authors defined puberty onset using uterine and testes weights in hamsters. While the weights appear to be different for summer and winter hamsters, there was no statistical comparison. Please add statistical analyses to bolster claims about puberty start times. Also, as many studies use vaginal opening to define puberty onset, it would be helpful to discuss how these measurements typically align and cite relevant literature that described use of uterine weights. Also, Supplementary Figures 4 and 5 were mis-cited as Supp. Fig. 2 in the text (e.g. line 317 and others).

      These are great suggestions. We have added statistical analyses to Supplementary Figures 5 and 6 and provided Vaginal Opening data as Supplementary Figure 7. The statistical analyses confirm that all three characters are delayed in winter hamsters compared to summer hamsters.

      We have also added the following references to the manuscript:

      Darrow JM, Davis FC, Elliott JA, Stetson MH, Turek FW, Menaker M. 1980. Influence of Photoperiod on Reproductive Development in the Golden Hamster. Biol Reprod 22:443–450. doi:10.1095/biolreprod22.3.443

      Ebling FJP. 1994. Photoperiodic Differences during Development in the Dwarf Hamsters Phodopus sungorus and Phodopus campbelli. Gen Comp Endocrinol 95:475–482. doi:10.1006/gcen.1994.1147

      Timonin ME, Place NJ, Wanderi E, Wynne-Edwards KE. 2006. Phodopus campbelli detect reduced photoperiod during development but, unlike Phodopus sungorus, retain functional reproductive physiology. Reproduction 132:661–670. doi:10.1530/rep.1.00019

      (6) The font in many figure panels is small and hard to read (e.g. 1A,D,E,H,I,L...). Please increase the size for legibility.

      We have increased the font size of our figure text throughout the manuscript.

      Reviewer #3

      (15) Fig 1 C,D. Clarify the units of the y axis

      We have now fixed this.

      Full Reference List

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their careful reading of our manuscript and for the detailed and constructive feedback on our work. Please find attached the revised version of the manuscript. We performed an extensive revision of the manuscript to address the issues raised by the referees. We provide new analyses (regarding response consistency and neural complexity), additional supplementary figures, and edits to the figures and text. Based on the reviewers’ comments, we introduced several major changes to the manuscript.

      Most notably, we

      • added a limitation statement to emphasize the speculative nature of our interpretation of the timing of word processing/associative binding

      • emphasized the limitations of the control condition

      • added analyses on the interaction between memory retrieval after 12h versus 36h

      • clarified our definition of episodic memory

      • added detailed analyses of the “Feeling of having heard” responses and the confidence ratings

      We hope that the revised manuscript addresses the reviewers' comments to their satisfaction. We believe that the revised manuscript has been significantly improved owing to the feedback provided. Below you can find a point-by-point response to each reviewer comment in blue. We look forward to the publication of the revised manuscript in eLife.

      Reviewer #1 (Public Review):

      The authors show that concurrently presenting foreign words and their translations during sleep leads to the ability to semantically categorize the foreign words above chance. Specifically, this procedure was successful when stimuli were delivered during slow oscillation troughs as opposed to peaks, which has been the focus of many recent investigations into the learning & memory functions of sleep. Finally, further analyses showed that larger and more prototypical slow oscillation troughs led to better categorization performance, which offers hints to others on how to improve or predict the efficacy of this intervention. The strength here is the novel behavioral finding and supporting physiological analyses, whereas the biggest weakness is the interpretation of the peak vs. trough effect.

      R1.1. Major importance:

      I believe the authors could attempt to address this question: What do the authors believe is the largest implication of this study? How far can this technique be pushed, and how can it practically augment real-world learning?

      We revised the discussion to put more emphasis on possible practical applications of this study (lines 645-656).

      In our opinion, the strength of this paper is its contribution to the basic understanding of information processing during deep sleep, rather than its insights on how to augment real-world learning. Given the currently limited data on learning during sleep, we believe it would be premature to make strong claims about potential practical applications of sleep-learning. In addition, as pointed out in the discussion section, we do not know what adverse effects sleep-learning has on other sleep-related mechanisms such as memory consolidation.

      R1.2. Lines 155-7: How do the authors argue that the words fit well within the half-waves when the sounds lasted 540 ms and didn't necessarily start right at the beginning of each half-wave? This is a major point that should be discussed, as part of the down-state sound continues into the up-state. Looking at Figure 3A, it is clear that stimulus presented in the slow oscillation trough ends at a time that is solidly into the upstate, and would not neurolinguists argue that a lot of sound processing occurs after the end of the sound? It's not a problem for their findings, which is about when is the best time to start such a stimulus, but it's a problem for the interpretation. Additionally, the authors could include some discussion on whether possibly presenting shorter sounds would help to resolve the ambiguities here.

      The word pairs’ presentations lasted on average ~540 ms. Importantly, the word pairs’ onset was timed to occur 100 ms before the maximal amplitude of the targeted peaks/troughs.

      Therefore, most of a word’s sound pattern appeared during the negative-going half-wave (about 350 ms of 540 ms). Importantly, Brodbeck and colleagues (2022) have shown that phonemes are continuously analyzed and interpreted with delays of about 50-200 ms, peaking at a delay of 100 ms. These results suggest that word processing started just after the negative maximum of a trough and finished during the next peak. Our interpretation (e.g. line 520+) suggests that low-level auditory processing reaches the auditory cortex before the positive-going half-wave. During the positive-going half-wave, the higher-level semantic networks appear to extract the presented word's meaning and associate the two simultaneously presented words. We clarified the time course regarding slow-wave phases and sound presentation in the manuscript (lines 158-164). Moreover, we added the limitation that we cannot know for sure when and in which slow-wave phase words were processed (lines 645-656). Future studies might want to use shorter stimuli to narrow down the timing of the word processing steps in relation to the sleep slow waves.
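
      The timing argument above can be checked with a small worked example. The sketch below assumes an idealized, symmetric slow oscillation of 1 Hz in which the negative half-wave is centred on the trough maximum; the response does not state the oscillation frequency, so `so_freq_hz`, the function names, and the resulting windows are illustrative assumptions rather than values from the manuscript. The 540 ms duration, the 100 ms lead, and the 50-200 ms processing delays are taken from the text.

```python
def overlap_ms(start_a, end_a, start_b, end_b):
    """Length of the overlap between two time intervals, in milliseconds."""
    return max(0.0, min(end_a, end_b) - max(start_a, start_b))


def word_timing(so_freq_hz=1.0, word_ms=540.0, lead_ms=100.0, delays_ms=(50.0, 200.0)):
    """Locate a word relative to the trough maximum (time zero) of a slow oscillation."""
    half_wave = 1000.0 / so_freq_hz / 2.0
    # Assumption: the negative half-wave is symmetric around the trough maximum,
    # and the positive half-wave follows it directly.
    neg_start, neg_end = -half_wave / 2.0, half_wave / 2.0
    pos_start, pos_end = neg_end, neg_end + half_wave
    # The sound starts lead_ms before the trough maximum.
    word_start, word_end = -lead_ms, word_ms - lead_ms
    return {
        "in_negative_ms": overlap_ms(word_start, word_end, neg_start, neg_end),
        "in_positive_ms": overlap_ms(word_start, word_end, pos_start, pos_end),
        # Earliest and latest processing times implied by the 50-200 ms delays.
        "processing_window_ms": (word_start + delays_ms[0], word_end + delays_ms[1]),
    }


timing = word_timing()
print(timing)
```

      Under these assumptions, 350 ms of the sound falls within the negative half-wave, which agrees with the figure quoted in the response, while the remainder of the sound and the delayed processing extend into the following positive half-wave. Changing `so_freq_hz` shows how this split depends on the duration of the individual slow wave.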

      R1.3. Medium importance:

      Throughout the paper, another concern relates to the term 'closed-loop'. It appears this term has been largely misused in the literature, and I believe the more appropriate term here is 'real-time' (Bergmann, 2018, Frontiers in Psychology; Antony et al., 2022, Journal of Sleep Research). For instance, if there were some sort of algorithm that assessed whether each individual word was successfully processed by the brain during sleep and then the delivery of words was subsequently changed, that could be more accurately labelled as 'closed-loop'.

      We acknowledge that the meaning of “closed-loop” in its narrowest sense is not fulfilled here. We believe that “slow oscillation phase-targeted, brain-state-dependent stimulation” is the most appropriate term to describe the applied procedure (BSDBS, Bergmann, 2018). We changed the wording in the manuscript to brain-state-dependent stimulation algorithm. Nevertheless, we would like to point out that the algorithm we developed and used (TOPOSO) is very similar to the algorithms often termed closed-loop algorithm in memory and sleep (e.g. Esfahani et al., 2023; Garcia-Molina et al., 2018; Ngo et al., 2013, for a comparison of TOPOSO to these techniques see Wunderlin et al., 2022 and for more information about TOPOSO see Ruch et al., 2022).

      R1.4. Figure 5 and corresponding analyses: Note that the two conditions end up with different sounds with likely different auditory complexities. That is, one word vs. two words simultaneously likely differ on some low-level acoustic characteristics, which could explain the physiological differences. Either the authors should address this via auditory analyses or it should be added as a limitation.

      This is correct: the two conditions differ in auditory complexity. Accordingly, we added this issue as another limitation of the study (lines 651-653). We had chosen a single-word control condition to ensure that no associative learning (between pseudowords) could take place in the control condition, because this was the critical learning process in the experimental condition. We would like to point out that we observed significant differences in brain responses to the presentation of word pairs (experimental condition) vs single pseudowords (control condition) in the Trough condition, but not the Peak condition. If low-level acoustic characteristics indeed explained the EEG differences between the two conditions, then one would expect these differences to occur in both the Trough and the Peak condition, because earlier studies showed that low-level acoustic processing proceeds in both phases of slow waves (Andrillon et al., 2016; Batterink et al., 2016; Daltrozzo et al., 2012).

      R1.5. Line 562-7 (and elsewhere in the paper): "episodic" learning is referenced here and many times throughout the paper. But episodic learning is not what was enhanced here. Please be mindful of this wording, as it can be confusing otherwise.

      The reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013).

      We stand by our claim that sleep-learning was of episodic nature. Here we use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000) and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). We revised the manuscript to clarify that and how our definition differs from traditional definitions. Please see reviewer comment R3.1 for a more extensive answer.

      Reviewer #2 (Public Review):

      In this project, Schmidig, Ruch and Henke examined whether word pairs that were presented during slow-wave sleep would leave a detectable memory trace 12 and 36 hours later. Such an effect was found, as participants showed a bias to categorize pseudowords according to a familiar word that they were paired with during slow-wave sleep. This behavior was not accompanied by any sign of conscious understanding of why the judgment was made, and so demonstrates that long-term memory can be formed even without conscious access to the presented content. Unconscious learning occurred when pairs were presented during troughs but not during peaks of slow-wave oscillations. Differences in brain responses to the two types of presentation schemes, and between word pairs that were later correctly- vs. incorrectly-judged, suggest a potential mechanism for how such deep-sleep learning can occur.

      The results are very interesting, and they are based on solid methods and analyses. Results largely support the authors' conclusions, but I felt that there were a few points in which conclusions were not entirely convincing:

      R2.1. As a control for the critical stimuli in this study, authors used a single pseudoword simultaneously played to both ears. This control condition (CC) differs from the experimental condition (EC) in a few dimensions, among them: amount of information provided, binaural coherence and word familiarity. These differences make it hard to conclude that the higher theta and spindle power observed for EC over CC trials indicate associative binding, as claimed in the paper. Alternative explanations can be made, for instance, that they reflect word recognition, as only EC contains familiar words.

      We agree. In the revised version of the manuscript, we emphasise this as a limitation of our study (line 653-656). Moreover, we understand that the differences between stimuli of the control and the experimental condition need not stem solely from the associative binding of two words. We made our interpretation of the findings more cautious.

      Interestingly, EC vs CC exhibits differences following trough- but not peak-targeting (see R1.4). If indeed all the EC vs CC differences were unrelated to associative binding, we would expect the same EC vs CC differences when peaks were targeted. Hence, the selective EC vs CC differences in the trough condition suggest that the brain is more responsive to sound, information, word familiarity and word semantics during troughs, where we found successful learning, compared to peaks, where no learning occurred. Trough-targeted word pairs (EC) versus foreign words (CC) enhanced theta power at 500 ms following word onset, and this theta enhancement correlated significantly with interindividual retrieval performance, indicating that theta probably promoted associative learning during sleep. This correlation was not significant for spindle power.

      R2.2. The entire set of EC pairs were tested both following 12 hours and following 36 hours. Exposure to the pairs during test #1 can be expected to have an effect over memory one day later, during test #2, and so differences between the tests could be at least partially driven by the additional activation and rehearsal of the material during test #1. Therefore, it is hard to draw conclusions regarding automatic memory reorganization between 12 and 36 hours after unconscious learning. Specifically, a claim is made regarding a third wave of plasticity, but we cannot be certain that the improvement found in the 36 hour test would have happened without test #1.

      We understand that the retrieval test at 12h may have had an impact on performance on the retrieval test at 36h. Practicing retrieval of newly formed memories is known to facilitate future retrieval of the same memories (e.g. Karpicke & Roediger, 2008). Hence, practicing the retrieval of sleep-formed memories during the retrieval test at 12h may have boosted performance at 36h.

      However, recent literature suggests that retrieval practice is only beneficial when corrective feedback is provided (Belardi et al., 2021; Metcalfe, 2017). In our study, we only presented the sleep-played pseudowords at test and participants received no feedback regarding the accuracy of their responses. Thus, a proper conscious re-encoding could not take place. Nevertheless, the retrieval at 12h may have altered performance at 36h in other ways. For example, it could have tagged the reactivated sleep-formed memories for enhanced consolidation during the next night (Rabinovich Orlandi et al., 2020; Wilhelm et al., 2011).

      We included a paragraph on the potential carry-over effects from retrieval at 12h on retrieval at 36h in the discussion section (line 489-496; line 657-659). Furthermore, we removed the arguments about the “third wave of plasticity”.

      R2.3. Authors claim that perceptual and conceptual processing during sleep led to increased neural complexity in troughs. However, neural complexity was not found to differ between EC and CC, nor between remembered and forgotten pairs. It is therefore not clear to me why the increased complexity that was found in troughs should be attributed to perceptual and conceptual word processing, as CC contains meaningless vowels. Moreover, from the evidence presented in this work at least, I am not sure there is room to infer causation - that the increase in HFD is driven by the stimuli - as there is no control analysis looking at HFD during troughs that did not contain stimulation.

      With the analysis of the HFD we would like to provide an additional perspective to the oscillation-based analysis. We checked whether the boundary condition of Peak and Trough targeting changes the overall complexity or information content in the EEG. Our goal was to assess the change in neural complexity (relative to a pre-stimulus baseline) following the successful vs unsuccessful encoding of word pairs during sleep.

      We acknowledge that a causal interpretation about HFD is not warranted, and we revised the manuscript accordingly. It was unexpected that we could not find the same results in the contrast of EC vs CC or correct vs incorrect word pairs. We suggest that our signal-to-noise ratio might have been too weak.

      One could argue that the phase targeting alone (without stimulation) induces peak/trough differences in complexity. We cannot completely rule out this concern. But we tried to use the EEG that was not influenced by the ongoing slow-wave: the EEG 2000-500ms before the stimulus onset and 500-2000ms after the stimulus onset. Therefore, we excluded the 1s of the targeted slow-wave, hoping that most of the phase inherent complexity should have faded out (see Figure 2). We could not further extend the time window of analysis due to the minimal stimulus onset interval of 2s. Of course we cannot exclude that the targeted Trough impacted the following HFD. We clarified this in the manuscript (line 384-425).
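
      The window logic described above can be illustrated with a minimal sketch. It assumes that HFD denotes the Higuchi fractal dimension and that each trial is available as a single-channel list of samples; the function names, the sampling rate `fs`, the onset index and the choice `kmax=8` are hypothetical and are not taken from the manuscript. It is not the authors' pipeline, only one way to compute a pre-stimulus (-2 to -0.5 s) and a post-stimulus (0.5 to 2 s) complexity value while skipping the 1 s around the targeted slow wave.

```python
import math
import random


def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension of a 1-D sequence (kmax is a hypothetical choice)."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            sub = x[m::k]                      # every k-th sample, starting at offset m
            steps = len(sub) - 1
            if steps < 1:
                continue
            dist = sum(abs(b - a) for a, b in zip(sub, sub[1:]))
            # Higuchi's normalisation of the curve length for lag k and offset m
            lengths.append(dist * (n - 1) / (steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) against log(1/k)
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


def slice_window(epoch, fs, onset_idx, t_start, t_stop):
    """Samples between t_start and t_stop (seconds) relative to stimulus onset."""
    return epoch[onset_idx + round(t_start * fs): onset_idx + round(t_stop * fs)]


if __name__ == "__main__":
    rng = random.Random(0)
    fs, onset_idx = 250, 500                   # hypothetical sampling rate and onset sample
    epoch = [rng.gauss(0, 1) for _ in range(1000)]   # placeholder signal, not real EEG
    pre = higuchi_fd(slice_window(epoch, fs, onset_idx, -2.0, -0.5))
    post = higuchi_fd(slice_window(epoch, fs, onset_idx, 0.5, 2.0))
    print("HFD change relative to baseline:", post - pre)
```

      A straight line yields a value of 1 and white noise a value close to 2, which gives a quick sanity check before applying such a function to real epochs.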

      Furthermore, we did find a difference of neural complexity between the pre-stimulus baseline and the post-stimulus complexity in the Peak condition but not in the Trough condition (we now added this contrast to the manuscript, line 416-419). Hence, the change in neural complexity is a reaction to the interaction of the specific slow-wave phase with the processing of the word pairs. Even though these results cannot provide unambiguous, causal links, we think they can figure as an important start for other studies to decipher neural complexity during slow wave sleep.

      Reviewer #3 (Public Review):

      The study aims at creating novel episodic memories during slow wave sleep, that can be transferred in the awake state. To do so, participants were simultaneously presented during sleep both foreign words and their arbitrary translations in their language (one word in each ear), or as a control condition only the foreign word alone, binaurally. Stimuli were presented either at the trough or the peak of the slow oscillation using a closed-loop stimulation algorithm. To test for the creation of a flexible association during sleep, participants were then presented at wake with the foreign words alone and had (1) to decide whether they had the feeling of having heard that word before, (2) to attribute this word to one out of three possible conceptual categories (to which the translation words actually belong), and (3) to rate their confidence about their decision.

      R3.1. The paper is well written, the protocol ingenious and the methods are robust. However, the results do not really add conceptually to a prior publication of this group showing the possibility to associate in slow wave sleep pairs of words denoting large or small object and non words, and then asking during ensuing wakefulness participants to categorise these non words to a "large" or "small" category. In both cases, the main finding is that this type of association can be formed during slow wave sleep if presented at the trough (versus the peak) of the slow oscillation. Crucially, whether these associations truly represent episodic memory formation during sleep, as claimed by the authors, is highly disputable as there is no control condition allowing to exclude the alternative, simpler hypothesis that mere perceptual associations between two elements (foreign word and translation) have been created and stored during sleep (which is already in itself an interesting finding). In this latter case, it would be only during the awake state when the foreign word is presented that its presentation would implicitly recall the associated translation, which in turn would "ignite" the associative/semantic association process eventually leading to the observed categorisation bias (i.e., foreign words tending to be put in the same conceptual category as their associated translation). In the absence of a dis-confirmation of this alternative and more economical hypothesis, and if we follow Occam's razor assumption, the claim that there is episodic memory formation during sleep is speculative and unsupported, which is a serious limitation irrespective of the merits of the study. The title and interpretations should be toned down in this respect.

      Our study conceptually adds to and extends the findings by Züst et al. (a) by highlighting the precise time-window or brain state during which sleep-learning is possible (e.g. slow-wave trough targeting), (b) by demonstrating the feasibility of associative learning during night sleep, and (c) by uncovering the longevity of sleep-formed memories.

      We acknowledge that the reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013). We stand by our claim that sleep-learning was of episodic nature. We use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000), and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). The core computational features of episodic memory are 1) rapid learning, 2) association formation, and 3) a compositional and flexible representation of the associations in long-term memory.

      Therefore, we revised the manuscript to emphasize how our definition differs from traditional definitions (line 64).

      For the current study, we designed a retrieval task that calls on the core computational features of episodic memory by assessing flexible retrieval of sleep-formed compositional word-word associations. Reviewer 3 suggests an alternative interpretation for the learning observed here: mere perceptual associations between foreign words and translation words are stored during sleep, and semantic associations are only inferred at retrieval testing during ensuing wakefulness. First, these processing steps would require rapid sound-sound associative encoding, long-term storage, and flexible sound retrieval, which would still require hippocampal processing and computations in the episodic memory system. Second, this mechanism seems highly laborious and inefficient: the sound pattern of a word at 12 hours after learning would have to trigger the reactivation of an associated sound pattern of another word, and this sound pattern would then have to elicit the activation of the translation word’s semantics, leading to the selection of the correct superordinate semantic category at test.

      Overall, we believe that our pairwise-associative learning paradigm triggered a rapid conceptual-associative encoding process mediated by the hippocampus that provided for flexible representations of foreign and translation words in episodic memory. This study adds to the existing literature by examining specific boundary conditions of sleep-learning and demonstrates the longevity (at least 36 hours) of sleep-learned associations.

      Other remarks:

      R3.2. Lines 43-45 : the assumption that the sleeping brain decides whether external events can be disregarded, requires awakening or should be stored for further consideration in the waking state is dubious, and the supporting references date from a time (the 60') during which hypnopedia was investigated in badly controlled sleep conditions (leaving open the doubt about the possibility that it occurred during micro awakenings)

      We revised the manuscript to add more recent and better-controlled studies that bolster the claim originating in the 1960s (line 40-51). Recently, it has been shown that the sleeping brain preferentially processes relevant information, for example information conveyed by unfamiliar voices (Ameen et al., 2022), emotional content (Holeckova et al., 2006; Moyne et al., 2022), and one’s own compared to others’ names (Blume et al., 2018).

      R3.3. 1st paragraph, lines 48-53 , the authors should be more specific about what kind of new associations and at which level they can be stored during sleep according to recent reports, as a wide variety of associations (mostly elementary levels) are shown in the cited references. Limitations in information processing during sleep should also be acknowledged.

      In the lines to which R3 refers, we cite an article (Ruch & Henke, 2020) in which two of the three authors of the current manuscript elaborate in detail what kind of associations can be stored during sleep. We revised these lines to more clearly present the current understanding of the potential and the limitations of sleep-learning (line 40-51). Although information processing during sleep is generally reduced (Andrillon et al., 2016), a variety of different kinds of associations can be stored, ranging from tone-odour to word-word association (Arzi et al., 2012, 2014; Koroma et al., 2022; Züst et al., 2019).

      R3.4. The authors ran their main behavioural analyses on delayed retrieval at 36h rather than 12h with the argument that retrieval performance was numerically larger at 36 than 12h but the difference was non-significant (line 181-183), and that effects were essentially similar. Looking at Figure 2, is the trough effect really significant at 12h ? In any case, the fact that it is (numerically) higher at 36 than 12h might suggest that the association created at the first 12h retrieval (considering the alternative hypothesis proposed above) has been reinforced by subsequent sleep.

      The Trough effect at 12h is not significant, as stated on line 185 (“Planned contrasts against chance level revealed that retrieval performance significantly exceeded chance at 36 hours only (P36hours = 0.036, P12hours = 0.094).”). It seems that our wording was not clear. Therefore, we refined the description of the behavioural analysis in the manuscript (lines 188-193).

      In brief, we report an omnibus ANOVA with a significant main effect of targeting type (Trough vs Peak, main effect Peak versus Trough: F(1,28) = 5.237, p = 0.030, d = 0.865). Because Trough-targeting led to significantly better memory retention than Peak-targeting, we computed a second ANOVA, solely including participants with trough-targeted word-pair encoding. The memory retention in the Trough condition is above chance (MTrough = 39.11%, SD = 10.76; FIntercept (1,14) = 5.660, p = 0.032) and does not significantly differ between the 12h and 36h retrieval (FEncoding-Test Delay (1,14) = 1.308, p = 0.272). However, the retrieval performance at 36h numerically exceeds the performance at 12h and the direct comparison against chance reveals that the 36h but not the 12h retrieval was significant (P36hours = 0.036, P12hours = 0.094). Hence, we found no evidence for above chance performance at the 12h retrieval and focused on the retrieval after 36h in the EEG analysis.

      We agree with the reviewer that the subsequent sleep seems to have improved consolidation and subsequent retrieval. We assume that the reviewer suggests that participants merely formed perceptual associations during sleep and encoded episodic-like associations during testing at 12h (as pointed out in R3.1). However, we believe that it is unlikely that the awake encoding of semantic associations during the 12h retrieval led to improved performance after 36h. We changed the discussion regarding the interaction between retrieval at 12h and 36h (line 505-512, also see R2.2).

      R3.5. In the discussion section lines 419-427, the argument is somehow circular in claiming episodic memory mechanisms based on functional neuroanatomical elements that are not tested here, and the supporting studies conducted during sleep were in a different setting (e.g. TMR)

      Indeed, the TMR and animal studies are a different setting compared to the present study. We re-wrote this part and only focused on the findings of Züst and colleagues (2019), who examined hippocampal activity during the awake retrieval of sleep-formed memories (lines 472-482). Additionally, we would like to emphasise that our main reasoning is that the task requirements called upon the episodic memory system.

      R3.6. Supplementary Material: in the EEG data the differentiation between correct and incorrect ulterior classifications when presented at the peak of the slow oscillation is only significant in association with 36h delayed retrieval but not at 12h, how do the authors explain this lack of effect at 12 hour ?

      We assume that the reviewer refers to the TROUGH condition (word-pairs targeted at a slow-wave trough) and not as written to the peak condition. We argue that the retention performance at 12h is not significantly above chance (M12hours = 37.4%, P12hours = 0.094).

      Hence, the distinction between “correctly” and “incorrectly” categorised word pairs was not informative for the EEG analysis during sleep. Whatever the reason why the 12h retrieval was not significantly above chance, the less successful memory recall, and thus the less balanced trial count, makes recall accuracy at 12 hours a worse criterion for separating EEG trials than recall performance at 36 hours.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor importance:

      Abstract: The opening framing is confusing here and in the introduction. Why frame the paper in the broadest terms about awakenings and threats from the environment when this is a paper about intersections between learning & memory and sleep? I do understand that there is an interesting point to be made about the counterintuitive behavioral findings with respect to sleep generally being perceived as a time when stimuli are blocked out, but this does not seem to me to be the broadest points or the way to start the paper. The authors should consider this but of course push back if they disagree.

      We understand the reviewer’s criticism but believe that this has more to do with personal preferences than with the scientific value or validity of our work. We believe that it is our duty as researchers to present our study in a broader context because this may help readers from various fields to understand why the work is relevant. To some readers, evidence for learning during sleep may seem trivial, to others, it may seem impossible or a weird but useless conundrum. By pointing out potential evolutionary benefits of the ability to acquire new information during sleep, we help the broad readership of eLife understand the relevance of this work.

      Lines 31-32: "Neural complexity" -> "neural measures of complexity" because it isn't clear what "neural complexity" means at this point in the abstract. Though, note my other point that I believe this analysis should be removed.

      To our understanding, “neural complexity” is a frequently used term in the field and yields more than 4000 entries on Google Scholar, whereas “neural measures of complexity” yields only 3 hits [September 2023]. In order to link our study with other studies on neural complexity, we would like to keep this terminology. As an example, two recent publications using “neural complexity” are Lee et al. (2020) and Frohlich et al. (2022).

      Lines 42-43: The line of work on 'sentinel' modes would be good to cite here (e.g., Blume et al., 2017, Brain & Language).

      We added the suggested citation to the manuscript (line 52).

      Lines 84-90: While I appreciate the authors desire to dig deep and try to piece this all together, this is far too speculative in my opinion. Please see my other points on the same topic.

      In this paragraph, we point out why both peaks and troughs are worth exploring for their contributions to sensory processing and learning during sleep. Peaks and troughs are contributing mutually to sleep-learning. Our speculations should inspire further work aimed at pinning down the benefits of peaks and troughs for sleep-learning. We clarified the purpose and speculative nature of our arguments in the revised version of the manuscript.

      Line 109: "outlasting" -> "lasting over" or "lasting >"

      We changed the wording accordingly.

      Line 111: I believe 'nonsense' is not the correct term here, and 'foreign' (again) would be preferred. Some may be offended to hear their foreign word regarded as 'nonsense'. However, please let me know if I have misunderstood.

      We would like to use the linguistic term “pseudoword” (aligned with reviewer 2’s comment) and we revised the manuscript accordingly.

      Figure 1A: "Enconding" -> "Encoding"

      Thank you for pointing this out.

      Lines 201-2: Were there interactions between confidence and correctness on the semantic categorization task? Were correct responses given with more confidence than incorrect ones? This would not necessarily be a problem for the authors' account, as there can of course be implicit influences on confidence (i.e., fluency).

      As is stated in the results section, confidence ratings did not differ significantly between correct and incorrect assignments (Trough condition: F(1,14) = 2.36, p = 0.15; Peak condition: F(1,14) = 0.48, p = 0.50).

      Line 236: "Nicknazar" -> "Niknazar"

      Thank you for pointing this out.

      Line 266: "profited" -> "benefited"

      We changed the wording accordingly.

      Lines 280-4: There seems some relevance here with Malerba et al. (2018) and her other papers to categorize slow oscillations.

      Diving into the details on how to best categorise slow oscillations is beyond the scope of this manuscript. Here, we build on work from the field of microstate analyses and use two measures to describe and quantify the targeted brain states: the topography of the electric field (i.e., the correlation of the electric field with an established template or “microstate”), and the field strength (global field power, GFP). While the topography of a quasi-stable electric field reflects activity in a specific neural network, the strength (GFP) of a field most likely mirrors the degree of activation (or inactivity) in the specific network. Here, we find that consistent targeting of a specific network state yielding a strong frontal negativity benefitted learning during sleep. For a more detailed explanation of the slow-wave phase targeting, see Ruch et al. (2022).

      Lines 343-6: Was it intentional to have 0.5 s (0.2-0.7 s) surrounding the analysis around 500 ms but only 0.4 s (0.8-1.2 s) surrounding the analysis around 1 s? Could the authors use the same size interval or justify having them be different?

      We apologise for the misleading phrasing and we clarified this in the revised manuscript. We applied the same procedure for the comparison of later correctly vs incorrectly classified pseudowords as we did for the comparison between EC and CC. Hence, we analysed the entire window from 0s to 2.5s with a cluster-based permutation approach. Contrary to the EC vs CC contrast, no cluster remained significant for the comparison of the subsequent memory effect. By mistake we reported the wrong time window. In the revised manuscript, the paragraph is corrected (lines 364-369).
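
      For readers unfamiliar with the cluster-based permutation approach mentioned above, the following is a minimal one-dimensional sketch. It assumes paired per-participant difference waves (for example, later correct minus later incorrect) sampled over the analysis window, and it uses sign-flipping as the permutation scheme. The cluster-forming threshold, the number of permutations and all function names are hypothetical; the actual analysis in the manuscript may differ in software, dimensionality (channels, frequencies) and parameters.

```python
import math
import random


def t_values(diffs):
    """One-sample t-value per time point; diffs is a list of per-participant rows."""
    n = len(diffs)
    out = []
    for col in zip(*diffs):
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def max_cluster_mass(tvals, thresh):
    """Largest summed |t| over contiguous same-sign samples exceeding the threshold."""
    best, run, sign = 0.0, 0.0, 0
    for t in tvals:
        s = 1 if t > thresh else -1 if t < -thresh else 0
        if s == 0:
            run = 0.0
        elif s == sign:
            run += abs(t)          # extend the current cluster
        else:
            run = abs(t)           # start a new cluster
        sign = s
        best = max(best, run)
    return best


def cluster_permutation_p(diffs, thresh=2.0, n_perm=1000, seed=0):
    """Monte Carlo p-value for the largest observed cluster (threshold is hypothetical)."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), thresh)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            s = rng.choice((-1.0, 1.0))    # flip the sign of a whole participant
            flipped.append([s * v for v in row])
        if max_cluster_mass(t_values(flipped), thresh) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

      Because the statistic is the maximum cluster mass over the whole window, a single test covers all time points, which is why no separate sub-windows need to be chosen in advance.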

      Line 356-entire HFD section: it is unclear what's gained by this analysis, as it could simply be another reflection of the state of the brain at the time of word presentation. In my opinion, the authors should remove this analysis and section, as it does not add clarity to other aspects of the paper.

      (If the authors keep the section) Line 361-2 - "Moreover, high HFD values have been associated with cognitive processing (Lau et al., 2021; Parbat & Chakraborty, 2021)." This statement is vague. Could the authors elaborate?

      Please see our answer to Reviewer 2 (2.3) for a more detailed explanation. In brief, we would like to keep the analysis with the broad time window of -2 to -0.5 and from 0.5 to 2 s.

      Lines 403-4: How was it determined that these neural networks mediated both conscious/unconscious processes? Perhaps the authors meant to make a different point, but the way it reads to me is that there is evidence that some neural networks are conscious and others are not and both forms engage in similar functions.

      We revised the manuscript to be more precise and clear: “The conscious and unconscious rapid encoding and flexible retrieval of novel relational memories was found to recruit the same or similar networks including the hippocampus (Henke et al., 2003; Schneider et al., 2021). This suggests that conscious and unconscious relational memories are processed by the same memory system.” (p. 22, top).

      Lines 433-41: Performance didn't actually significantly increase from 12 to 36 hours, so this is all too speculative in my opinion.

      We removed the speculative claim that performance may have increased from the retrieval at 12 hours to the retrieval at 36 hours.

      Line 534: "assisted by enhanced" -> "coincident with". It's unclear whether theta reflects successful processing as having occurred or whether it directly affects or assists with it.

      We have adjusted the wording to be more cautious, as suggested (line 588).

      Line 572-4: Rothschild et al. (2016) is relevant here.

      Unfortunately, we do not see the relevance of this article within the context of our work.

      Line 577 paragraph: The authors may consider adding a note on the importance of ethical considerations surrounding this form of 'inception'.

      We extended this part by adding ethical considerations to the discussion section (Stickgold et al., 2021, line 657).

      Line 1366: It would be better if the authors could eventually make their data publicly available. This is obviously not required, but I encourage the authors to consider it if they have not considered it already.

      In my opinion, the discussion is too long. I really appreciate the authors trying to figure out the set of precise times in which each level of neural processing might occur and how this intersects with their slow oscillation phase results. However, I found a lot of this too speculative, especially given that the sounds may bleed into parts of other phases of the slow oscillation. I do not believe this is a problem unique to these authors, as many investigators attempting to target certain phases in the target memory reactivation literature have faced the same problem, but I do believe the authors get ahead of the data here. In particular, there seems to be one paragraph in the discussion that is multiple pages long (p. 22-24). This paragraph I believe has too much detail and should be broken up regardless, as it is difficult for the reader to follow.

      Considering the recent literature, we believe this interpretation best explains the data. As argued earlier, we believe that a speculative interpretation of the reported phenomena can provide substantial added value because it inspires future experimental work. We have improved the manuscript by clearly distinguishing between data and interpretation. We do declare the speculative nature of some offered interpretations. We hope that these speculations, which are testable hypotheses (!), will eventually be confirmed or refuted experimentally.

      Reviewer #2 (Recommendations For The Authors):

      I very much enjoyed the paper and think it describes important findings. I have a few suggestions for improvement, and minor comments that caught my eye during reading:

      (1) I was missing an analysis of CC ERP, and its comparison to EC ERP.

      We added this analysis to the manuscript (line 299-301). The comparison of CC ERP with EC ERP did not yield any significant cluster for either the peak (cluster-level Monte Carlo p=0.54) or the trough (cluster-level Monte Carlo p>0.37). We assume that the noise level was too high for the identification of differences between CC and EC ERP.

      (2) Regarding my public review comment #2, some light can be shed on between-test effects, I believe, using an item-based analysis - looking at correlations between items' classifications in test #1 and test #2. The assumption seems to be that items that were correct in test #1 remained correct in test #2 while other new correct classifications were added, owing to the additional consolidation happening between the two tests. But that is an empirical question that can be easily tested. If no consistency in item classification is found, on the other hand, or if only consistency in correct classification is found, that would be interesting in itself. This item-based analysis can help tease away real memory from random correct classification. For instance, the subset of items that are consistently classified correctly could be regarded as non-fluke at higher confidence and used as the focus of subsequent-memory analysis instead of the ones that were correct only in test #2.

      Thanks, we re-analysed the data accordingly. Participants were consistent in choosing a specific object category for an item at 12 hours and 36 hours (consistency rate = 47% same category; chance level is 1/3). Moreover, the consistency rate did not differ between the Trough and the Peak condition (MTrough = 47.2%, MPeak = 47.0%, P = 0.98). The better retrieval performance in the Trough compared to the Peak condition after 36 hours is due to the following: A) If participants were correct at 12h, they chose the correct answer again at 36h (Trough: 20%, Peak: 14%). B) Following an incorrect answer at 12h, participants switched to another object category at 36h (Trough: 72%, Peak: 67%). C) If participants switched the object category following an incorrect answer at 12h, they switched more often to the correct category at 36h in the Trough than in the Peak condition (Trough: 56%, Peak: 53%). Hence, the data support the reviewer’s assumption: items that were correct after 12 hours remained correct after 36 hours, while other new correct classifications were generated at 36h owing to the additional consolidation happening between the two tests. We added this finding to the manuscript (lines 191-200, Figure S6):

      Author response image 1.
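      The consistency rate described above can be illustrated with a minimal sketch. It assumes one category response per item at each test; the function name and the toy responses are hypothetical and are not the study data.

```python
from typing import Sequence


def consistency_rate(first: Sequence[str], second: Sequence[str]) -> float:
    """Fraction of items assigned to the same category in both tests."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty response lists")
    same = sum(a == b for a, b in zip(first, second))
    return same / len(first)


# Hypothetical toy responses (not the study data): one category per item.
test_12h = ["animal", "tool", "place", "animal", "tool", "place"]
test_36h = ["animal", "place", "place", "tool", "tool", "animal"]

rate = consistency_rate(test_12h, test_36h)
chance = 1 / 3  # three response categories, as stated in the response above
print(f"consistency = {rate:.2f}, chance = {chance:.2f}")
```

      In this toy example, three of the six items receive the same category in both tests, so the rate is above the 1/3 chance level.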

      As suggested, we re-analysed the ERP with respect to the subsequent memory effect. This time we computed four conditions according to the reviewer’s argument about consistently correctly classified pseudowords, presented in the figure below: ERP of trials that were correctly classified at 36h (blue), ERP of trials that were incorrectly classified at 36h (light blue), ERP of trials that were correctly classified twice (brown) and ERP of trials that were not correctly classified twice (orange, all trials that are not in brown). Please note that the two blue lines are reported in the manuscript and include all trials. The brown and the orange lines take the consistency into account and together also include all trials.

      Author response image 2.

      By excluding even more trials from the group of correct retrieval responses, the noise level increases. Therefore, the difference between the twice-correct and the not-twice-correct trials is not significant (cluster-level Monte Carlo p > 0.27). Because the ERP of twice-correct trials seems very similar to the ERP of the trials correctly classified at 36h at frontal electrodes, we assume that our ERP effect is not driven by a few extreme subjects. Similarly, not-twice-correct trials (orange) have a stronger frontal trough than the trials incorrectly classified at 36h (light blue).

      (3) In a similar vein, a subject-based analysis would be highly interesting. First and foremost, readers would benefit from seeing the lines that connect individual dots across the two tests in figures 2B and 2C. It is reasonable to expect that only a subset of participants were successful learners in this experiment. Finding them and analyzing their results separately could be revealing.

      We added a Figure S1 to the supplementary material, providing the pairing between performance of the 12h and the 36h retrieval.

      It is an interesting idea to look at successful learners alone. We computed the ERP of the subsequent memory effect for those participants who had an above-chance retrieval accuracy at 36h. The result shows a similar effect as reported for all participants (frontal cluster ~0-0.3s). The p-value is only 0.08 because only 9 of 15 participants exhibited an above-chance retrieval performance at 36 hours.

      Author response image 3.

      ERP effect of correct (blue) vs incorrect (light blue) pseudoword category assignment of participants with a retrieval performance above chance at 36h (SD as shades):

      We prefer to not include this data in the manuscript, but are happy to provide it here.

      (4) I wondered why the authors informed subjects of the task in advance (that they will be presented associations when they slept)? I imagine this may boost learning as compared to completely naïve subjects. Whether this is the reason or not, I think an explanation of why this was done is warranted, and a statement whether authors believe the manipulation would work otherwise. Also, the reader is left wondering why subjects were informed only about test #1 and not about test #2 (and when were they told about test #2).

      Subjects were informed of all the tests upfront. We apologize for the inconsistency in the manuscript and have revised the Methods section. The explanation of why participants were informed is twofold: a) Participants had to sleep with in-ear headphones. We wanted to explain to participants why these are necessary and why they should not remove them. b) We hoped that participants would unconsciously expect sounds to be played during sleep, would process these sounds efficiently and would remain deeply asleep (no arousals).

      (5) FoHH is a binary yes/no question, and so may not have been sensitive enough to demonstrate small differences in familiarity. For comparison, the Perceptual Awareness Scale (Ramsøy & Overgaard, 2004) that is typically used in studies of unconscious processing is of a 4-point scale, and this allows to capture more nuanced effects such as partial consciousness and larger response biases. Regardless, it would be informative to have the FoHH numbers obtained in this study, and not just their comparison between conditions. Also, was familiarity of EC and CC pseudowords compared? One may wonder whether hearing the pseudowords clearly vs. in one ear alongside a familiar word would make the word slightly more familiar.

      We apologize for having simplified this part too much in the manuscript. Indeed, the FoHH is comparable to the PAS. We used a 4-point scale, on which participants rated their feeling of having heard the pseudoword during the previous sleep. In the revised manuscript, we report the complete results (lines 203-223). The FoHH did not differ between any of the suggested contrasts. Thus, for both the peak and the trough condition, the FoHH did not differ between sleep-played vs new; correct EC trials vs new; correct vs incorrect EC trials; EC vs CC trials. To illustrate the results, a figure of the FoHH has been added to the supplement (Figure S4).

      (6) Similarly, it would be good to report the numbers of the confidence ratings in the paper as well.

      In the revised manuscript, we extended the description of the confidence rating results. We added the descriptive statistics (lines 224-236) and included a corresponding figure in the supplement (Figure S5).

      Minor/aesthetic comments:

      We implemented all the following suggestions.

      (1) I suggest using "pseudoword" or "nonsense word" instead of "foreign word", because "foreign word" typically means a real word from a different language. It is quite confusing when starting to read the paper.

      After reconsidering, we think that pseudoword is the appropriate linguistic term and have revised the manuscript accordingly.

      (2) Lines 1000-1001: "The required sample size of N = 30 was determined based on a previous sleep-learning study". I was missing a description of what study you are referring to.

      (3) I am not sure I understood the claim nor the rationale made in lines 414-417. Is the claim that pairs did not form one integrated engram? How do we know that? And why would having one engram not enable extracting the meaning from a visual-auditory presentation of the cue? The sentence needs some rewording and/or unpacking.

      (4) Were categories counterbalanced (i.e., did each subjects' EC contain 9 animal words, 9 tool words and 9 place words)?

      (5) Asterisks indicating significant effects are missing from Figure 4 and S2.

      (6) Fig1 legend: "Participants were played with pairs" is ungrammatical.

      (7) Line 1093: no need for a comma.

      (8) Line 1336: missing opening parenthesis

      (9) Line 430: "observe" instead of "observed".

      (10) Line 466: two dots instead of one..

      Reviewer #3 (Recommendations For The Authors):

      Methods: 2 separate ANOVAs are performed (lines 160-185), but would not it make more sense to combine both in one ? If kept separated then a correction for multiple comparisons might be needed (p/2 = 0.025)

      We computed an omnibus ANOVA. In a next step, we examined the effect in the significant targeting condition by computing another ANOVA. For further explanations, see reviewer comment 3.4.

      References

      Ameen, M. S., Heib, D. P. J., Blume, C., & Schabus, M. (2022). The Brain Selectively Tunes to Unfamiliar Voices during Sleep. Journal of Neuroscience, 42(9), 1791–1803. https://doi.org/10.1523/JNEUROSCI.2524-20.2021

      Andrillon, T., Poulsen, A. T., Hansen, L. K., Léger, D., & Kouider, S. (2016). Neural Markers of Responsiveness to the Environment in Human Sleep. The Journal of Neuroscience, 36(24), Article 24. https://doi.org/10.1523/JNEUROSCI.0902-16.2016

      Arzi, A., Holtzman, Y., Samnon, P., Eshel, N., Harel, E., & Sobel, N. (2014). Olfactory Aversive Conditioning during Sleep Reduces Cigarette-Smoking Behavior. Journal of Neuroscience, 34(46), Article 46. https://doi.org/10.1523/JNEUROSCI.2291-14.2014

      Arzi, A., Shedlesky, L., Ben-Shaul, M., Nasser, K., Oksenberg, A., Hairston, I. S., & Sobel, N. (2012). Humans can learn new information during sleep. Nature Neuroscience, 15(10), Article 10. https://doi.org/10.1038/nn.3193

      Batterink, L. J., Creery, J. D., & Paller, K. A. (2016). Phase of Spontaneous Slow Oscillations during Sleep Influences Memory-Related Processing of Auditory Cues. Journal of Neuroscience, 36(4), 1401–1409. https://doi.org/10.1523/JNEUROSCI.3175-15.2016

      Belardi, A., Pedrett, S., Rothen, N., & Reber, T. P. (2021). Spacing, Feedback, and Testing Boost Vocabulary Learning in a Web Application. Frontiers in Psychology, 12. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.757262

      Bergmann, T. O. (2018). Brain State-Dependent Brain Stimulation. Frontiers in Psychology, 9, 2108. https://doi.org/10.3389/fpsyg.2018.02108

      Blume, C., del Giudice, R., Wislowska, M., Heib, D. P. J., & Schabus, M. (2018). Standing sentinel during human sleep: Continued evaluation of environmental stimuli in the absence of consciousness. NeuroImage, 178, 638–648. https://doi.org/10.1016/j.neuroimage.2018.05.056

      Brodbeck, C., & Simon, J. Z. (2022). Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Frontiers in Neuroscience, 16. https://www.frontiersin.org/articles/10.3389/fnins.2022.828546

      Cohen, N. J., & Eichenbaum, H. (1993). Memory, Amnesia, and the Hippocampal System. A Bradford Book.

      Daltrozzo, J., Claude, L., Tillmann, B., Bastuji, H., & Perrin, F. (2012). Working memory is partially preserved during sleep. PloS One, 7(12), Article 12.

      Dew, I. T. Z., & Cabeza, R. (2011). The porous boundaries between explicit and implicit memory: Behavioral and neural evidence. Annals of the New York Academy of Sciences, 1224(1), 174–190. https://doi.org/10.1111/j.1749-6632.2010.05946.x

      Esfahani, M. J., Farboud, S., Ngo, H.-V. V., Schneider, J., Weber, F. D., Talamini, L. M., & Dresler, M. (2023). Closed-loop auditory stimulation of sleep slow oscillations: Basic principles and best practices. Neuroscience & Biobehavioral Reviews, 153, 105379. https://doi.org/10.1016/j.neubiorev.2023.105379

      Frohlich, J., Chiang, J. N., Mediano, P. A. M., Nespeca, M., Saravanapandian, V., Toker, D., Dell’Italia, J., Hipp, J. F., Jeste, S. S., Chu, C. J., Bird, L. M., & Monti, M. M. (2022). Neural complexity is a common denominator of human consciousness across diverse regimes of cortical dynamics. Communications Biology, 5(1), Article 1. https://doi.org/10.1038/s42003-022-04331-7

      Gabrieli, J. D. E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 87–115.

      Garcia-Molina, G., Tsoneva, T., Jasko, J., Steele, B., Aquino, A., Baher, K., Pastoor, S., Pfundtner, S., Ostrowski, L., Miller, B., Papas, N., Riedner, B., Tononi, G., & White, D. P. (2018). Closed-loop system to enhance slow-wave activity. Journal of Neural Engineering, 15(6), 066018. https://doi.org/10.1088/1741-2552/aae18f

      Hannula, D. E., Minor, G. N., & Slabbekoorn, D. (2023). Conscious awareness and memory systems in the brain. WIREs Cognitive Science, 14(5), e1648. https://doi.org/10.1002/wcs.1648

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7), Article 7. https://doi.org/10.1038/nrn2850

      Henke, K., Mondadori, C. R. A., Treyer, V., Nitsch, R. M., Buck, A., & Hock, C. (2003). Nonconscious formation and reactivation of semantic associations by way of the medial temporal lobe. Neuropsychologia, 41(8), Article 8. https://doi.org/10.1016/S0028-3932(03)00035-6

      Holeckova, I., Fischer, C., Giard, M.-H., Delpuech, C., & Morlet, D. (2006). Brain responses to a subject’s own name uttered by a familiar voice. Brain Research, 1082(1), 142–152. https://doi.org/10.1016/j.brainres.2006.01.089

      Karpicke, J. D., & Roediger, H. L. (2008). The Critical Importance of Retrieval for Learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408

      Koroma, M., Elbaz, M., Léger, D., & Kouider, S. (2022). Learning New Vocabulary Implicitly During Sleep Transfers With Cross-Modal Generalization Into Wakefulness. Frontiers in Neuroscience, 16, 801666. https://doi.org/10.3389/fnins.2022.801666

      Lee, Y., Lee, J., Hwang, S. J., Yang, E., & Choi, S. (2020). Neural Complexity Measures. Advances in Neural Information Processing Systems, 33, 9713–9724. https://proceedings.neurips.cc/paper/2020/hash/6e17a5fd135fcaf4b49f2860c2474c7c-Abstract.html

      Metcalfe, J. (2017). Learning from Errors. Annual Review of Psychology, 68(1), 465–489. https://doi.org/10.1146/annurev-psych-010416-044022

      Moscovitch, M. (2008). The hippocampus as a “stupid,” domain-specific module: Implications for theories of recent and remote memory, and of imagination. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 62, 62–79. https://doi.org/10.1037/1196-1961.62.1.62

      Moyne, M., Legendre, G., Arnal, L., Kumar, S., Sterpenich, V., Seeck, M., Grandjean, D., Schwartz, S., Vuilleumier, P., & Domínguez-Borràs, J. (2022). Brain reactivity to emotion persists in NREM sleep and is associated with individual dream recall. Cerebral Cortex Communications, 3(1), tgac003. https://doi.org/10.1093/texcom/tgac003

      Ngo, H.-V. V., Martinetz, T., Born, J., & Mölle, M. (2013). Auditory Closed-Loop Stimulation of the Sleep Slow Oscillation Enhances Memory. Neuron, 78(3), Article 3. https://doi.org/10.1016/j.neuron.2013.03.006

      O’Reilly, R. C., Bhattacharyya, R., Howard, M. D., & Ketz, N. (2014). Complementary Learning Systems. Cognitive Science, 38(6), 1229–1248. https://doi.org/10.1111/j.1551-6709.2011.01214.x

      O’Reilly, R. C., & Rudy, J. W. (2000). Computational principles of learning in the neocortex and hippocampus. Hippocampus, 10(4), 389–397. https://doi.org/10.1002/1098-1063(2000)10:4<389::AID-HIPO5>3.0.CO;2-P

      Rabinovich Orlandi, I., Fullio, C. L., Schroeder, M. N., Giurfa, M., Ballarini, F., & Moncada, D. (2020). Behavioral tagging underlies memory reconsolidation. Proceedings of the National Academy of Sciences, 117(30), 18029–18036. https://doi.org/10.1073/pnas.2009517117

      Reder, L. M., Park, H., & Kieffaber, P. D. (2009). Memory systems do not divide on consciousness: Reinterpreting memory in terms of activation and binding. Psychological Bulletin, 135(1), Article 1. https://doi.org/10.1037/a0013974

      Ruch, S., & Henke, K. (2020). Learning During Sleep: A Dream Comes True? Trends in Cognitive Sciences, 24(3), 170–172. https://doi.org/10.1016/j.tics.2019.12.007

      Ruch, S., Schmidig, F. J., Knüsel, L., & Henke, K. (2022). Closed-loop modulation of local slow oscillations in human NREM sleep. NeuroImage, 264, 119682. https://doi.org/10.1016/j.neuroimage.2022.119682

      Schacter, D. L. (1998). Memory and Awareness. Science, 280(5360), 59–60. https://doi.org/10.1126/science.280.5360.59

      Schneider, E., Züst, M. A., Wuethrich, S., Schmidig, F., Klöppel, S., Wiest, R., Ruch, S., & Henke, K. (2021). Larger capacity for unconscious versus conscious episodic memory. Current Biology, 31(16), 3551-3563.e9. https://doi.org/10.1016/j.cub.2021.06.012

      Shohamy, D., & Turk-Browne, N. B. (2013). Mechanisms for widespread hippocampal involvement in cognition. Journal of Experimental Psychology: General, 142(4), 1159–1170. https://doi.org/10.1037/a0034461

      Squire, L. R., & Dede, A. J. O. (2015). Conscious and Unconscious Memory Systems. Cold Spring Harbor Perspectives in Biology, 7(3), a021667. https://doi.org/10.1101/cshperspect.a021667

      Stickgold, R., Zadra, A., & Haar, A. J. H. (2021). Advertising in Dreams is Coming: Now What? Dream Engineering. https://dxe.pubpub.org/pub/dreamadvertising/release/1

      Tulving, E. (2002). Episodic Memory: From Mind to Brain. Annual Review of Psychology, 53(1), 1–25. https://doi.org/10.1146/annurev.psych.53.100901.135114

      Wilhelm, I., Diekelmann, S., Molzow, I., Ayoub, A., Mölle, M., & Born, J. (2011). Sleep Selectively Enhances Memory Expected to Be of Future Relevance. Journal of Neuroscience, 31(5), 1563–1569. https://doi.org/10.1523/JNEUROSCI.3575-10.2011

      Wunderlin, M., Koenig, T., Zeller, C., Nissen, C., & Züst, M. A. (2022). Automatized online prediction of slow-wave peaks during non-rapid eye movement sleep in young and old individuals: Why we should not always rely on amplitude thresholds. Journal of Sleep Research, 31(6), e13584. https://doi.org/10.1111/jsr.13584

      Züst, M. A., Ruch, S., Wiest, R., & Henke, K. (2019). Implicit Vocabulary Learning during Sleep Is Bound to Slow-Wave Peaks. Current Biology, 29(4), 541-553.e7. https://doi.org/10.1016/j.cub.2018.12.038

    1. The district and state and federal governments have established our standards and handed our curriculum down to us. These standards make up the goals established for all of our students. How we reach these goals may require different paths. The core of differentiated instruction is flexibility in content, process, and product based on student strengths, needs, and learning styles.

      Reaching these goals does require an enormous amount of flexibility from educators. It is easy to make a to-do list; it is much more difficult to complete it. I think it would be helpful to give students direct incentives so that they can see an immediate reward for their efforts, instead of relying on the abstract idea that being more educated may make their lives more successful as adults.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review

      [...] A particular strength of the present study is the structural characterization of human PURA, which is a challenging target for structural biology approaches. The molecular dynamics simulations are state-of-the-art, allowing a statistically meaningful assessment of the differences between wild-type and mutant proteins. The functional consequences of PURA mutations at the cellular level are fascinating, particularly the differential compartmentalization of wild-type and mutant PURA variants into certain subcellular condensates.

      Weaknesses that warrant rectification relate to (i) The interpretation of statistically non-significant effects seen in the molecular dynamic simulations.

      We removed from the manuscript the sentence which indicated that we analyzed statistically non-significant effects. Therefore, the above statement has been resolved.

      (ii) The statistical analysis of the differential compartmentalization of PURA variants into processing bodies vs. stress granules, and

      We re-analyzed all cell-biological data and adjusted the statistical analysis of the P-body and stress-granule intensities. The new and improved statistics have replaced the original analyses in the corresponding figures (Figs. 1C and 2B).

      (iii) Insufficient documentation of protein expression levels and knock-down efficiencies.

      Quantification of protein expression levels by Western blotting is shown in Appendix Figure S1. Quantification of knock-down efficiencies by Western blotting is shown in Appendix Figure S3.

      Recommendations for the authors: Reviewer #1

      Concerns and Suggested Changes

      (a) I have only one concern about the computational part and that is about statements such as "There are also large differences in the residue surrounding the mutation spot (residues 90 to 100), where the K97E mutant also shows much greater fluctuation. However, these differences are not significant due to the large standard deviations." If the differences are not statistically significant, then I would suggest either removing such a statement or increasing the statistics.

      We agree with the Reviewer’s comment. We removed this sentence from the text.

      Recommendations for the authors: Reviewer #2

      General Comments

      This is a challenging structural target and the authors have made considerable efforts to determine the effect of several mutations on the structure and function. Many of the constructs, however, could not be expressed and/or purified in bacteria. However, it is not clear to what extent other expression systems (e.g. Drosophila or human) were considered and if this would have been beneficial.

      We did not use other expression systems because the wild-type protein is well-behaved when expressed in E. coli. In case a mutant variant cannot be expressed or does not behave well in E. coli, this constitutes a clear indication that the respective mutation impairs the protein’s integrity. Thus, by using E. coli as a reference system for all the variants of PURA protein, we could assess the influence of the mutations on the structural integrity and solubility. Only for the variants that did not show impairment in E. coli expression, we continued to assess in more detail why they are nevertheless functionally impaired and cause PURA Syndrome.

      Concerns and Suggested Changes

      (a) The schematic in Figure 3A would have been helpful for interpreting the mutations discussed in Figures 1 and 2. I would suggest moving it earlier in the text.

      We changed the figure according to the Reviewer’s suggestion.

      (b) I believe the RNA used for binding studies in Figures 3C and D was (CGG)8. Are the two "free" RNA bands a monomer and a dimer (duplex?)?

      Although we do not know for certain, it is indeed likely that the two free RNA bands represent either different secondary structures of the free RNA or a duplex of two molecules. Of note, PURA binds to both “free” RNA bands, indicating that it either does not discriminate between them or melts double-stranded RNA in these EMSAs.

      There also seems to be considerable cooperativity in the binding, so I wonder if a shorter RNA oligonucleotide might facilitate the measurement of Kds.

      The length of the used RNA was selected based on the estimated elongated size of the full-length PURA and the presence of 3 PUR repeats. Assuming that one PUR repeat interacts with about 6-7 bases (data from the co-structure of Drosophila PURA with DNA; PDB-ID: 5FGP) and that full-length PURA forms a dimer consisting of three PUR repeats, the full-length protein in its extended form should cover a nucleic-acid stretch of about 24 bases.

      Also, it is not clear how the affinities were measured particularly for hsPURA III since free band is never fully bound at the highest protein concentration.

      It was not our goal to measure Kds for the interaction of PURA variants with RNA. The EMSA experiments were conducted to detect relative differences in the interaction between PURA variants and RNA. To estimate the differences, we measured the total intensity of the bound (shifted) and unbound RNA. The intensities of the bands observed on the scanned EMSA gels were quantified with Fiji/ImageJ software. We calculated the percentage of the shifted RNA and normalized it. The hsPURA III fragment shows a much lower affinity; therefore, unlike full-length PURA and PURA I-II, it does not fully shift the RNA even at the highest protein concentration.
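      The quantification described above can be sketched as follows. This is a minimal illustration only: the function names and band intensities are hypothetical, and normalizing to the largest value in a series is an assumption, since the response does not state which normalization was used.

```python
from typing import List, Sequence


def percent_shifted(bound: float, unbound: float) -> float:
    """Percentage of total RNA signal found in the shifted (bound) bands."""
    total = bound + unbound
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * bound / total


def normalize_to_max(values: Sequence[float]) -> List[float]:
    """Scale a series so its largest value is 1 (assumed normalization)."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("at least one value must be positive")
    return [v / peak for v in values]


# Hypothetical band intensities for three increasing protein concentrations.
bound_bands = [0.0, 25.0, 75.0]
unbound_bands = [100.0, 75.0, 25.0]

percents = [percent_shifted(b, u) for b, u in zip(bound_bands, unbound_bands)]
relative = normalize_to_max(percents)
print(percents, relative)
```

      Because both bound and unbound intensities enter the denominator, the percentage stays comparable across lanes even if the total loaded RNA varies slightly.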

      (c) Do the human PURA I+II and dmPURA I+ II crystallize in the same space group and have similar packing? Can the observed structural flexibility be due to crystal contacts?

      hsPURA I+II and dmPURA I+II crystallize in different space groups with different crystal packing. In both cases, the asymmetric unit contains 4 independent molecules, with the flexible part of the structure, composed of the β4 and β8 strands (β ridge), exposed to solvent. In the Drosophila structure, we do not observe any flexibility of either β-strand. In contrast, in the human PURA structure the β ridge exhibits considerable flexibility and adopts different conformations in all 4 molecules of the asymmetric unit. We observe similar flexibility of the β4 and β8 strands (β ridge) in the structure of the K97E mutant, which contains 2 molecules in the asymmetric unit. We would like to add that we expect crystal contacts to stabilize rather than destabilize domains.

      Similarly, can the conformations observed for the K97E mutant be partially explained by packing?

      Regarding the sequence shift observed for the β5 and β6 strands in the hsPURA I+II K97E variant: although the β5 strand with the shifted amino acid sequence contacts another β5 strand of a symmetry-related molecule, we do not consider this interaction to be the source of the shift. To be sure that the shift is not forced by crystallization, we performed NMR measurements, which confirmed that in solution there is a strong change in the β-strands when comparing the WT and the K97E mutant. This is an unambiguous indication that the structural changes observed in the crystal structure also occur in solution. In addition, the MD simulations provide further confirmation of our interpretation that K97E destabilizes the corresponding PUR domain. Taken together, we provide proof from three different angles that the observed differences indeed affect the integrity and hence the function of the protein.

      (d) Perhaps, it is my misunderstanding, but I find the NMR data on the Arg sidechains for the K97E confusing. If they are visible for K97E and not WT, doesn't this indicate that there is an exchange between two conformations or more dynamics in the WT structure? This does not seem to be the opposite of the expectation if K97E is thought to have more conformational flexibility.

      Due to a technical issue (peak contour level), arginine side chain resonances were not clearly visible in the WT spectrum. Figure 5F has been updated, and the WT resonances now correspond to those seen in the mutant spectrum. However, to prevent any confusion or mis/overinterpretation, we removed the sentence regarding the arginine side chains: "Intriguingly, arginine side chain resonances Nε-Hε were only visible in the K97E variant, while they were broadened out in the wild-type spectrum."

      (e) The most speculative part of the paper is the interpretation of SG and PB localization of PURA in Fig 1 and 2. There is an important issue with the statistics that must be clarified because it would appear that statistical significance was determined using each SG or PB as an independent measurement. This is incorrect and significance should be measured by only using the means of three biological replicates. This is well described here. It is not clear at this time if the reported P values will be confirmed upon reanalysis, and this may require reinterpretation of the data.

      We are grateful for this clarifying comment and agree that the presentation of the statistical analysis of P-bodies and stress granules was misleading. Of note, while the figures depicted all the values independent of the biological repeats, the statistical analyses were done on the mean value of each replicate of each cell line and not on all raw data points.

      We prepared new plots showing only the mean value of each replicate and also re-calculated the P-values. The values have changed only slightly in this new analysis because we now also included the previously labeled outliers (red points) to better demonstrate that significance still exists even when considering them.
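      The replicate-level aggregation discussed here can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, replicate labels and values are hypothetical and are not the data from the manuscript.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def replicate_means(measurements: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse per-granule values to one mean per biological replicate."""
    grouped = defaultdict(list)
    for replicate, value in measurements:
        grouped[replicate].append(value)
    return {rep: mean(vals) for rep, vals in sorted(grouped.items())}


# Hypothetical per-granule intensities tagged with their replicate.
granules = [
    ("rep1", 1.0), ("rep1", 3.0),
    ("rep2", 2.0), ("rep2", 4.0), ("rep2", 6.0),
    ("rep3", 5.0),
]

means = replicate_means(granules)
# Any significance test would then use len(means) observations per cell line,
# not len(granules), because granules from one replicate are not independent.
print(means, len(means), len(granules))
```

      With this aggregation, a replicate that happens to contain many granules does not outweigh the others in the test.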

      In the new analysis of stress-granule association, only the value of the K97E mutant lost its significance, indicating that its association with stress granules is not lost. Therefore, we adjusted the following sentences in the manuscript.

      Results:

      Original: "While quantification showed a reduced association of hsPURA K97E mutant with G3BP1-positive granules (Fig 1B), the two other mutants, I206F and F233del, showed the same co-localization to stress granules as the wild type control."

      Corrected: "In all the patient-related mutations, no significant reduction in stress granule association was seen when compared to the wild type control (Fig 1C)."

      Original: "The observation that only one of the patient-related mutations of hsPURA, K97E, showed reduced stress granule association indicates that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      Corrected: "As we did not observe significant changes in the association of patient-related mutations of hsPURA to stress granules, it is suggested that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      (f) A western blot showing the level of overexpression of the PURA proteins should be shown in Figure 1 as well as the KD of endogenous PURA for Figure S2?

      As requested, a Western blot showing the level of overexpression of the different PURA proteins has been added as Appendix Figure S1.

      A Western blot of the siRNA-mediated knock-down experiments of PURA and their corresponding control has been added to Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (g) While I appreciate that rewriting is time-consuming, I would recommend considering restructuring the manuscript because I think that it would aid the overall clarity. I think the foundation of the work is the structural characterization and would suggest beginning the paper with this data and the biochemical characterization. The co-localization with SGs and PBs and how this may be relevant to disease is much more speculative and is therefore better to present later. While I appreciate that the structural interpretation of why some mutants localize to PBs differently is not entirely clear, I do think that this would provide some context for the discussion.

      In the initial version of the manuscript, we first presented the structural characterization of PURA and afterwards the co-localization with SGs and PBs. As this reviewer stated him-/herself in (e), we also noticed that the SG and PB interpretation is the most speculative part of this manuscript. We felt that having this at the end of the results section would weaken the manuscript. On the other hand, we consider the structural interpretation of the mutations to be much stronger and to have a greater impact on future research. After a long discussion, we decided to swap the order and leave the most important results for the end of the manuscript.

      Recommendations for the authors: Reviewer #3

      Concerns and Suggested Changes:

      (a) For the characterization of G3BP1-positive stress granules in HeLa cells upon depletion of PURA, it remains unclear what is the efficiency of siRNA? The authors should provide a western blot to indicate how much the endogenous levels were reduced.

      We completely agree with the stated concern and addressed it accordingly. We had performed this experiment prior to submission but for some unknown reason it was not included in the manuscript.

      The Western blot of siRNA-mediated knock-down experiments of PURA and their corresponding control is now shown in Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (b) How does knocking down PURA affect DCP1A-positive structures in HeLa cells? Would P bodies be formed even in the absence (or reduction) of total PURA?

      Indeed, the stated question is very interesting. In fact, we have already shown in our recent publication (Molitor et al., 2023) that a knock down of PURA in HeLa and NHDF cells leads to a significant reduction of P-bodies. We actually referred to this finding on page 6:

      "Since hsPURA was recently shown to be required for P-body formation in HeLa cells and fibroblasts (Molitor et al. 2023), PURA-dependent liquid phase separation could potentially also directly contribute to the formation of these granules."

      On the same page, we also refer to the underlying molecular mechanism:

      "However, when putting this observation in perspective with previous reports, it seems unlikely that P-body formation directly depends on phase separation by hsPURA, but rather on its recently reported function as gene regulator of the essential P-body core factors LSM14a and DDX6 (Molitor et al., 2023)."

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      These ingenious and thoughtful studies present important findings concerning how people represent and generalise abstract patterns of sensory data. The issue of generalisation is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception, learning, and cognitive science. The findings have the potential to provide compelling support for the outlined account, but there appear other possible explanations, too, that may affect the scope of the findings but could be considered in a revision.

      Thank you for sending the feedback from the three peer reviewers regarding our paper. Please find below our detailed responses addressing the reviewers' comments. We have incorporated these suggestions into the paper and provided explanations for the modifications made.

      We have specifically addressed the point of uncertainty highlighted in eLife's editorial assessment, which concerned alternative explanations for the reported effect. In response to Reviewer #1, we have clarified how Exp. 2c and Exp. 3c address the potential alternative explanation related to "attention to dimensions." Further, we present a supplementary analysis to account for differences in asymptotic learning, as noted by Reviewer #2. We have also clarified how our control experiments address effects associated with general cognitive engagement in the task. Lastly, we have further clarified the conceptual foundation of our paper, addressing concerns raised by Reviewers #2 and #3.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports a series of experiments examining category learning and subsequent generalization of stimulus representations across spatial and nonspatial domains. In Experiment 1, participants were first trained to make category judgments about sequences of stimuli presented either in nonspatial auditory or visual modalities (with feature values drawn from a two-dimensional feature manifold, e.g., pitch vs timbre), or in a spatial modality (with feature values defined by positions in physical space, e.g., Cartesian x and y coordinates). A subsequent test phase assessed category judgments for 'rotated' exemplars of these stimuli: i.e., versions in which the transition vectors are rotated in the same feature space used during training (near transfer) or in a different feature space belonging to the same domain (far transfer). Findings demonstrate clearly that representations developed for the spatial domain allow for representational generalization, whereas this pattern is not observed for the nonspatial domains that are tested. Subsequent experiments demonstrate that if participants are first pre-trained to map nonspatial auditory/visual features to spatial locations, then rotational generalization is facilitated even for these nonspatial domains. It is argued that these findings are consistent with the idea that spatial representations form a generalized substrate for cognition: that space can act as a scaffold for learning abstract nonspatial concepts.

      Strengths:

      I enjoyed reading this manuscript, which is extremely well-written and well-presented. The writing is clear and concise throughout, and the figures do a great job of highlighting the key concepts. The issue of generalization is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception and cognitive science. It's also excellent to see that the hypotheses, methods, and analyses were pre-registered.

      The experiments that have been run are ingenious and thoughtful; I particularly liked the use of stimulus structures that allow for disentangling of one-dimensional and two-dimensional response patterns. The studies are also well-powered for detecting the effects of interest. The model-based statistical analyses are thorough and appropriate throughout (and it's good to see model recovery analysis too). The findings themselves are clear-cut: I have little doubt about the robustness and replicability of these data.

      Weaknesses:

      I have only one significant concern regarding this manuscript, which relates to the interpretation of the findings. The findings are taken to suggest that "space may serve as a 'scaffold', allowing people to visualize and manipulate nonspatial concepts" (p13). However, I think the data may be amenable to an alternative possibility. I wonder if it's possible that, for the visual and auditory stimuli, participants naturally tended to attend to one feature dimension and ignore the other - i.e., there may have been a (potentially idiosyncratic) difference in salience between the feature dimensions that led to participants learning the feature sequence in a one-dimensional way (akin to the 'overshadowing' effect in associative learning: e.g., see Mackintosh, 1976, "Overshadowing and stimulus intensity", Animal Learning and Behaviour). By contrast, we are very used to thinking about space as a multidimensional domain, in particular with regard to two-dimensional vertical and horizontal displacements. As a result, one would naturally expect to see more evidence of two-dimensional representation (allowing for rotational generalization) for spatial than nonspatial domains.

      In this view, the impact of spatial pre-training and (particularly) mapping is simply to highlight to participants that the auditory/visual stimuli comprise two separable (and independent) dimensions. Once they understand this, during subsequent training, they can learn about sequences on both dimensions, which will allow for a 2D representation and hence rotational generalization - as observed in Experiments 2 and 3. This account also anticipates that mapping alone (as in Experiment 4) could be sufficient to promote a 2D strategy for auditory and visual domains.

      This "attention to dimensions" account has some similarities to the "spatial scaffolding" idea put forward in the article, in arguing that experience of how auditory/visual feature manifolds can be translated into a spatial representation helps people to see those domains in a way that allows for rotational generalization. Where it differs is that it does not propose that space provides a scaffold for the development of the nonspatial representations, i.e., that people represent/learn the nonspatial information in a spatial format, and this is what allows them to manipulate nonspatial concepts. Instead, the "attention to dimensions" account anticipates that ANY manipulation that highlights to participants the separable-dimension nature of auditory/visual stimuli could facilitate 2D representation and hence rotational generalization. For example, explicit instruction on how the stimuli are constructed may be sufficient, or pre-training of some form with each dimension separately, before they are combined to form the 2D stimuli.

      I'd be interested to hear the authors' thoughts on this account - whether they see it as an alternative to their own interpretation, and whether it can be ruled out on the basis of their existing data.

      We thank the Reviewer for their comments. We agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are incompatible with this alternative explanation.

      In Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is thus necessary to pay attention to both auditory dimensions and both visual dimensions to perform the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, this would lead to a maximum of 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants that received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.

      This effect was replicated in Exp. 3c. Similarly, 33/48 participants reached an accuracy > 50% in the multimodal association task in Exp. 3c, meaning that we know for sure that at least 68% of the participants actually paid attention to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pretraining in the visual or auditory domains (where multiple dimensions of a stimulus may be relevant less often naturally) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c, are incompatible with this explanation. Around ~65% of the participants show a level of performance in the multimodal association task (>50%) which could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25% and chance performance is at 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”
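
      The accuracy benchmarks quoted above (6.25% for chance, 25% for attending to a single dimension) follow from simple counting. The sketch below is only an illustration of that arithmetic, not part of the authors' analysis. It assumes two feature dimensions with four values each (16 possible stimuli), which is what the quoted figures imply; the function name `expected_accuracy` and the constant `LEVELS` are hypothetical.

```python
from fractions import Fraction


def expected_accuracy(levels_per_dim: int, dims_attended: int, n_dims: int = 2) -> Fraction:
    """Expected accuracy when every attended dimension is matched perfectly
    and every unattended dimension is guessed uniformly among its levels."""
    if not 0 <= dims_attended <= n_dims:
        raise ValueError("dims_attended must be between 0 and n_dims")
    unattended = n_dims - dims_attended
    return Fraction(1, levels_per_dim ** unattended)


LEVELS = 4  # assumption: four values per dimension, i.e. 4 x 4 = 16 stimuli

chance = expected_accuracy(LEVELS, 0)           # guessing on both dimensions
one_dimension = expected_accuracy(LEVELS, 1)    # guessing on one dimension
both_dimensions = expected_accuracy(LEVELS, 2)  # no guessing

# Share of participants above the 50% criterion, as reported in the response.
share_exp2c = Fraction(30, 50)
share_exp3c = Fraction(33, 48)

for label, value in [
    ("chance", chance),
    ("one dimension attended", one_dimension),
    ("both dimensions attended", both_dimensions),
    ("Exp. 2c above criterion", share_exp2c),
    ("Exp. 3c above criterion", share_exp3c),
]:
    print(f"{label}: {float(value):.2%}")
```

      Under these assumptions the 50% criterion lies well above the 25% ceiling of a single-dimension strategy, which is why exceeding it is taken as evidence of attention to both dimensions.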

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, L&S investigate the important general question of how humans achieve invariant behavior over stimuli belonging to one category given the widely varying input representation of those stimuli and more specifically, how they do that in arbitrary abstract domains. The authors start with the hypothesis that this is achieved by invariance transformations that observers use for interpreting different entries and furthermore, that these transformations in an arbitrary domain emerge with the help of the transformations (e.g. translation, rotation) within the spatial domain by using those as "scaffolding" during transformation learning. To provide the missing evidence for this hypothesis, L&S used behavioral category learning studies within and across the spatial, auditory, and visual domains, where rotated and translated 4-element token sequences had to be learned to categorize and then the learned transformation had to be applied in new feature dimensions within the given domain. Through single- and multiple-day supervised training and unsupervised tests, L&S demonstrated by standard computational analyses that in such setups, space and spatial transformations can, indeed, help with developing and using appropriate rotational mapping whereas the visual domain cannot fulfill such a scaffolding role.

      Strengths:

      The overall problem definition and the context of spatial mapping-driven solution to the problem is timely. The general design of testing the scaffolding effect across different domains is more advanced than any previous attempts clarifying the relevance of spatial coding to any other type of representational codes. Once the formulation of the general problem in a specific scientific framework is done, the following steps are clearly and logically defined and executed. The obtained results are well interpretable, and they could serve as a good stepping stone for deeper investigations. The analytical tools used for the interpretations are adequate. The paper is relatively clearly written.

      Weaknesses:

      Some additional effort to clarify the exact contribution of the paper, the link between analyses and the claims of the paper, and its link to previous proposals would be necessary to better assess the significance of the results and the true nature of the proposed mechanism of abstract generalization.

      (1) Insufficient conceptual setup: The original theoretical proposal (the Tolman-Eichenbaum-Machine, Whittington et al., Cell 2020) that L&S relate their work to proposes that just as in the case of memory for spatial navigation, humans and animals create their flexible relational memory system of any abstract representation by a conjunction code that combines on the one hand, sensory representation and on the other hand, a general structural representation or relational transformation. The TEM also suggests that the structural representation could contain any graph-interpretable spatial relations, albeit in their demonstration 2D neighbor relations were used. The goal of L&S's paper is to provide behavioral evidence for this suggestion by showing that humans use representational codes that are invariant to relational transformations of non-spatial abstract stimuli and moreover, that humans obtain these invariances by developing invariance transformers with the help of available spatial transformers. To obtain such evidence, L&S use the rotational transformation. However, the actual procedure they use actually solved an alternative task: instead of interrogating how humans develop generalizations in abstract spaces, they demonstrated that if one defines rotation in an abstract feature space embedded in a visual or auditory modality that is similar to the 2D space (i.e. has two independent dimensions that are clearly segregable and continuous), humans cannot learn to apply rotation of 4-piece temporal sequences in those spaces while they can do it in 2D space, and with co-associating a one-to-one mapping between locations in those feature spaces with locations in the 2D space an appropriate shaping mapping training will lead to the successful application of rotation in the given task (and in some other feature spaces in the given domain). 
      While this is an interesting and challenging demonstration, it does not shed light on how humans learn and generalize, only that humans CAN do learning and generalization in this highly constrained scenario. This result is a demonstration of how a stepwise learning regimen can make use of one structure for mapping a complex input into a desired output. The results neither clarify how generalizations would develop in abstract spaces nor the question of whether this generalization uses transformations developed in the abstract space. The specific training procedure ensures success in the presented experiments but the availability and feasibility of an equivalent procedure in a natural setting is a crucial part of validating the original claim and that has not been done in the paper.

      We thank the Reviewer for their detailed comments on our manuscript. We reply to the three main points in turn.

      First, concerning the conceptual grounding of our work, we would point out that the TEM model (Whittington et al., 2020), however interesting, is not our theoretical starting point. Rather, as we hope the text and references make clear, we ground our work in theoretical work from the 1990s/2000s proposing that space acts as a scaffold for navigating abstract spaces (such as Gärdenfors, 2000). We acknowledge that the TEM model and other experimental work on the implication of the hippocampus, the entorhinal cortex and the parietal cortex in relational transformations of nonspatial stimuli provide evidence for this general theory. However, our work is designed to test a more basic question: whether there is behavioural evidence that space scaffolds learning in the first place. To achieve this, we perform behavioural experiments with a causal manipulation (spatial pre-training vs no spatial pre-training), which have the potential to provide such direct evidence. This is why we claim that:

      “This theory is backed up by proof-of-concept computational simulations [13], and by findings that brain regions thought to be critical for spatial cognition in mammals (such as the hippocampal-entorhinal complex and parietal cortex) exhibit neural codes that are invariant to relational transformations of nonspatial stimuli. However, whilst promising, this theory lacks direct empirical evidence. Here, we set out to provide a strong test of the idea that learning about physical space scaffolds conceptual generalisation.“

      Second, we agree with the Reviewer that we do not provide an explicit model for how generalisation occurs, and how precisely space acts as a scaffold for building representations and/or applying the relevant transformations to non-spatial stimuli to solve our task. Rather, we investigate in our Exp. 2-4 which aspects of the training are necessary for rotational generalisation to happen (and conclude that a simple training with the multimodal association task is sufficient for ~20% of participants). We now acknowledge in the discussion the fact that we do not provide an explicit model and leave that for future work:

      “We acknowledge that our study does not provide a mechanistic model of spatial scaffolding but rather delineates which aspects of the training are necessary for generalisation to happen.”

      Finally, we also agree with the Reviewer that our task is non-naturalistic. As is common in experimental research, one must sacrifice the naturalistic elements of the task in exchange for experimental control and the absence of prior knowledge in the participants. We decided to minimise, as far as possible, the prior knowledge of the participants to make sure that our task involved learning a completely new task and that the pre-training was really causing the better learning/generalisation. The effects we report are consistent across the experiments, so we feel confident about them, but we agree with the Reviewer that an external validation with more naturalistic stimuli/tasks would be a nice addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      (2) Missing controls: The asymptotic performance in experiment 1 after training in the three tasks was quite different in the three tasks (intercepts 2.9, 1.9, 1.6 for spatial, visual, and auditory, respectively; p. 5. para. 1, Fig 2BFJ). It seems that the statement "However, our main question was how participants would generalise learning to novel, rotated exemplars of the same concept." assumes that learning and generalization are independent. Wouldn't it be possible, though, that the level of generalization depends on the level of acquiring a good representation of the "concept" and after obtaining an adequate level of this knowledge, generalization would kick in without scaffolding? If so, a missing control is to equate the levels of asymptotic learning and see whether there is a significant difference in generalization. A related issue is that we have no information on what kind of learning in the three different domains was performed, albeit we probably suspect that in space the 2D representation was dominant while in the auditory and visual domains not so much. Thus, a second missing piece of evidence is the model-fitting results of the ⦰ condition that would show which way the original sequences were encoded (similar to Fig 2 CGK and DHL). If the reason for lower performance is not individual stimulus difficulty but the natural tendency to encode the given stimulus type by a combo of random + 1D strategy that would clarify that the result of the cross-training is, indeed, transferring the 2D-mapping strategy.

      We agree with the Reviewer that a good further control is to equate performance during training. Thus, we have run a complementary analysis where we select only the participants that reach > 90% accuracy in the last block of training in order to equate asymptotic performance after training in Exp. 1. The results (see Author response image 1) replicate the results that we report in the main text: there is a large difference between groups (relative likelihood of 1D vs. 2D models, all BF > 100 in favour of a difference between the auditory and the spatial modalities, between the visual and the spatial modalities, in both near and far transfer, “decisive” evidence). We prefer not to include this figure in the paper for clarity, and because we believe this result is expected given the fact that 0/50 participants in the auditory condition and 0/50 participants in the visual condition used a 2D strategy – thus, selecting subgroups of these participants cannot change our conclusions.

      Author response image 1.

      Results of Exp. 1 when selecting participants that reached > 90% accuracy in the last block of training. Captions are the same as Figure 2 of the main text.

      Second, the Reviewer suggested that we run the model fitting analysis only on the ⦰ condition (training) in Exp. 1 to reveal whether participants use a 1D or a 2D strategy already during training. Unfortunately, we cannot provide the model fits only in the ⦰ condition in Exp. 1 because all models make the same predictions for this condition (see Fig S4). However, note that this is by design: participants were free to apply whatever strategy they wanted during training; we then used the generalisation phase with the rotated stimuli precisely to reveal this strategy. Further, we do believe that the strategy used by the participants during training and the strategy during transfer are the same, partly because – starting from block #4 – participants have no idea whether the current trial is a training trial or a transfer trial, as both trial types are randomly interleaved with no cue signalling the trial type. We have made this clear in the methods:

      “They subsequently performed 105 trials (with trialwise feedback) and 105 transfer trials including rotated and far transfer quadruplets (without trialwise feedback) which were presented in mixed blocks of 30 trials. Training and transfer trials were randomly interleaved, and no clue indicated whether participants were currently on a training trial or a transfer trial before feedback (or absence of feedback in case of a transfer trial).”

      Reviewer #3 (Public Review):

      Summary:

      Pesnot Lerousseau and Summerfield aimed to explore how humans generalize abstract patterns of sensory data (concepts), focusing on whether and how spatial representations may facilitate the generalization of abstract concepts (rotational invariance). Specifically, the authors investigated whether people can recognize rotated sequences of stimuli in both spatial and nonspatial domains and whether spatial pre-training and multi-modal mapping aid in this process.

      Strengths:

      The study innovatively examines a relatively underexplored but interesting area of cognitive science, the potential role of spatial scaffolding in generalizing sequences. The experimental design is clever and covers different modalities (auditory, visual, spatial), utilizing a two-dimensional feature manifold. The findings are backed by strong empirical data, good data analysis, and excellent transparency (including preregistration) adding weight to the proposition that spatial cognition can aid abstract concept generalization.

      Weaknesses:

      The examples used to motivate the study (such as "tree" = oak tree, family tree, taxonomic tree) may not effectively represent the phenomena being studied, possibly confusing linguistic labels with abstract concepts. This potential confusion may also extend to doubts about the real-life applicability of the generalizations observed in the study and raises questions about the nature of the underlying mechanism being proposed.

      We thank the Reviewer for their comments. We agree that we could have explained more clearly how these examples motivate our study. The similarity between “oak tree” and “family tree” is not just the verbal label. Rather, it is the arrangement of the parts (nodes and branches) in a nested hierarchy. Oak trees and family trees share the same relational structure. The reason that invariance is relevant here is that the similarity in relational structure is retained under rigid body transformations such as rotation or translation. For example, an upside-down tree can still be recognised as a tree, just as a family tree can be plotted with the oldest ancestors at either top or bottom. Similarly, in our study, the quadruplets are defined by the relations between stimuli: all quadruplets use the same basic stimuli, but the categories are defined by the relations between successive stimuli. In our task, generalising means recognising that relations between stimuli are the same despite changes in the surface properties (for example in far transfer). We have clarified this in the introduction:

      “For example, the concept of a “tree” implies an entity whose structure is defined by a nested hierarchy, whether this is a physical object whose parts are arranged in space (such as an oak tree in a forest) or a more abstract data structure (such as a family tree or taxonomic tree). [...] Despite great changes in the surface properties of oak trees, family trees and taxonomic trees, humans perceive them as different instances of a more abstract concept defined by the same relational structure.”
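
      The idea that a category can be defined by relations between successive stimuli, and that these relations survive rotation, can be made concrete with a small sketch. It is an illustration only, not the authors' model or stimulus code: the helper names (`rotate`, `relational_signature`, `same_signature`) and the example quadruplets are hypothetical, and stimuli are assumed to be points in a two-dimensional feature space.

```python
import math


def rotate(points, angle_deg):
    """Rotate 2D points about the origin by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def transitions(points):
    """Vectors between successive points of the sequence."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


def relational_signature(points):
    """Step lengths and signed turning angles between successive transitions."""
    steps = transitions(points)
    lengths = [math.hypot(dx, dy) for dx, dy in steps]
    turns = [
        math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        for (ax, ay), (bx, by) in zip(steps, steps[1:])
    ]
    return lengths, turns


def same_signature(p, q, tol=1e-9):
    """True if two equal-length sequences share the same relational signature."""
    (l1, t1), (l2, t2) = relational_signature(p), relational_signature(q)
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(l1 + t1, l2 + t2))


# Hypothetical quadruplets in a two-dimensional feature space.
zigzag = [(0, 0), (1, 0), (1, 1), (2, 1)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]

rotated_zigzag = rotate(zigzag, 90)

print(same_signature(zigzag, rotated_zigzag))  # same relations after rotation
print(same_signature(zigzag, straight))        # different relations
```

      In this sketch the rotated quadruplet has different coordinates (surface properties) but the same step lengths and turning angles, whereas the second quadruplet differs in its relations; this is one simple way to express what rotational invariance means for such sequences.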

      Next, the study does not explore whether scaffolding effects could be observed with other well-learned domains, leaving open the question of whether spatial representations are uniquely effective or simply one instance of a familiar 2D space, again questioning the underlying mechanism.

      We would like to mention that Reviewer #2 had a similar comment. We agree with both Reviewers that our task is non-naturalistic. As is common in experimental research, one must sacrifice the naturalistic elements of the task in exchange for experimental control and the absence of prior knowledge in the participants. We decided to minimise, as far as possible, the prior knowledge of the participants to make sure that our task involved learning a completely new task and that the pre-training was really causing the better learning/generalisation. The effects we report are consistent across the experiments, so we feel confident about them, but we agree with the Reviewer that an external validation with more naturalistic stimuli/tasks would be a nice addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      Further doubt on the underlying mechanism is cast by the possibility that the observed correlation between mapping task performance and the adoption of a 2D strategy may reflect general cognitive engagement rather than the spatial nature of the task. Similarly, the surprising finding that a significant number of participants benefited from spatial scaffolding without seeing spatial modalities may further raise questions about the interpretation of the scaffolding effect, pointing towards potential alternative interpretations, such as shifts in attention during learning induced by pre-training without changing underlying abstract conceptual representations.

      The Reviewer is concerned that the spatial pre-training could benefit the participants by increasing global cognitive engagement rather than by providing a scaffold for learning invariances. It is correct that the participants in the control group in Exp. 2c perform worse on average than participants who benefit from the spatial pre-training in Exp. 2a and 2b. The better performance of the participants in Exp. 2a and 2b could be due either to the spatial nature of the pre-training (as we claim) or to a difference in general cognitive engagement.

      However, if we look closely at the results of Exp. 3, we can see that the general cognitive engagement hypothesis is not well supported by the data. Indeed, the performance of the participants in the control condition (Exp. 3c) during training is relatively similar to that of the other groups. Rather, the difference is in the strategy they use, as revealed by the transfer condition. The majority of them use a 1D strategy, contrary to the participants who benefited from a spatial pre-training (Exp. 3a and 3b). We have included a sentence in the results:

      “Further, the results show that participants who did not experience spatial pre-training were still engaged in the task, but were not using the same strategy as the participants who experienced spatial pre-training (1D rather than 2D). Thus, the benefit of the spatial pre-training is not simply to increase the cognitive engagement of the participants. Rather, spatial pre-training provides a scaffold to learn rotation-invariant representation of auditory and visual concepts even when rotation is never explicitly shown during pre-training.”

      Finally, Reviewer #1 had a related concern about a potential alternative explanation that involved a shift in attention. We reproduce our response here: we agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting (and potentially concerning) alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are not compatible with this alternative explanation.

      Indeed, in Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is necessary to pay attention to both auditory dimensions and both visual dimensions to perform well in the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, this would lead to a maximum of 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants actually paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants that received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.

      This effect was replicated in Exp. 3c. Similarly, 33/48 participants reached an accuracy > 50% in the multimodal association task in Exp. 3c, meaning that we know for sure that at least 68% of the participants actually paid attention to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer).
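      To give a sense of the scale of the 3/50 versus 29/50 contrast reported above for Exp. 2c and Exp. 2a, the following is a minimal sketch of one possible Bayes factor for a difference between two proportions. It assumes uniform Beta(1, 1) priors and compares a model with two independent proportions against a model with one shared proportion. The function names are hypothetical, and this is not necessarily the analysis used in the manuscript, whose exact BFs are reported in the supplementary material.

```python
from math import lgamma, exp


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf_two_proportions(k1: int, n1: int, k2: int, n2: int) -> float:
    """Bayes factor for 'two different proportions' (H1) over 'one shared
    proportion' (H0), assuming uniform Beta(1, 1) priors on every proportion.
    The binomial coefficients are identical under both models and cancel."""
    log_h1 = log_beta(k1 + 1, n1 - k1 + 1) + log_beta(k2 + 1, n2 - k2 + 1)
    log_h0 = log_beta(k1 + k2 + 1, n1 + n2 - k1 - k2 + 1)
    return exp(log_h1 - log_h0)


# Counts of participants best fit by a 2D model, taken from the text:
# 3/50 after visual pre-training (Exp. 2c), 29/50 after spatial pre-training (Exp. 2a).
bf = bf_two_proportions(3, 50, 29, 50)
print(f"BF10 = {bf:.3g}; exceeds the cutoff of 100: {bf > 100}")
```

      Under these assumed priors the contrast comfortably exceeds the reporting cutoff of 100, which is in line with the authors' description, although the exact value depends on the model and priors chosen.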

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pretraining in the visual or auditory domains (where multiple dimensions of a stimulus may be relevant less often naturally) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c are incompatible with this explanation. Around 65% of the participants show a level of performance in the multimodal association task (>50%) which could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25% and chance performance is at 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”
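      The accuracy thresholds used in this argument follow from simple arithmetic, sketched below. The sketch assumes, as described in the response, two stimulus dimensions with four possible options each. The variable names are illustrative only.

```python
from fractions import Fraction

OPTIONS_PER_DIMENSION = 4  # four possible options on each dimension (from the text)

# Pure guessing: 1/4 on each of the two dimensions.
chance = Fraction(1, OPTIONS_PER_DIMENSION) ** 2

# Attending to one dimension only: correct on that dimension,
# at chance (1/4) on the other.
one_dimension = Fraction(1, 1) * Fraction(1, OPTIONS_PER_DIMENSION)

# Participants above 50% accuracy in the multimodal association task (from the text).
above_threshold = {"Exp. 2c": (30, 50), "Exp. 3c": (33, 48)}

print(f"chance accuracy:        {float(chance):.2%}")
print(f"single-dimension limit: {float(one_dimension):.2%}")
for name, (k, n) in above_threshold.items():
    print(f"{name}: {k}/{n} = {k / n:.1%} of participants above 50%")

# Pooled proportion across the two control experiments.
pooled = Fraction(
    sum(k for k, _ in above_threshold.values()),
    sum(n for _, n in above_threshold.values()),
)
print(f"pooled: {float(pooled):.1%}")
```

      Because 50% accuracy is well above the 25% ceiling for a single-dimension strategy, any participant above that threshold must have used information from both dimensions.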

      Conclusions:

      The authors successfully demonstrate that spatial training can enhance the ability to generalize in nonspatial domains, particularly in recognizing rotated sequences. The results for the most part support their conclusions, showing that spatial representations can act as a scaffold for learning more abstract conceptual invariances. However, the study leaves room for further investigation into whether the observed effects are unique to spatial cognition or could be replicated with other forms of well-established knowledge, as well as further clarifications of the underlying mechanisms.

      Impact:

      The study's findings are likely to have a valuable impact on cognitive science, particularly in understanding how abstract concepts are learned and generalized. The methods and data can be useful for further research, especially in exploring the relationship between spatial cognition and abstract conceptualization. The insights could also be valuable for AI research, particularly in improving models that involve abstract pattern recognition and conceptual generalization.

      In summary, the paper contributes valuable insights into the role of spatial cognition in learning abstract concepts, though it invites further research to explore the boundaries and specifics of this scaffolding effect.

      Reviewer #1 (Recommendations For The Authors):

      Minor issues / typos:

      P6: I think the example of the "signed" mapping here should be "e.g., ABAB maps to one category and BABA maps to another", rather than "ABBA maps to another" (since ABBA would always map to another category, whether the mapping is signed or unsigned).

      Done.

      P11: "Next, we asked whether pre-training and mapping were systematically associated with 2Dness...". I'd recommend changing to: "Next, we asked whether accuracy during pre-training and mapping were systematically associated with 2Dness...", just to clarify what the analyzed variables are.

      Done.

      P13, paragraph 1: "only if the features were themselves are physical spatial locations" either "were" or "are" should be removed.

      Done.

      P13, paragraph 1: should be "neural representations of space form a critical substrate" (not "for").

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors use in multiple places in the manuscript the phrases "learn invariances" (Abstract), "formation of invariances" (p. 2, para. 1), etc. It might be just me, but this feels a bit like 'sloppy' wording: we do not learn or form invariances, rather we learn or form representations or transformations by which we can perform tasks that require invariance over particular features or transformations of the input, as in the case of object recognition and size-, translation-, or lighting-invariance. We do not form size invariance, we have representations of objects and/or size transformations allowing the recognition of objects of different sizes. The authors might change this way of referring to the phenomenon.

      We respectfully disagree with this comment. An invariance occurs when neurons make the same response under different stimulation patterns. The objects or features to which a neuron responds are shaped by its inputs. Those inputs are in turn determined by experience-dependent plasticity. This process is often called “representation learning”. We think that our language here is consistent with this status quo view in the field.

      Reviewer #3 (Recommendations For The Authors):

      • I understand that the objective of the present experiment is to study our ability to generalize abstract patterns of sensory data (concepts). In the introduction, the authors present examples like the concept of a "tree" (encompassing a family tree, an oak tree, and a taxonomic tree) and "ring" to illustrate the idea. However, I am sceptical as to whether these examples effectively represent the phenomena being studied. From my perspective, these different instances of "tree" do not seem to relate to the same abstract concept that is translated or rotated but rather appear to share only a linguistic label. For instance, the conceptual substance of a family tree is markedly different from that of an oak tree, lacking significant overlap in meaning or structure. Thus, to me, these examples do not demonstrate invariance to transformations such as rotations.

      To elaborate further, typically, generalization involves recognizing the same object or concept through transformations. In the case of abstract concepts, this would imply a shared abstract representation rather than a mere linguistic category. While I understand the objective of the experiments and acknowledge their potential significance, I find myself wondering about the real-world applicability and relevance of such generalizations in everyday cognitive functioning. This, in turn, casts some doubt on the broader relevance of the study's results. A more fitting example, or an explanation that addresses my concerns about the suitability of the current examples, would be beneficial to further clarify the study's intent and scope.

      Response in the public review.

      • Relatedly, the manuscript could benefit from greater clarity in defining key concepts and elucidating the proposed mechanism behind the observed effects. Is it plausible that the changes observed are primarily due to shifts in attention induced by the spatial pre-training, rather than a change in the process of learning abstract conceptual invariances (i.e., modifications to the abstract representations themselves)? While the authors conclude that spatial pre-training acts as a scaffold for enhancing the learning of conceptual invariances, it raises the question: does this imply participants simply became more focused on spatial relationships during learning, or might this shift in attention represent a distinct strategy, and an alternative explanation? A more precise definition of these concepts and a clearer explanation of the authors' perspective on the mechanism underlying these effects would reduce any ambiguity in this regard.

      Response in the public review.

      • I am wondering whether the effectiveness of spatial representations in generalizing abstract concepts stems from their special nature or simply because they are a familiar 2D space for participants. It is well-established that memory benefits from linking items to familiar locations, a technique used in memory training (method of loci). This raises the question: Are we observing a similar effect here, where spatial dimensions are the only tested familiar 2D spaces, while the other 2 spaces are simply unfamiliar, as also suggested by the lower performance during training (Fig.2)? Would the results be replicable with another well-learned, robustly encoded domain, such as auditory dimensions for professional musicians, or is there something inherently unique about spatial representations that aids in bootstrapping abstract representations?

      On the other side of the same coin, are spatial representations qualitatively different, or simply more efficient because they are learned more quickly and readily? This leads to the consideration that if visual pre-training and visual-to-auditory mapping were continued until a similar proficiency level as in spatial training is achieved, we might observe comparable performance in aiding generalization. Thus, the conclusion that spatial representations are a special scaffold for abstract concepts may not be exclusively due to their inherent spatial nature, but rather to the general characteristic of well-established representations. This hypothesis could be further explored by either identifying alternative 2D representations that are equally well-learned or by extending training in visual or auditory representations before proceeding with the mapping task. At the very least I believe this potential explanation should be explored in the discussion section.

      Response in the public review.

      I had some difficulty in following an important section of the introduction: "... whether participants can learn rotationally invariant concepts in nonspatial domains, i.e., those that are defined by sequences of visual and auditory features (rather than by locations in physical space, defined in Cartesian or polar coordinates) is not known." This was initially puzzling to me as the paragraph preceding it mentions: "There is already good evidence that nonspatial concepts are represented in a translation invariant format." While I now understand that the essential distinction here is between translation and rotation, this was not immediately apparent upon first reading. This crucial distinction, especially in the context of conceptual spaces, was not clearly established before this point in the manuscript. For better clarity, it would be beneficial to explicitly contrast and define translation versus rotation in this particular section and stress that the present study concerns rotations in abstract spaces.

      Done.

      • The multi-modal association is crucial for the study, however to my knowledge, it is not depicted or well explained in the main text or figures (Results section). In my opinion, the details of this task should be explained and illustrated before the details of the associated results are discussed.

      We have included an illustration of a multimodal association trial in Fig. S3B.

      Author response image 2.

      • The observed correlation between the mapping task performance and the adoption of a 2D strategy is logical. However, this correlation might not exclusively indicate the proposed underlying mechanism of spatial scaffolding. Could it also be reflective of more general factors like overall performance, attention levels, or the effort exerted by participants? This alternative explanation suggests that the correlation might arise from broader cognitive engagement rather than specifically from the spatial nature of the task. Addressing this possibility could strengthen the argument for the unique role of spatial representations in learning abstract concepts, or at least this alternative interpretation should be mentioned.

      Response in the public review.

      • To me, the finding that ~30% of participants benefited from the spatial scaffolding effect for example in the auditory condition merely through exposure to the mapping (Fig 4D), without needing to see the quadruplets in the spatial modality, was somewhat surprising. This is particularly noteworthy considering that only ~60% of participants adopted the 2D strategy with exposure to rotated contingencies in Experiment 3 (Fig 3D). How do the authors interpret this outcome? It would be interesting to understand their perspective on why such a significant effect emerged from mere exposure to the mapping task.

      • I appreciate the clarity Fig.1 provides in explaining a challenging experimental setup. Is it possible to provide example trials, including an illustration that shows which rotations produce the trial and an intuitive explanation of which response maps onto the 1D vs 2D strategies respectively, to aid the reader in better understanding this core manipulation?

      • I like that the authors provide transparency by depicting individual subjects' data points in their results figures (e.g. Figs. 2 B, F, J). However, with an n=~50 per condition, it becomes difficult to intuit the distribution, especially for conditions with higher variance (e.g., Auditory). The figures might be more easily interpretable with alternative methods of displaying variances, such as violin plots per data point, conventional error shading using 95%CIs, etc.

      • Why are the authors not reporting exact BFs in the results sections at least for the most important contrasts?

      • While I understand why the authors report the frequencies for the best model fits, this may become difficult to interpret in some sections, given the large number of reported values. Alternatives or additional summary statistics supporting inference could be beneficial.

      As the Reviewer states, there are a large number of figures that we can report in this study. We have chosen to keep this number at a minimum to be as clear as possible. To illustrate the distribution of individual data points, we have opted to display only the group's mean and standard error (the standard errors are included, but the substantial number of participants per condition provides precise estimates, resulting in error bars that can be smaller than the mean point). This decision stems from our concern that including additional details could lead to a cluttered representation with unnecessary complexity. Finally, we report what we believe to be the critical BFs for the comprehension of the reader in the main text, and choose a cutoff of 100 when BFs are high (corresponding to the label “decisive” evidence; some BFs are larger than 10^12). All the exact BFs are in the supplementary for the interested readers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides a near-complete description of the mechanosensory bristles on the Drosophila melanogaster head and the anatomy and projection patterns of the bristle mechanosensory neurons that innervate them. The data presented are solid. The study has generated numerous invaluable resources for the community that will be of interest to neuroscientists in the field of circuits and behaviour, particularly those interested in mechanosensation and behavioural sequence generation.

      We express our gratitude to the Reviewers for their valuable suggestions, which significantly enhanced the manuscript. The revisions were undertaken, not with the expectation of acceptance, but rather driven by our sincere belief that these revisions would enhance the manuscript's impact for future readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Sensory neurons of the mechanosensory bristles on the head of the fly project to the subesophageal zone (SEZ). In this manuscript, the authors have built on a large body of previous work to comprehensively classify and quantify the head bristles. They broadly identify the nerves that various bristles use to project to the SEZ and describe their region-specific innervation in the SEZ. They use dye-fills, clonal labelling, and electron microscopic reconstructions to describe in detail the phenomenon of somatotopy - conserved peripheral representations within the central brain - within the innervation of these neurons. In the process they develop novel tools to access subsets of these neurons. They use these to demonstrate that groups of bristles in different parts of the head control different aspects of the grooming sequence.

      Reviewer #2 (Public Review):

      The authors combine genetic tools, dye fills and connectome analysis techniques to generate a "first-of-its-kind", near complete, synaptic resolution map of the head bristle neurons of Drosophila. While some of the BMN anatomy was already known based on previous work by the authors and other researchers, this is the first time a near complete map has been created for the head BMNs at electron microscopy resolution.

      Strengths:

      (1) The authors cleverly use techniques that allow moving back and forth between periphery (head bristle location) and brain, as well as moving between light microscopy and electron microscopy data. This allows them to first characterize the pathways taken by different head BMNs to project to the brain and also characterize anatomical differences among individual neurons at the level of morphology and connectivity.

      (2) The work is very comprehensive and results in a near complete map of all head BMNs.

      (3) Authors also complement this anatomical characterization with a first-level functional analysis using optogenetic activation of BMNs that results in expected directed grooming behavior.

      Weaknesses:

      (1) The clustering analysis is compelling but cluster numbers seem to be arbitrarily chosen instead of by using some informed metrics.

      We made revisions to the manuscript that address this concern. Please see our response to “recommendations for authors” for a description of these revisions.

      (2) It could help provide context if authors revealed some of the important downstream pathways that could explain optogenetics behavioral phenotypes and previously shown hierarchical organization of grooming sequences.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      (3) In contrast to the rigorous quantitative analysis of the anatomical data, the behavioral data is analyzed using much more subjective methods. While I do not think it is necessary to perform a rigorous analysis of behaviors in this anatomy focused manuscript, the conclusions based on behavioral analysis should be treated as speculative in the current form e.g. calling "nodding + backward walking" as an avoidance response is not justified as it currently stands. Strong optogenetic activation could lead to sudden postural changes that due to purely biomechanical constraints could lead to a couple of backward steps as seen in the example videos. Moreover since the quantification is manual, it is not clear what the analyst interprets as backward walking or nodding. Interpretation is also concerning because controls show backward walking (although in fewer instances based on subjective quantification).

      While unbiased machine vision-based methods would nicely complement the present work, this type of analysis is not yet working to distinguish between different head grooming movements. Therefore, we are currently limited to manual annotation for our behavioral analysis. That said, we do not believe that our manual annotation is subjective. The grooming movements that we examine in this work are distinguishable from each other through frame-by-frame manual annotation of video at 30 fps. Our annotations of the grooming and backward motions performed by flies are based on previous publications that established a controlled vocabulary defining each movement (Hampel et al., 2020a, 2017, 2015; Seeds et al., 2014). In this work, we added head nodding to this controlled vocabulary, as described in the Materials and methods. We have added additional text to the third paragraph of the Materials and methods section entitled “Behavioral analysis procedures” that we hope better describes our behavioral analysis. This description now reads:

      Head nodding was annotated when the fly tilted its head downward by any amount until it returned its head back in its original position. This movement often occurred in repeated cycles. Therefore, the “start” was scored at the onset of the first forward movement and the “stop” when the head returned to its original position on the last nod.

      We do not make any firm conclusions about the head movements (nodding) and backwards motions. We refer to nodding as a descriptive term that would allow the reader to better understand what the behavior looks like. We make no firm conclusions about any behavioral functional role that either the nodding or the backward motions might have, with the exception of nodding in the context of grooming. We only suggest that the behaviors appear to be avoidance responses. Furthermore, backward walking was not mentioned. Instead we refer to backward motions. We are only reporting our annotations of these movements that do occur, and are significantly different from controls. We speculate that these could be avoidance responses based on support from the literature. Future studies will be required to understand whether these movements serve real behavioral roles.

      Summary:

      The authors end up generating a near-complete map of head BMNs that will serve as a long-standing resource to the Drosophila research community. This will directly shape future experiments aimed at modeling or functionally analyzing the head grooming circuit to understand how somatotopy guides behaviors.

      Reviewer #3 (Public Review):

      Eichler et al. set out to map the locations of the mechanosensory bristles on the fly head, examine the axonal morphology of the bristle mechanosensory neurons (BMNs) that innervate them, and match these to electron microscopy reconstructions of the same BMNs in a previously published EM volume of the female adult fly brain. They used BMN synaptic connectivity information to create clusters of BMNs that they show occupy different regions of the subesophageal zone brain region and use optogenetic activation of subsets of BMNs to support the claim that the morphological projections and connectivity of defined groups of BMNs are consistent with the parallel model for behavioral sequence generation.

      The authors have beautifully cataloged the mechanosensory bristles and the projection paths and patterns of the corresponding BMN axons in the brain using detailed and painstaking methods. The result is a neuroanatomy resource that will be an important community resource. To match BMNs reconstructed in an electron microscopy volume of the adult fly brain, the authors matched clustered reconstructed BMNs with light-level BMN classes using a variety of methods, but evidence for matching is only summarized and not demonstrated in a way that allows the reader to evaluate the strength of the evidence. The authors then switch from morphology-based categorization to non-BMN connectivity as a clustering method, which they claim demonstrates that BMNs form a somatotopic map in the brain. This map is not easily appreciated, and although contralateral projections in some populations are clear, the distinct projection zones that are mentioned by the authors are not readily apparent. Because of the extensive morphological overlap between connectivity-based clusters, it is not clear that small projection differences at the projection level are what determines the post-synaptic connectivity of a given BMN cluster or their functional role during behavior. The claim that the somatotopic organization of BMN projections is preserved among their postsynaptic partners to form parallel sensory pathways is not supported by the result that different connectivity clusters still have high cosine similarity in a number of cases (i.e. Clusters 1 and 3, or Clusters 1 and 2). Finally, the authors use tools that were generated during the light-level characterization of BMN projections to show that specifically activating BMNs that innervate different areas of the head triggers different grooming behaviors. In one case, activation of a single population of sensory bristles (lnOm) triggers two different behaviors, both eye and dorsal head grooming. This result does not seem consistent with the parallel model, which suggests that these behaviors should be mutually exclusive and rely on parallel downstream circuitry.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      This work will have a positive impact on the field by contributing a complete accounting of the mechanosensory bristles of the fruit fly head, describing the brain projection patterns of the BMNs that innervate them, and linking them to BMN sensory projections in an electron microscopy volume of the adult fly brain. It will also have a positive impact on the field by providing genetic tools to help functionally subdivide the contributions of different BMN populations to circuit computations and behavior. This contribution will pave the way for further mechanistic study of central circuits that subserve grooming circuits.

      Recommendations for the authors:

      All three reviewers appreciated the work presented in this manuscript. There were also a few overlapping concerns that were raised that are summarised below, should the authors wish to address them:

      Somatotopy: We recommend that the authors describe the extent of prior knowledge in more detail to highlight their contribution better.

      We made revisions that better highlight the extent of prior knowledge about somatotopy. We describe how previous studies showed bristle mechanosensory neurons in insects are somatotopically organized, but these studies were not comprehensive descriptions of complete somatotopic maps for the head or body. To our knowledge, our study provides the first comprehensive and synaptic resolution somatotopic map of a head for any animal. This sets the stage for the complete definition of the interface between somatotopically-organized mechanosensory neurons and postsynaptic circuits, which has broad implications for future studies on aimed grooming, and mechanosensation in general. Below we itemize revisions to the Introduction, Discussion, and Figures to provide a clearer statement of the significance of our study as it relates to somatotopy.

      (1) Newly added Figure 1 – figure supplement 1 more explicitly grounds the study in somatotopy, providing a working model of the organization of the circuit pathways that produce the grooming sequence. This model features somatotopy as shown in Figure 1 – figure supplement 1C.

      (2) Figure 1 – figure supplement 1 is incorporated into the Introduction in the second, third, and fourth paragraphs, the first paragraph of the Results section titled “Somatotopically-organized parallel BMN pathways”, and the second and third paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      (3) We added text to the end of the fourth paragraph of the Introduction that now reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      (4) There is a Discussion section that further explains the extent of prior knowledge and our contributions on somatotopy that is titled “A synaptic resolution somatotopic map of the head BMNs”. Additionally, the previous version of this section had a paragraph on the broader implications of our work as it relates to somatotopy across species. In light of the reviewer comments, we decided to make this paragraph into its own Discussion section to better highlight the broader significance of our work. This section is titled “First synaptic resolution somatotopic map of the head”.

      The somatotopy isn't overtly obvious - perhaps they could try mapping presynaptic sites and provide landmarks to improve visualisation.

      We made the following revisions to better highlight the head BMN somatotopy. One point of confusion from the previous manuscript version stemmed from us not explicitly defining the somatotopic organization that we observed. There seemed to be confusion that we were defining the head somatotopy based only on the small projection differences among BMNs from neighboring head locations. While we believe that these small differences indeed correspond to somatotopy, we failed to highlight that there are overt differences in the brain projections of BMNs from distant locations on the head. For example, Figure 5B (right panel) shows the distinct projections between the LabNv (brown) and AntNv (blue) BMNs that innervate bristles on the ventral and dorsal head, respectively. Thus, BMN types innervating neighboring bristles show overlapping projections with small projection differences, whereas those innervating distant bristles show non-overlapping projections into distinct zones.

      Our analysis of postsynaptic connectivity similarity also shows somatotopic organization among the BMN postsynaptic partners, as BMN types innervating the same or neighboring bristle populations show high connectivity similarity (Figure 8, old Figure 7). Below we highlight major revisions to the text and Figures that hopefully better reveal the head somatotopy.

      (1) In the last paragraph of the Introduction we added text that explicitly frames the experiments in terms of somatotopic organization: “This reveals somatotopic organization, where BMNs innervating neighboring bristles project to the same zones in the CNS while those innervating distant bristles project to distinct zones. Analysis of the BMN postsynaptic connectome reveals that neighboring BMNs show higher connectivity similarity than distant BMNs, providing evidence of somatotopically organized postsynaptic circuit pathways.”

      (2) We mention an example of overt somatotopy from Figure 5 in the Results section titled “EM-based reconstruction of the head BMN projections in a full adult brain”. The text reads “For example, BMNs from the Eye- and LabNv have distinct ventral and anterior projections, respectively. This shows how the BMNs are somatotopically organized, as their distinct projections correspond to different bristle locations on the head (Figure 5B,C).”

      (3) In new Figure 8 (part of old Figure 7), we modified panels that correspond to the cosine similarity analysis of postsynaptic connectivity. The major revision was to plot the cosine similarity clusters onto the head bristles so that the bristles are now colored based on their clusters (C). This shows how neighboring BMNs cluster together, and therefore show similar postsynaptic connectivity. We believe that this provides a nice visualization of somatotopic organization in BMN postsynaptic connectivity. We also added the clustering dendrogram as recommended by Reviewer #2 (Figure 8A).

      (4) In new Figure 8, we added new panels (D-F) that summarize our anatomical and connectomic analysis showing different somatotopic features of the head BMNs. Different BMN types innervate bristles at neighboring and distant proximities (D). BMNs that innervate neighboring bristles project into overlapping zones (E, example of reconstructed BM-Fr and -Ant neurons with non-overlapping BM-MaPa neurons) and show postsynaptic connectivity similarity (F, example connectivity map of three BM types on cosine similarity data).

      (5) To accompany the new Figure 8D-F panels, we added a paragraph to summarize the different somatotopic features of the head BMNs that were identified based on our anatomical and connectomic analysis. This is the last paragraph in the Results section titled “Somatotopically-organized parallel BMN pathways”:

      Our results reveal head bristle proximity-based organization among the BMN projections and their postsynaptic partners to form parallel mechanosensory pathways. BMNs innervating neighboring bristles project into overlapping zones in the SEZ, whereas those innervating distant bristles project to distinct zones (example of BM-Fr, -Ant, and -MaPa neurons shown in Figure 8D,E). Cosine similarity analysis of BMN postsynaptic connectivity revealed that BMNs innervating the same bristle populations (same types) have the highest connectivity similarity. Figure 8F shows example parallel connections for BM-Fr, -Ant, and -MaPa neurons (vertical arrows), where the edge width indicates the number of synapses from each BMN type to their major postsynaptic partners. Additionally, BMNs innervating neighboring bristle populations showed postsynaptic connectivity similarity, while BMNs innervating distant bristles show little or none. For example, BM-Fr and -Ant neurons have connections to common postsynaptic partners, whereas BM-MaPa neurons show only weak connections with the main postsynaptic partners of BM-Fr or -Ant neurons (Figure 8F, connections under 5% of total BMN output omitted). These results suggest that BMN somatotopy could have different possible levels of head spatial resolution, from specific bristle populations (e.g. Ant bristles), to general head areas (e.g. dorsal head bristles).

      We also refer to Figure 8D-F to illustrate the different somatotopic features in the Discussion. These references can be found in the following Discussion sections titled “A synaptic resolution somatotopic map of the head BMNs (fourth paragraph)”, and “Parallel circuit architecture underlying the grooming sequence (second paragraph)”.

      (6) In addition to improving the Figures, we provide additional tools that enable readers to explore the BMN somatotopy in a more interactive way. That is, we provide 5 different FlyWire.ai links in the manuscript Results section that enable 3D visualization of the different reconstructed BMNs (e.g. FlyWire.ai link 1).

      Note: In working on old Figure 7 to address this Reviewer suggestion, we also reordered panels A-E. We believe that this was a more logical ordering than in the previous draft. These panels are now the only data shown in Figure 7, as the cosine similarity analysis is now in Figure 8. We hope that splitting these panels into two Figures will improve manuscript readability.
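      As an illustration only (this is not the analysis code used for the manuscript), the cosine similarity comparison of postsynaptic connectivity described above can be sketched as follows. Each BMN type is represented by a vector of synapse counts onto a shared list of postsynaptic partners. The BMN type names come from the text, but the four partners and all synapse counts are hypothetical values chosen to mimic the reported trend, in which neighboring types (BM-Fr, BM-Ant) score high and a distant type (BM-MaPa) scores low.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length synapse-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a neuron with no outputs is treated as sharing nothing
    return dot / (norm_u * norm_v)


# HYPOTHETICAL synapse counts onto four made-up postsynaptic partners.
# These numbers are illustrative and are not taken from the connectome.
connectivity = {
    "BM-Fr": [30, 12, 0, 1],
    "BM-Ant": [22, 18, 2, 0],
    "BM-MaPa": [1, 0, 25, 40],
}


def similarity_matrix(conn):
    """Pairwise cosine similarity for every pair of BMN types."""
    names = list(conn)
    return {
        (a, b): cosine_similarity(conn[a], conn[b])
        for a in names
        for b in names
    }


sim = similarity_matrix(connectivity)
for (a, b), value in sim.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {value:.2f}")
```

      With these toy vectors, the BM-Fr/BM-Ant pair scores far higher than either pairing with BM-MaPa, which is the qualitative pattern that the coloring of bristles by cluster in Figure 8C is meant to convey.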

      Light EM Mapping: A better description of methods by which this mapping was done would be helpful. Perhaps the authors could provide a few example parallel representations of the EM and light images in the main figure; this would help the reader better appreciate the strength of their approach.

      We have done as the Reviewers suggested and added panels to Figure 6 that show examples of the LM and EM image matching (Figure 6A,B). We added two examples that used different methods for labeling the LM imaged BMNs, including MCFO labeling of an individual BM-InOc neuron and driver line labeling of a major portion of BM-InOm neurons using InOmBMN-LexA. These panels are referred to in the first paragraph of the Results section titled “Matching the reconstructed head BMNs with their bristles”. Note that examples for all LM/EM matched BMN types are shown in Figure 6 – figure supplement 2.

      We had provided Figure 6 – figure supplement 2 in the reviewed manuscript that shows all the above requested “parallel representations of the EM and light images”. However, the Reviewer critiques made us realize that the purpose of this figure supplement was not clearly indicated. Therefore, we have revised Figure 6 – figure supplement 2 and its legend to make its purpose clearer. First, we changed the legend title to better highlight its purpose. The legend is now titled: “Matching EM reconstructed BMN projections with light microscopy (LM) imaged BMNs that innervate specific bristles”. Second, we added label designations to the figure panel rows that highlight the LM and EM comparisons. That is, the rows for light microscopy images of BMNs are indicated with LM and the rows for EM reconstructed BMN images are labeled with EM. Reviewer #3 had indicated that it was not clear what labeling methods were used to visualize the LM imaged BM-InOm neurons in Figure 6 – figure supplement 2N. Therefore, we added text to the figure and the legend to better highlight the different methods used. Panels A and B were also cropped to accommodate the above mentioned revisions.

      The manuscript also provides an extensive Materials and methods section that describes the different lines of evidence that were used to assign the reconstructed BMNs as specific types. We changed the title to better highlight the purpose of this methods section to “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles”. The evidence used to support the assignment of the different BMN types is also summarized in Figure 6 – figure supplement 3.

      Parallel circuit model: The authors motivate their study with this. We're recommending that they define expectations of such circuitry, its alternatives (including implications for downstream pathways), and behavior before they present their results. We're also recommending that they interpret their behavioural results in the context of these circuits.

      Our primary motivation for doing the experiments described in this manuscript was to help define the neural circuit architecture underlying the parallel model that drives the Drosophila grooming sequence. This manuscript provides a comprehensive assessment of the first layer of this circuit architecture. A byproduct of this work is a contribution that offers immediate utility and significance to the Drosophila connectomics community, namely the description of the majority of mechanosensory neurons on the head, with their annotation in the recently released whole brain connectome dataset (FlyWire.ai). In writing this manuscript, we tried to balance both of these aims, which proved difficult. We very much appreciate the Reviewers' comments that have highlighted points of confusion in our original draft. We hope that the revised draft is now clearer and more logically presented. We have made revisions to the text and provided a new figure supplement (Figure 1 - figure supplement 1) and new panels in Figure 8. Below we highlight the major revisions.

      (1) The Introduction was revised to more explicitly ground the study in the parallel model, while also removing details that were not pertinent to the experiments presented in the manuscript.

      The first paragraph introduces different features of the parallel model. To better focus the reader on the parts of the model that were being assessed in the manuscript, we removed the following sentences: “Performance order is established by an activity gradient among parallel circuits where earlier actions have the highest activity and later actions have the lowest. A winner-take-all network selects the action with the highest activity and suppresses the others. The selected action is performed and then terminated to allow a new round of competition and selection of the next action.” Note that these sentences are included in the third and fourth paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      The first paragraph of the Introduction now introduces a bigger picture view of the model that emphasizes the two main features: 1) a parallel circuit architecture that ensures all mutually exclusive actions to be performed in sequence are simultaneously readied and competing for output, and 2) hierarchical suppression among the parallel circuits, where earlier actions suppress later actions.

      (2) Newly added Figure 1 – figure supplement 1 provides a working model of grooming (Reviewer # 1 suggestion). We now more strongly emphasize that the study aimed to define the parallel neural circuit architecture underlying the grooming sequence, focusing on the mechanosensory layer of this architecture. In particular, we refer to the new Figure 1 – figure supplement 1 that has been added to better convey the hypothesized grooming neural circuit architecture. Figure 1 – figure supplement 1 is incorporated into the Introduction (paragraphs two, three, and four), Results section titled “Somatotopically-organized parallel BMN pathways (first paragraph)”, and last Discussion section titled “Parallel circuit architecture underlying the grooming sequence (second and third paragraphs)”.

      (3) New panels in Figure 8 update the model of parallel circuit organization as it relates to somatotopy (D-F). These panels show the parallel circuits hypothesized by the model, but also indicate convergence, with different possible levels of head resolution for these circuits. We describe above where these panels are referenced in the text.

      (4) We added a new paragraph in the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” that better incorporates the results from this manuscript into the working model of grooming. This paragraph is shown below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      Aside from this summary of major concerns, the detailed recommendations are attached below.

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the quality and exhaustive body of work presented in this manuscript. I have a few comments that the authors may want to consider:

      (1) The authors motivate this study by posing that it would allow them to uncover whether the complex grooming behaviour of flies followed a parallel model of circuit function. It would have been nice to have been introduced to what the alternative model might be and what each would mean for organisation of the circuit architecture. Some guiding schematics would go a long way in illustrating this point. Modifying the discussion along these lines would also be helpful.

      We made several revisions to the manuscript that address this recommendation. Among these revisions, we added Figure 1 – figure supplement 1 that includes a working model for grooming. Please see above for a description of these revisions.

      (2) The authors mention the body of work that has mapped head bristles and described somatotopy. It would be useful to discuss in more detail what these studies have shown and highlight where the gaps are that their study fills.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (3) The dye-fills and reconstructions that are single colour could use a boundary to demarcate the SEZ. This would help in orienting the reader.

      We agree with Reviewer #1 that Figure 4 and its supplements could use some indicator that would orient the reader with respect to the dye filled or stochastically labeled neurons. The images are of the entire SEZ in the ventral brain, and in the case of some panels, the background staining enables visualization of the brain (e.g. Figure 4H,M,N). To help orient the reader in this region, we added a dotted line to indicate the approximate SEZ midline. This also enables the reader to more clearly see which of the BMN types cross the midline.

      Midline visual guides were added for Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      (4) The comparison between the EM and the fills/clones are not obvious. And particularly because they are not directly determined, it would be nice to have the EM reconstruction alongside the dye-fills. This would work very nicely in the supplementary figure with the multiple fills of the same bristles. I think this would really drive home the point.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (5) Are there unnoticed black error-bars floating around in many of the gray-scale images?

      The black bars were masking white scale bars in the images. We have removed the black bars and remade the images without scale bars. This was done for the following Figures: Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      (1) The only point in the paper where I found myself going back and forth between methods/supp and text was when the authors discuss the clustering. I think it would help the reader if a few sentences about the cosine clustering used for connectivity based clustering were included in the main text. Also, for NBLAST hierarchical clustering, it would help if some informed metrics could be used for defining cluster numbers (e.g. Braun et al, 2010 PLOS ONE shows how Ward linkage cost could be used for hierarchical clustering).

      Depending on where the cut height is placed on the dendrogram for cosine similarity of BMNs, different features of the BMN type postsynaptic connectivity are captured. As the number of clusters is increased (lower cut height), clustering is mainly among BMNs of the same type, showing that these BMNs have the highest connectivity similarity. As the number of clusters is reduced (higher cut height), BMNs innervating neighboring bristles on the head are clustered, revealing three general clusters corresponding to the dorsal, ventral, and posterior head. This reveals somatotopy based clustering among same and neighboring BMN types. The cut height shown in Figure 8 and Figure 8 – figure supplement 2 was chosen because it highlighted both of these features.

      The NBLAST clustering shows similar results to the connectivity based clustering with respect to neighboring and distant BMN types. As the number of clusters increases BMNs of the same type are clustered, and these types can be further subdivided into morphologically distinct subtypes. As the number of clusters is reduced, the clustering captures neighboring BMNs. Thus, neighboring BMN types showed high morphology similarity (and proximity) with each other, and low similarity with distant BMN types.

      Please see our responses to a Reviewer #3 critique below for further description of the clustering results.
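      The effect of cut height described above can be illustrated with a minimal sketch. It is not the pipeline used for the manuscript: it assumes average linkage on cosine distance (the linkage method actually used is not stated here), and the six neurons, their type labels, and their synapse counts are all hypothetical. Types A and B stand in for neighboring BMN types and type C for a distant one.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; vectors with zero norm are maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def cluster_at_height(vectors, cut_height):
    """Average-linkage agglomerative clustering, stopped at cut_height.

    Clusters are merged closest-first until the closest remaining pair is
    farther apart than cut_height, which is equivalent to cutting the
    dendrogram at that height.
    """
    clusters = [[name] for name in vectors]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, i, j = min(
            (
                (linkage(clusters[i], clusters[j]), i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda t: t[0],
        )
        if dist > cut_height:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# HYPOTHETICAL synapse-count vectors for six neurons of three made-up types.
toy = {
    "typeA_1": [10, 1, 0, 0],
    "typeA_2": [9, 2, 0, 0],
    "typeB_1": [6, 8, 0, 0],
    "typeB_2": [5, 9, 0, 0],
    "typeC_1": [0, 0, 10, 3],
    "typeC_2": [0, 1, 9, 4],
}

low_cut = cluster_at_height(toy, 0.05)   # neurons of the same type group together
high_cut = cluster_at_height(toy, 0.5)   # neighboring types A and B also merge
print(len(low_cut), low_cut)
print(len(high_cut), high_cut)
```

      In this toy example the lower cut yields one cluster per type, whereas the higher cut joins the two neighboring types while keeping the distant type separate, mirroring the two features that a single intermediate cut height was chosen to capture.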

      On the same lines it would help if the clustering dendrograms were included in the main figure.

      We thank Reviewer #2 for this comment. We have added the dendrogram to Figure 8A, a change that we feel makes this Figure much easier to understand.

      (2) It could help provide intuition if the authors revealed some of the downstream targets and their implication in explaining the behavioral phenotypes.

      While this will be the subject of at least two forthcoming manuscripts, we have added text to the present manuscript that provides insight into BMN postsynaptic targets. Our previous work (Hampel et al. 2015) described a mechanosensory connected neural circuit that elicits grooming of the antennae. While this previous study demonstrated that the Johnston’s organ mechanosensory neurons are synaptically and functionally connected with this circuit, our preliminary analysis indicates that it is also connected with BM-Ant neurons. We hypothesize that there are additional such circuits that are responsible for eliciting grooming of other head locations.

      To better highlight potential downstream targets in the manuscript, we now mention the antennal circuit in the Introduction. This text reads: In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.

      There is also text in the Discussion that addresses this Reviewer comment. It describes the antennal circuit and mentions the possibility that other similar circuits may exist. This can be found in the third paragraph of the section titled “Circuits that elicit aimed grooming of specific head locations”.

      (3) Authors find that opto activation of BMNs leads to grooming of targeted as well as neighboring areas. Is there any sequence observed here? i.e. first clean targeted area and then clean neighboring area? I wonder if the answer to this is something as simple as common post-synaptic targets which is essentially reducing the resolution of the BMN sensory map. Some more speculation on this interesting result could be helpful.

      We appreciate and agree with this point from Reviewer #2, and have tried to better emphasize the possible implications for grooming that the overlapping projections and connectivity among BMNs innervating neighboring bristles may have. This is now better addressed in the Results and Discussion sections. Below we highlight where this is addressed:

      (1) In the second paragraph of the Results section titled “Activation of subsets of head BMNs elicits aimed grooming of specific locations” we added text that suggests the possibility that grooming of the stimulated and neighboring locations could be due to the overlapping projections and connectivity. This text reads: This suggested that head BMNs elicit aimed grooming of their corresponding bristle locations, but also neighboring locations. This result is consistent with our anatomical and connectomic data indicating that BMNs innervating neighboring bristles show overlapping projections and postsynaptic connectivity similarity (see Discussion).

      (2) We added a sentence to the end of the fourth paragraph of the Discussion section titled “A synaptic resolution somatotopic map of the head BMNs” that alludes to further discussion of this topic. This sentence reads: This overlap may have implications for aimed grooming behavior. For example, neighboring BMNs could connect with common neural circuits to elicit grooming of overlapping locations (discussed more below).

      (3) The fourth paragraph of the Discussion section titled “Circuits that elicit aimed grooming of specific head locations” mentions the possibility of mechanosensory convergence onto common postsynaptic circuits to promote grooming of the stimulated area, along with neighboring areas. This paragraph is below.

      We find that activation of specific BMN types elicits both aimed grooming of their corresponding bristle locations and neighboring locations. This suggests overlap in the locations that are groomed with the activation of different BMN types. Such overlap provides a means of cleaning the area surrounding the stimulus location. Interestingly, our NBLAST and cosine similarity analysis indicates that neighboring BMNs project into overlapping zones in the SEZ and show common postsynaptic connectivity. Thus, we hypothesize that neighboring BMNs connect with common neural circuits (e.g. antennal grooming circuit) to elicit overlapping aimed grooming of common head locations.

      (4) In the new second paragraph of the Discussion section titled “Parallel circuit architecture underlying the grooming sequence” we further discuss the issue of the BMN “sensory map”. This paragraph is below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      (4) If authors were to include a summary table that shows all known attributes about BMN type as columns that could be very useful as a resource to the community. Table columns could include attributes like "bristle name", "nerve tract", "FlyWire IDs of all segments corresponding to the bristle class". "split-Gal4 line or known enhancer" , etc.

      We provided a table that includes much of this information after the manuscript had already gone out for review. We regret that this was not available. This is now provided as Supplementary file 3. This table provides the following information for each reconstructed BMN: BMN name, bristle type, nerve, flywire ID, flywire coordinates, NBLAST cluster (cut height 1), NBLAST cluster (cut height 5), and cosine cluster (cut height 4.5). Note that the driver line enhancers for targeting specific BMN types are shown in Figure 3I.

      Specific Points:

      Figure 4C-V:

      • I find it a bit difficult to distinguish ipsi- from contra-lateral projections. Maybe indicate the midline as a thin, stippled line?

      We thank the Reviewer #2 for this suggestion. We have now added lines in the panels in Figure 4C-V to indicate the approximate location of the midline. We also added lines to the Figure 4 – figure supplements as described above.

      I think this Fig reference is wrong "the red-light stimulus also elicited backward motions with control flies (Figure 6B,C, control, black trace, Video 5)." should be Fig 8B,C

      We have fixed this error.

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      Motivating this study in terms of understanding the neural mechanisms that execute the parallel model seems to overstate what you will achieve with the current study. If you want to motivate it this way, I suggest focusing on the grooming sequence of the head alone (eyes, antennae, proboscis).

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that many of the revisions focus on the head grooming sequence. We also made minor revisions to the Introduction that further emphasize the focus on head grooming.

      Results:

      Figure 1. Please indicate that this is a male fly in either the figure title or in the figure itself.

      We added a male symbol to Figure 1A.

      Figure 3. Panel J is referenced in the main body text and in the figure caption, but there is no Fig 3J.

      Panel J is shown in the upper right corner of Figure 3. We realize that the placement of this panel is not ideal, but this was the only place that we could fit it. Additionally, the panel works nicely at that location to better enable comparison with panel C. We have revised the text in the Figure 3 legend to better highlight the location of this Figure panel: “Shown in the upper right corner of the figure are the aligned expression patterns of InOmBMN-LexA (red), dBMN-spGAL4 (green), and TasteBMN-spGAL4 (brown).”

      We also added text to a sentence in the results section entitled “Head BMNs project into discrete zones in the ventral brain” that indicates the panel location. This text reads: To further visualize the spatial relationships between these projections, we computationally aligned the expression patterns of the different driver lines into the same brain space (Figure 3J, upper right corner).

      Matching the BMNs to EM reconstructions: why cut the dendrogram at H=5? Would be better to determine cluster number using an unbiased method.

      To match the morphologically distinct EM reconstructed BMNs to their specific bristles, we relied on different lines of evidence, including NBLAST results (discussed more below), dye fill/stochastic labeling/driver line labeling matches, published morphology, nerve projection, bristle number, proximity to other BMNs, and postsynaptic connectivity (summarized in Figure 6 – figure supplement 3). The Materials and methods section titled “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles” provides a detailed description of the evidence used to assign each BMN type. In many cases, BMN type could be assigned with confidence solely based on morphological comparisons with our light level data (e.g. dye fills), in conjunction with bristle counts to indicate an expected number of BMNs showing similar morphology. Thus, the LM/EM matches and NBLAST clustering were largely complementary.

      The EM reconstructed BMNs were matched as particular BMN types, in part based on examination of the NBLAST data at different cut heights. NBLAST clustering of the BMNs revealed general trends at higher and lower cut heights (Figure 6 – figure supplement 1A, Supplementary file 3). The lowest cut heights included mostly BMNs of the same type innervating the same bristle populations, and smaller clusters that subdivided into morphologically distinct subtypes (see Supplementary file 3 for clusters produced at cut height 1). This revealed that BMNs of the same type tended to show the highest morphological similarity with each other, but they also showed intratype morphological diversity. Higher cut heights produced clusters of BMNs innervating neighboring bristle populations (e.g. ventral head BMNs), showing high morphological similarity among neighboring BMN types.

      We selected the cut height 5 shown in Figure 6 – figure supplement 1A,B because it captures examples of both same and neighboring type clustering. For example, it captures a cluster of mostly BM-Taste neurons (Cluster 16), and neighboring BMN types, including those from the dorsal head (Cluster 14) or ventral head (Cluster 15).

      Based on reviewer comments, we realized that the way we wrote the BMN matching section in the Results indicated more reliance on the NBLAST clustering than was actually necessary, distorting the way we matched the BMNs. Therefore, we softened the first couple of sentences to place less emphasis on the importance of the NBLAST clustering. We also indicated that readers can find the resulting clusters at different cut heights, referring to Figure 6 – figure supplement 1A and Supplementary file 3. The first two sentences of the first paragraph in the Results section titled “Matching the reconstructed head BMNs with their bristles” now read:

      The reconstructed BMN projections were next matched with their specific bristle populations. The projections were clustered based on morphological similarity using the NBLAST algorithm (example clustering at cut height 5 shown in Figure 6 – figure supplement 1A,B, Supplementary file 3, FlyWire.ai link 2) (Costa et al., 2016). Clusters could be assigned as BMN types based on their similarity to light microscopy images of BMNs known to innervate specific bristles.

      The number of reconstructed BMNs is remarkably similar to what is expected based on bristle counts for each group except for InOm. Why do you think there is such a large discrepancy there?

      We believe that there is a discrepancy between the number of reconstructed BM-InOm neurons and the number expected based on InOm bristle counts because these bristle counts were based on few flies and these numbers appear to be variable. We did not further investigate the numbers of InOm bristles in this manuscript because we only needed an estimate of their numbers, given that there is over an order of magnitude difference in the eye bristles versus any other head bristle population. Therefore, we could relatively easily conclude that the head BMNs were related to the InOm bristles, based on their sheer numbers and their morphology.

      Figure 6 - figure supplement 2N, please describe these panels better. Main text says the upper image is from InOmBMN-LexA, but the figure legend doesn't agree.

      We have added text to the figure legend that now makes the contents of panel 2N clear to the reader. Further, we now indicate in the figure legend for each panel, the method used to obtain the labeled neurons (i.e. fill, MCFO, driver), to avoid similar confusion for the other panels.

      Figure 6 - figure supplement 4D. How frequently is there a mismatch between the number of BMNs for a given type across hemispheres?

      Although the full reconstruction of the BMNs on both sides of the brain was beyond the scope of this work, the BMNs on both sides have since been reconstructed and annotated (Schlegel et al. 2023). We plan to provide more analysis of BMNs on both sides of the brain in a forthcoming manuscript. However, the BMN numbers tend to show agreement on both sides of the brain. The table below shows a comparison between the two sides:

      Author response table 1.

      Figures 6 and 7. It would be helpful to include a reference brain in all panels that show cluster morphology. Without landmarks there is nothing to anchor the eye to allow the reader to see the described differences in BMN projection zones and patterns.

      While we apologize for not making this specific change, we have made revisions to other parts of the manuscript to better highlight the somatotopic organization among the BMNs (revisions described above). Please note that we now provide FlyWire.ai publicly available links that enable readers to view the BMN projections in 3D. They can also toggle a brain mesh on and off to provide spatial reference.

      "BMN somatotopic map": It would be helpful to show or describe in more detail what the unique branch morphology for each zone is. It is quite difficult to appreciate, as the groups also have a lot of overlap. Would the unique regions that the BMN groups innervate be easier to see if you plotted presynaptic sites by group? I am left unsure about whether there is a somatotopic map here.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that we did not examine the fine branch morphological differences between BMN types having overlapping projections. Showing these differences would require more extensive anatomical analysis that is beyond the scope of this work. For showing definitive somatotopy, we focused on the overt differences between BMNs innervating bristles at distant locations on the head.

      Overall the strict adherence to the parallel model impacts the interpretation of the data. It would be helpful for the authors to discuss which aspects of the current study are consistent with the parallel model and which results are not consistent.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      Discussion:

      "Circuits that elicit aimed grooming of specific head locations": In the previous paragraph you mention "BMN types innervating neighboring bristle populations have overlapping projections into zones that correspond roughly to the dorsal, ventral, and posterior head. The overlap is likely functionally significant, as cosine similarity analysis revealed that neighboring head BMN types have common postsynaptic partners. However, overlap between neighboring BMN types is only partial, as they show differing projections and postsynaptic connectivity." Then in this paragraph, you say, "How do the parallel-projecting head BMNs interface with postsynaptic neural circuits to elicit aimed grooming of specific head locations? Different evidence supports the hypothesis that the BMNs connect with parallel circuits that each elicit a different aimed grooming movement (Seeds et al., 2014)." The overlapping postsynaptic BMN connectivity seems in conflict with the claim that the circuits are parallel.

      We apologize for this confusion. We now better describe this apparent discrepancy between our results and the parallel model of grooming behavior. We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.
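
      As a reference for the cosine similarity analysis quoted in the reviewer's comment, here is a minimal sketch of how shared postsynaptic connectivity between two BMN types could be scored. The synapse counts and type names are hypothetical and do not come from the manuscript.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two connectivity vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical synapse counts from three BMN types onto the same
# four postsynaptic partners (one vector entry per partner).
type_a = [10, 5, 0, 0]   # e.g. a dorsal head type
type_b = [8, 6, 1, 0]    # a neighboring type with mostly shared partners
type_c = [0, 0, 7, 9]    # a distant type with different partners

print("a vs b:", round(cosine_similarity(type_a, type_b), 3))
print("a vs c:", round(cosine_similarity(type_a, type_c), 3))
```

      A value near 1 indicates largely common partners, a value near 0 indicates largely separate partners, and intermediate values correspond to the partial overlap between neighboring types that the comment describes.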

      We have made additional changes to the manuscript:

      (1) We added Supplementary file 2 that includes links for downloading the image stacks used to generate panels in Figure 1, Figure 2, Figure 3, Figure 4, and figure supplements for these figures. These image stacks are stored in the Brain Image Library (BIL). Rows in the spreadsheet correspond to each image stack. Columns provide information about each stack including: figure panels that each image stack contributed to, image stack title, DOI for each stack (link provides metadata for each stack and file download link), image stack file name, genotype of imaged fly, and information about image stack. References to this file have been made at different locations throughout the text and Figure legends. We also added a section on the BIL data in the Materials and methods entitled “Light microscopy image stack storage and availability”. Old Supplementary file 2 has been renamed Supplementary file 3.

      (2) We added a new reference for FlyWire.ai (Dorkenwald et al. 2023) that was posted as a preprint during the revision of this manuscript.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This is a strong manuscript about the existence of proteins coded by intracellular parasites (here Coxiella) that have evolved to parasitise the lipid transport machinery of their hosts. This is a first in that the parasite protein acts at a distance from the parasite itself, manipulating two of the host organelles - and not acting at their site of contact with PVs. There is considerable research into one protein and its effect when expressed by itself.

      Despite all the advances there are a couple of areas where the manuscript can be improved, and a few extra fairly straightforward experiments added about the amphipathic helix. Even though these are unlikely to change the overall message, they would make the story more complete.

      Major points

      More details are required about the amphipathic helix. Check that the AH does target LDs by expression of the AH alone in a GFP chimera +/- oleate and then mutagenesis. Also show the AH in a helical wheel projection (eg by Heliquest) and say if it aligns with similar AHs in homologs (see my point below)

      Fig 1B: In infected cells, do the affected LDs tend to be close to the PVs?

      Also in Fig 1B: highlight small KDEL+ve ER rings around LDs here. Study whether LDs have these in infected cells without the confounder (?artefact) of EPF1 over-expression

      Fig 2A the ER looks quite different here from Fig 1B, even at t0. Grossly the strands are spaced wider apart. In detail there are no rings around LDs. Can the authors explain this? Which morphology is common, especially in cells early in infection without co-expressed protein?

      Fig 6 & Line 237: "As the N-terminal region of CbEPF1 is undefined": I suggest that the authors could do more here. At minimum change model to highlight the strong probability that the N term is a globular domain that functions at the LD ER interface. (What are three other unidentified LD proteins? I suggest omitting them).

      Although the Alphafold prediction for EPF1 is low confidence only, in a few minutes of BLAST searching I found the homolog A0A1J8NR10_9COXI (also FFAT+ve) which has a moderately confident structural prediction for its N terminus. This model has a quite large internal hydrophobic cavity, indicating lipid transfer capability and function similar to known LTPs. This means that action as a "tether" possibly results from experiments with viral promoters (see minor point on terminology).

      Minor:

      Fig 2B: add more arrowheads/arrows to fit legend (says they are both multiple)

      FFAT selectivity for MOSPD2: say if this fits the di Mattia or (as appears likely) it extends the known differences between VAPA/B and MOSPD2. Also say if VAPA is expected to behave as VAPB

      Explain how "Mutations in the CbEPF1 FFAT motif(s) did not influence CbEPF1-GFP localization to either the host ER (Supplementary Fig. 1)". In F3mt this shows that EPF1 has a way to target ER other than FFAT/VAP. Discuss if that is via AH insertion in ER.

      Also, the (admittedly low) level of ER targeting is possibly slightly reduced by F3mt, as shown by greater GFP in the nucleus in the single cells shown. If this is a feature of the whole field of cells, it implies that the FFATs normally work with the AHs to target EPF1 to the ER.

      "clustering" LDs w F3mt: could this indicate dimer formation by CbEPF1? Note: to me it appears wrong to describe fig 4A as showing ER exclusion. LD proximities to each other dominate. It's not 100% clear that LDs cluster as their proximities are not universal: "LD-LD interactions" may be (very) weak.

      Fig 5: can levels of EPF1 here be compared to those in cells undergoing natural infection (approximate comparison by qPCR better than nothing if no antibodies are available)? Fig 5a: would it be possible to increase the number of cells counted to attempt to make the reduced number of LD in F3mt significant?

      Minor

      Line 226: no sequence homology: misses the point- there is the common feature of an AH

      Issue to be discussed, as probably too difficult to experiment on: when EPF1 is on the ER does it engage vap only weakly (implying a means to mask its motifs), since if the interaction is strong vap is then unable to bind other partners?

      Line 245: "MOSPD2, a sole VAP that is known to localize on LD surfaces" (worth citing Zouiouich again here). Do the cells/tissues infected by Coxiella express MOSPD2?

      Line 259/260: this "suggestion" about cholesterol should be toned down. It is a speculation that could be tested in future, but the data here do not suggest it.

      'Tether': this word implies more than just bridging but also a role in the physical formation of the contact. Since EPF1 most likely has an LTP domain, it seems linguistically confusing to refer to it as a tether, especially since the experiments that physically alter LD-ER contact involve probable over-expression.

      Discuss whether it is 100% certain that EPF1 is in the host cytosol or whether some experiment(s) at a future date (proteomic/western blotting) will be needed to make that conclusion 100% secure.

      Referees cross-commenting

      COMMENT 1

      I realise that both reviewer 1 and reviewer 3 have considered this MS carefully, but I think that their reviews could be improved in some respects. I will add two comments, one for each of the other reviews.

      Reviewer 1. The review poses multiple questions to the authors suggesting that answering these questions experimentally would strengthen the paper. Some of the points seem to misunderstand what is the accepted standard for membrane cell biological research into membrane contact sites. While it might be that the authors can rebut these points, I think it is preferable to use Cross Commenting as an opportunity to address these issues beforehand.

      Major Comment 1: CbEPF1 and ER-LD contact

      Looking at endogenous proteins: I wondered about the same point, but I concluded that this is not likely to be possible in the scope of this submission. If it were possible then I guess the authors would have attempted it. Looking on Google Scholar I could find no example of an endogenous Coxiella protein being tagged in the bacterial genome. So the only way to find the protein is via an antibody. Assuming the authors do not have one, I do not think we should ask for one at this stage in the publication process.

      Electron microscopy: the reviewer is incorrect to say that this is necessary. It may be the gold standard, but it is a huge amount of extra work. Furthermore it is not at all necessary when the protein in question localises clearly to the interface between organelles identified by confocal microscopy.

      Can a specific CbEPF1 domain be identified? Here an Amphipathic Helix has been identified, but the lack of dissection of that region by the authors explains this question by the reviewer, which is also shared by Reviewer 3. I agree with the implication that more should be done to dissect that.

      Major Comment 2: CbEPF1 FFAT motifs and VAP binding

      Are the two FFAT motifs redundant or synergistic? I would say that the authors have addressed that to a reasonable extent

      CbEPF1 binding specificity towards a VAP/MOSPD2 Ditto

      Major Comment 3: LD clustering

      Since this is an effect of mutated protein only, I think that the 3 questions posed at the end here need only be addressed in Discussion.

      Major Comment 4: CbEPF1-mediated increase in LD number and size

      less LD upon expression of F1mt or F2mt, compared to WT: this seems wrong. The numbers are the same. The comment about IF images is unjustified as they have been quantified and do show a difference. I agree that the biological relevance is unclear, and that this might be addressed. That would require making a mutant Coxiella strain. While that would make a big difference to this work, my feeling is that this is well over a year's work. I would be guided by the authors on that and I would not suggest it as required for this MS.

      De novo LD production at the ER is unlikely: This statement is ill-considered as the FFAT motifs ARE required (Fig 5). Furthermore, in all systems ever reported de novo LD production takes place at the ER, so any alternative would be quite extraordinary.

      Altogether, strengthening this aspect of the study: In my view, this area does not need more work and it would not be constructive to ask for more.

      Major Comment 5: Functional relevance

      assessing the phenotype of a Coxiella CbEPF1 mutant I agree that this would be good, but it mightn't be feasible within the confines of this one paper. In the various projects that have made transposon mutants of Coxiella, has a strain been made that affects EPF1? If not, then the authors should state this and discuss it as work for the future. The reviewers cannot expect any experiments!

      Is VAP required for Coxiella intracellular growth/vacuole maturation? On the surface this suggestion seems to offer an experimental route to understanding EPF1. However, VAP binds to >100 cellular proteins, many relating to lipid traffic and a considerable number of these already localised to lipid droplets (ORP2, MIGA2, VPS13A/C). It is therefore unlikely that such an experiment would be interpretable, and I recommend that this request be reconsidered.

      Are LD formation induced upon infection? Are ER-LD contact increased upon infection? These are very reasonable ideas and the results would be interesting additions to this paper.

      COMMENT 2

      I have given one set of comments already. Here are my comments for Reviewer 3.

      The review makes a few assumptions that I question. While it might be that the authors can rebut these assumptions, I think it is preferable to use Cross Commenting as an opportunity to address these issues beforehand.

      Major Point 1: What is surprising is that the BFP-KDEL signal is also localizing to the LD surface: "Surprising" is misguided, as it seems to deny the probability that there is a class of proteins that sit at organelle interfaces binding to both partners simultaneously. Maybe the reviewer means "significant" here, in which case I would agree.

      The authors must perform LD isolation: the reviewer is incorrect to say that this must be done. It is a huge amount of extra work. Furthermore it is not at all necessary when the protein in question localises clearly to the structures, and it may not even work as the protein may need a reasonably high general concentration to avoid gradual dissociation (without any re-association) during organelle purification.

      what features of the protein enable its LD binding? Here an Amphipathic Helix has been identified, but the lack of dissection of that region by the authors explains this question by the reviewer, which is also shared by Reviewer 1. I agree with the implication that more should be done to dissect that.

      Major Point 2: Quantitative image analysis:

      Mander's Colocalization analyses with Costes correction are required No. The images in Figure 4 speak for themselves.
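
      For readers unfamiliar with the metric under discussion, the following is a minimal sketch of Manders' colocalization coefficients on two toy image channels. It assumes fixed thresholds supplied by the caller (not the Costes automatic threshold) and a nonzero total intensity in each channel. The pixel values are invented.

```python
import numpy as np

def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Return (M1, M2) for two same-shaped intensity images.

    M1: fraction of channel-1 intensity in pixels where channel 2 exceeds thr2.
    M2: fraction of channel-2 intensity in pixels where channel 1 exceeds thr1.
    Assumes each channel has nonzero total intensity.
    """
    ch1 = np.asarray(ch1, dtype=float)
    ch2 = np.asarray(ch2, dtype=float)
    m1 = ch1[ch2 > thr2].sum() / ch1.sum()
    m2 = ch2[ch1 > thr1].sum() / ch2.sum()
    return m1, m2

# Toy 2x2 "images" with made-up intensities.
ch1 = np.array([[0, 4], [6, 0]])
ch2 = np.array([[0, 2], [0, 8]])

m1, m2 = manders(ch1, ch2)
print("M1 =", m1, "M2 =", m2)
```

      Only one pixel carries signal in both channels here, so each coefficient reports the share of that channel's total intensity found in the shared pixel.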

      Please show the LD phenotype of untransfected, and CbEPF1-GFP transfected cells also This is a good idea.

      provide a means to quantify the clustering of LDs Unnecessary. Not all findings need to be quantified.

      Major Point 3:

      Data depends largely on overexpression of the protein in uninfected cells. I agree

      What is the localization of the protein in infected cells? I wondered about the same point, but I concluded that this is not likely to be possible in the scope of this submission. If it were possible then I guess the authors would have attempted it. Looking on Google Scholar I could find no example of an endogenous Coxiella protein being tagged in the bacterial genome. So the only way to find the protein is via an antibody. Assuming the authors do not have one, I do not think we should ask for one at this stage in the publication process.

      What happens to ER-LD contacts upon infection with C. burnetii? This is a very valid question, and answering it would not only strengthen the manuscript but should be achievable in 1-3 months.

      Significance

      This work takes a reasonably big step towards uncovering how parasites have mimicked the molecular machinery of contact sites, here in the context of ER-LD interactions and tantalizingly suggestive of lipid transfer at that contact site (although hard to get strong evidence for that at this stage). This provides yet more evidence for the conservation and overall importance to cells of lipid transfer at contact sites, as well as reminding us of the ability of parasites to attack every aspect of cell function.

    1. Reviewer #1 (Public Review):

      Summary:

      The motivating questions are an accurate reflection of the current state of knowledge surrounding striatal pathway function. The comparisons of pathway function across striatal subregion, activation & inhibition, and task context are laudable and extremely important for advancing the subfield. Had these manipulations, to the largest extent possible been performed in single animals (e.g. activate dSPNs of DMS or DLS in the same mouse across the 3 tasks), this would have significantly strengthened the impact and conclusions that could be drawn by making this set of studies even more so internally consistent and directly comparable. While this is no longer possible, a conceptually related and fantastic contribution to the subfield (and likely beyond in terms of Opto manipulations of brain areas) would be to directly demonstrate that within their studies their DMS pathway manipulations do not impact nearby DLS activity (and vice versa). This is a significant and non-essential request. More feasibly and reasonably, it would be fantastic and strengthen the conclusions here to more fully detail their opsin expression patterns in DMS vs DLS groups and perhaps attempt to relate individual opsin profiles and fiberoptic targeting with behavioral outcomes across tests.

      Strengths:

      A comprehensive and paired comparison of inhibition and activation of striatal pathways across subregions and tasks is a very important and meaningful step towards reconciling contradictory results on striatal pathway function that are observed across labs (who typically focus on one subregion, one task setting, and often do not directly report comparisons of activation and inhibition).

      Weaknesses:

      Figure 1A - the example DMS vs DLS opsin expression and fiber targeting are not terribly convincing that the manipulations will be specific to each subregion (the example in Figure 2A is a little better but I have a similar concern still). The specificity of these manipulations is key to interpretation and conclusions and I strongly feel they should be strengthened here. The best evidence would be direct neural recordings (light in DMS, no effect in DLS, and vice versa), but this is a tall ask and not expected. The next best option, which is readily feasible, is to show not only fiberoptic targeting summaries (as in Figure 1A, Figure 2A) but also a summary of opsin spread for all animals (especially given the two examples appear to have significant spread across DMS and DLS). It would be of great benefit to the field to have these in the Allen Common Coordinate Framework. It would also be fine and useful to utilize the authors' current classical histological atlas alignment methods (e.g. Paxinos pdf). These histological summary figures would also benefit from being larger and more visible (perhaps as separate supplemental figures associated with the main figures).

      Related to the above, it is a concern that the classic view is supported or not because of individual variations in virus/fiber targeting to striatal subregions which likely have greater granularity than the traditional dorsal medial vs lateral (e.g. Hunnicutt et al 2016, Foster et al 2021, Hintiryan et al 2016). Although there may not be enough animals or variation in targeting in the present study to find meaningful relationships, it would strengthen the paper and be a great benefit to the field to know whether, for key findings, the strength of behavioral effects correlated with anterior/posterior or medial/lateral or dorsal/ventral fiberoptic coordinates (or the volume of opsin expression profiles).

      Conceptually, no clear new idea or integrative interpretation of prior work (nor even of the large body of results within this work) comes to the fore, save for the already appreciated fact that the classic view of opposing pathways is sometimes supported and sometimes not. Two tangible suggestions that I believe would facilitate the influence of this study - (1) can the authors more thoughtfully bridge the logical steps in their results sections and the prior context around them (some topic sentences jump right into results, e.g. line 195: "The inhibition experiment showed"), and (2) in discussion, rather than emphasizing when/where the classic view is supported and not, more content on precisely why would be helpful. Some more specific questions: if DMS/DLS pathway activation/inhibition is *mostly* oppositely appetitive/aversive, what does that mean in the context of spontaneous or reward-guided locomotion? Self-initiated pathway activation/inhibition is in part learned (with very intriguing differences across pathways in the expression across learning) - how should we think about striatal pathway function with regards to learning, spontaneous/innate behaviors, vs over-trained behaviors? When the classic view fails in the dorsal striatum - why? And is a complementary "model" an actual alternative concept, a distinct mode of circuit function, or just a negative result on the classic view?

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their valuable comments, which definitely make our story stronger.

      2. Description of the planned revisions

      Reviewer 1

      Comments:

      No data are shown from the genome-wide screening approach, including the common regulators of KRAS and HRAS. Information about how imaging data were processed and analysed is missing. A final table of 8 selected factors with phosphatase activity is presented without providing further insight about the selection criteria and other factors.

      This information will be included in the revised manuscript.

      In the subsequent characterization via image-based quantification of GFP-KRAS membrane localization, a Manders' coefficient was calculated. A respective chapter in the methods section on how this was done is missing.

      This information will be provided in the revised manuscript.

      I would be happy to see the following analyses to strengthen the dataset:

      • Reconstitution experiments and further validation to show that it is dependent on the enzymatic activity of MTMRs.

      MTMR3 knockdown (KD) cells will be rescued with wildtype (WT) MTMR3 or the phosphatase mutant MTMR3 (C413S, PMID: 11676921). MTMR4 KD cells will be rescued with WT MTMR4 or the phosphatase mutant MTMR4 (C407S, PMID: 20736309). In these cells, the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy.

      • Additive effect upon depletion of multiple MTMRs? Are they functionally co-operative?

      MTMR3 and 4 KD cells will be rescued with WT MTMR4 and 3, respectively, and the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy.

      • Signalling analysis is very limited (Fig. 5). Do the authors detect any defects in K-RAS driven downstream signaling in these cells upon depletion of MTMRs?

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth is KRAS signaling-dependent (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is KRAS signaling-independent (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      Reviewer 2

      Major comments

      The unbiased siRNA screen used to identify proteins that impact KRAS membrane localization was a very nice approach to identify MTMR proteins. Although there is a clear phenotype of KRAS mislocalization associated with knockdown of the various MTMR proteins, the data provided does not prove a causational role for the MTMR proteins in maintaining PtdSer content, nor KRAS localization, at the PM. The current data does not provide a mechanism by which MTMR proteins are influencing this process, but rather speculates using existing literature that it is the loss in MTMR 3-phosphatase activity that leads to decreased PtdSer in the membrane. There is a series of conversions and exchanges that act upon PI3P (the substrate of MTMR proteins) and PI to generate PtdSer in the PM; thus, it is a dynamic process that is influenced by a variety of different proteins and transporters [3, 4, 5, 6]. To prove their single-protein-driven hypothesis, the authors should clone and express a mutant MTMR protein construct that contains an inactive phosphatase catalytic domain, to prove that it is indeed MTMR's generation of PI (which is further converted into PI4P) in the membrane that is responsible for maintaining PtdSer content and KRAS localization. Without this, there is not enough evidence to support this claim.

      MTMR3 knockdown (KD) cells will be rescued with wildtype (WT) MTMR3 or the phosphatase mutant MTMR3 (C413S, PMID: 11676921). MTMR4 KD cells will be rescued with WT MTMR4 or the phosphatase mutant MTMR4 (C407S, PMID: 20736309). In these cells, the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy.

      In addition, the authors speculate that ORP5 is a critical intermediate in this process, and that the loss in PI4P/ORP5 at the PM following MTMR knockdown is responsible for the decrease in PtdSer at the PM. The authors should knock down ORP5 in MTMR-wildtype cells, since it is downstream of their proposed mechanism, and see whether this leads to comparable reductions in PtdSer levels and KRAS mislocalization at the PM. This would confirm ORP5 as having a major role in this setting and would support the initial mechanistic hypothesis. These experiments are imperative to forming an appropriate conclusion, especially since some of their current data contradicts their mechanistic hypothesis: the authors identify a decrease in whole cell PtdSer content, not just PM PtdSer content, when MTMR proteins are knocked down. Based on this result, one would predict that a secondary or supporting mechanism must exist that contributes to a reduction in whole cell PtdSer content, which likely contributes to its loss at the PM as well. The authors describe in line 360 how "previous work has shown that PM PI4P depletion indirectly blocks PtdSer synthase 1 and 2 activities," to explain this reduction in total cell levels of PtdSer. The authors should look at PtdSer synthase 1 and 2 activities in the presence of MTMR knockdown, as the loss in PtdSer at the PM may rely more heavily on synthase activity than ORP-dependent transfer of PtdSer.

      Investigating the PM localization of KRAS and PtdSer after silencing ORP5 in MTMR WT mammalian cell lines has been published (PMID: 31451509 and 34903667). In these studies, silencing ORP5 1) reduces the levels of PtdSer and KRAS from the plasma membrane (PM), 2) reduces KRAS signal output, 3) blocks the growth of KRAS-dependent PDAC in vitro and in vivo. These studies have been appropriately cited in our manuscript in lines 82 and 277.

      Although the C. elegans model that was used to investigate downstream let-60 (RAS ortholog) activity through a multi-vulva phenotype is quite intriguing, it is more critical to assess downstream RAS pathway activation, especially in the human colorectal adenocarcinoma or the human mammary gland ductal carcinoma cell lines. Not only would this line of questioning provide a higher significance and increase the clinical applicability of these findings, but it is also crucial to support the authors' claim that MTMR knockdown can influence mutant KRAS activity. Although small changes in KRAS localization to the PM can have significant effects on downstream signaling, these effects need to be measured and confirmed in this setting. The authors should perform western blots to assess the activation of both the PI3K and MAPK pathway in the MTMR knockdown cell lines.

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth is KRAS signaling-dependent (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is KRAS signaling-independent (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      In addition to this, it might be important to know whether there are any changes in the levels of the KRAS protein itself, as recycling/transport pathways may be impacted by its lack of recruitment to the plasma membrane.

      Total KRAS protein expression will be measured in MTMR KD cell lines. Finally, the authors show that proliferation is inhibited by MTMR knockdown as a readout of RAS activity. The authors should also assess the levels of cell death, as the inhibition of mutant KRAS in cancer cells would likely lead to cell death. The authors do not describe why reducing any one of the MTMR proteins alone is sufficient to deplete the PM of PtdSer. This sort of discussion is important for understanding compensatory or regulatory mechanisms in place between the MTMR proteins, as this may influence PtdSer levels at the PM. For example, it has been shown that MTMR2 can stabilize MTMR13 on membranes. Do the levels, stability, or localization of the other MTMR proteins change when one specific MTMR is knocked down? Is this why we see an effect on PtdSer in any one of the knockdowns? The authors should at the very least provide western blots for each of the MTMR proteins discussed in the presence of each individual MTMR knockdown.

      MTMR3 knockdown (KD) cells will be rescued with WT MTMR3 or the phosphatase mutant MTMR3 (C413S, PMID: 11676921). MTMR4 KD cells will be rescued with WT MTMR4 or the phosphatase mutant MTMR4 (C407S, PMID: 20736309). In these cells, the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy. In addition, we will measure endogenous MTMR 2/3/4/7 protein levels in the presence of each individual MTMR KD by immunoblotting. In addition to the above experiments, the MTMR hairpins should be expressed in a secondary or tertiary cell line to prove that these events are not specific to the current model used. Since their current human mammary gland ductal carcinoma cell line overexpresses a mutant KRAS-GFP construct, perhaps doing similar experiments in a cancer cell line that already expresses an endogenous mutant KRAS might provide a better model.

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth is KRAS signaling-dependent (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is KRAS signaling-independent (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured. Although this protein would not include a GFP-tag, other ways of visualizing its localization at the PM (such as immunofluorescent staining) could be used to confirm its localization there.

      The anti-KRAS antibody for IF has not been reported to my knowledge. In addition, the effects on downstream RAS signaling could be measured through western blot of PI3K and MAPK pathways.

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth is KRAS signaling-dependent (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is KRAS signaling-independent (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured. Supplemental Figure 4 is incorrectly referred to in the text as Supplemental Figure 3 (line 257-258). The text reads, "Confocal microscopy further demonstrates that HRASG12V cellular localization is not disrupted after silencing MTMR 2/3/4/7 (Fig. S3)" but Figure S3 is an EM image of PM basal sheets from T47D cells expressing GFP-KRASG12V. Supplemental Figure 4 shows that mutant HRAS is unaffected by the various MTMR knockdowns.

      They will be labeled correctly in the revised manuscript. Since the authors show decreased proliferation in mutant KRAS cells following MTMR knockdown, the authors should also investigate any changes to proliferation rates in mutant HRAS cell lines following MTMR knockdown. This data is necessary to prove that MTMR-driven changes in downstream RAS signaling are specific to mutant KRAS and not mutant HRAS.

      A cell proliferation assay will be performed using MTMR 2/3/4/7-silenced T47D cell lines stably expressing oncogenic mutant HRAS (HRASG12V) to address this question. It may also be important for the authors to show any effects on wildtype RAS localization to the PM when MTMR-2, -3, -4, and -7 are knocked down, to show whether this is an oncoprotein-specific event.

      Cells expressing the truncated mutant KRAS, which contains the minimal membrane anchor and lacks the G-domain, will be infected with lentivirus expressing shRNA against MTMR 2/3/4/7, and their localization will be examined. The representative images chosen for Figure 4 diminish the reliability of the data, as it is difficult to see a visible change in the PI3P probe between the control and MTMR knockdown cells in these images. Since the authors rely on the Manders' coefficient and the number of gold particles throughout much of the paper, having the same conclusion quantitatively but not qualitatively for these assays is confusing. Perhaps the authors should elaborate on whether MTMR knockdown has a stronger effect on PtdSer and KRAS PM presence than PI3P PM presence.

      We will include the discussion in the revised manuscript. They should also describe their method for identifying early endosomes, since they switch back and forth between describing the content of the PM and of early endosomes, such as in Figure 1 and Figure 4.

      We will include the information in the revised manuscript. Minor comments:

      An additional experiment that may add another layer of clinical applicability would be the use of an MTMR inhibitor in this cell line, to see whether similar effects can be achieved pharmacologically [7]. This would provoke other researchers to investigate MTMR inhibitors in vitro and in vivo to assess the effect on mutant KRAS cancers.

      • This is an important point, but while vanadate, a general phospho-tyrosine phosphatase (PTP) inhibitor, has been reported to inhibit myotubularin, a family member of MTMR (PMID: 8995372 and 1943774), there are no commercially available MTMR-specific inhibitors. Using vanadate to inhibit MTMR proteins will produce non-specific effects by blocking other PTPs. The inclusion of cell lines that express KRAS proteins of different mutational statuses would be extremely interesting, as KRAS' orientation within the plasma membrane has been shown to be altered by these mutations. This fact should potentially be considered when choosing a secondary or tertiary cell line to do additional experiments in, but it is not necessary for the authors to elaborate on how MTMR proteins may impact different KRAS mutants for the scope of this project.

      For the aforementioned experiments using human KRAS-dependent and -independent PDAC cell lines, we will use MiaPaCa2 (KRASG12C) and AsPC1 (KRASG12D). Reviewer #3

      Major comments:

      One of the two main manuscript claims indicates that KRAS12V "function" is impaired upon MTMR knockdown. While this is an obvious phenotype expected by mislocalizing KRAS from the inner PM, it is not sufficiently demonstrated in the current version of the manuscript. Western blots of at least MAPK and PI3K signalling following MTMR knockdown in KRAS-dependent cell lines should be included. In addition to the T47D cells used in the manuscript, it would be ideal to include a KRAS-mutant cell line from tumour types where KRAS mutations are more frequent than in breast.

      • Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth is KRAS signaling-dependent (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is KRAS signaling-independent (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured. Since the MTMR-dependent phenotypes are mutant-KRAS specific, it would be interesting to study the resulting phenotypes in an HRAS-mutant cell line.

      A cell proliferation assay will be performed using MTMR 2/3/4/7-silenced T47D cell lines stably expressing oncogenic mutant HRAS (HRASG12V) to address these questions.

      **Referee cross-commenting**

      After reading the reviews of my colleagues I think there is a clear agreement on the need to further substantiate that KRAS membrane mis-localization is indeed affecting oncogenic output. The use of other KRAS addicted and non-addicted models would further enhance this analysis.

      Likewise, the other two reviewers request experimental evidence to validate the role of MTMR enzymatic activity in the process. This is a pertinent request that I failed to put forward. Suggestions include the use of reconstitution experiments with catalytically dead mutants. Also, the use of MTMR small molecule inhibitors is proposed. If those exist with sufficient specificity, this would indeed be appropriate to perform.

      Experiments addressing these comments have been described above.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      N/A

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision.

      Reviewer 2

      R2 suggests investigating PtdSer synthase 1 and 2 activities in the presence of MTMR knockdown, as the loss in PtdSer at the PM may rely more heavily on synthase activity than ORP-dependent transfer of PtdSer.

      Although it would be intriguing to examine the effect of MTMR loss on the activities of PtdSer synthase 1 and 2, our lab does not have the resources or techniques to carry out the experiment.

      The results of this paper rely heavily on one experimental technique, which is calculating a Manders' coefficient and counting the co-localization of the probe of interest with the CellMask stain of the plasma membrane. How this coefficient is derived is explained in appropriate detail in the methods section of this manuscript; however, a secondary route of identifying these changes in membrane constituents would greatly enhance the paper's conclusions. This would eliminate any doubt surrounding the accuracy of the technique, since so much of the data relies on one experimental output.

      We used Manders' coefficient to examine the colocalization of KRAS and LactC2 (the PtdSer probe) and to propose KRAS/PtdSer redistribution to endomembranes after MTMR loss. To complement this, we also performed quantitative EM to demonstrate the depletion of KRAS and PtdSer from the inner PM leaflet. We believe these two techniques are appropriate to investigate KRAS/PtdSer PM depletion and cellular redistribution.
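
      For readers unfamiliar with the metric, the general idea behind a Manders' coefficient can be sketched as follows. This is only an illustrative sketch of the textbook definition (the fraction of probe intensity found in pixels where the reference stain exceeds a threshold); the function name, the threshold, and the toy pixel values are hypothetical and do not reproduce the analysis pipeline described in the manuscript's methods.

```python
def manders_fraction(probe, reference, threshold=0.0):
    """Fraction of total probe intensity located in pixels where the
    reference channel (e.g. a membrane stain) is above `threshold`.

    `probe` and `reference` are equal-length sequences of pixel
    intensities (a flattened image). Hypothetical helper for
    illustration only.
    """
    if len(probe) != len(reference):
        raise ValueError("channels must have the same number of pixels")
    total = 0.0
    overlapping = 0.0
    for p, r in zip(probe, reference):
        total += p
        if r > threshold:
            overlapping += p
    # An empty or all-zero probe channel has no signal to colocalize.
    return overlapping / total if total > 0 else 0.0


# Toy example: four pixels, the reference stain is present in the last two.
probe_pixels = [1, 2, 3, 4]
reference_pixels = [0, 0, 5, 5]
fraction = manders_fraction(probe_pixels, reference_pixels)
```

      A value near 1 would indicate that most of the probe signal sits in reference-positive pixels, and a lower value would indicate redistribution away from them; the result is sensitive to how the threshold is chosen.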

      Reviewer 3

      To further support the conclusions, oncogenic signalling should be studied in the C. elegans model by immunofluorescence or immunohistochemistry. Furthermore, although not strictly required to support the authors' claims, it would be interesting to elucidate whether the inhibition of the multivulva phenotype upon MTMR knockdown in vivo results as a consequence of cell death.

      Our collaborator for the C. elegans study does not have the resources to carry out the proposed IF and IHC experiments. Instead, we will measure KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) and the growth of KRAS-dependent PDAC after MTMR loss. These experiments would be more clinically and physiologically relevant.

    1. “detailed specifications”

      I think it's important to note that task analysis is ONLY used when learners require "detailed specification." Task analysis is important IF we assume that the ID's job is to tell the learner exactly what needs to be done and how. Task analysis is essential when using a behavioristic learning model. However, it may not be as applicable to more constructivist learning models.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies the mitotic localization mechanism for Aurora B and INCENP (parts of the chromosomal passenger complex, CPC) in Trypanosoma brucei. The mechanism is different from that in the more commonly studied opisthokonts and there is solid support from RNAi and imaging experiments, targeted mutations, immunoprecipitations with crosslinking/mass spec, and AlphaFold interaction predictions. The results could be strengthened by biochemically testing proposed direct interactions and demonstrating that the targeting protein KIN-A is a motor. The findings will be of interest to parasitology researchers as well as cell biologists working on mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. Please note that the conserved glycine residue in the Switch II helix in KIN-A was mistakenly labelled as G209 in the original manuscript. We have now corrected it to G210 in the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The CPC plays multiple essential roles in mitosis such as kinetochore-microtubule attachment regulation, kinetochore assembly, spindle assembly checkpoint activation, anaphase spindle stabilization, cytokinesis, and nuclear envelope formation, as it dynamically changes its mitotic localization: it is enriched at inner centromeres from prophase to metaphase but it is relocalized at the spindle midzone in anaphase. The business end of the CPC is Aurora B and its allosteric activation module IN-box, which is located at the C-terminal part of INCENP. In most well-studied eukaryotic species, Aurora B activity is locally controlled by the localization module of the CPC, Survivin, Borealin, and the N-terminal portion of INCENP. Survivin and Borealin, which bind the N terminus of INCENP, recognize histone residues that are specifically phosphorylated in mitosis, while anaphase spindle midzone localization is supported by the direct microtubule-binding capacity of the SAH (single alpha helix) domain of INCENP and other microtubule-binding proteins that specifically interact with INCENP during anaphase, which are under the regulation of CDK activity. One of these examples includes the kinesin-like protein MKLP2 in vertebrates.

      Trypanosoma is an evolutionarily interesting species to study mitosis since its kinetochore and centromere proteins do not show any similarity to other major branches of eukaryotes, while orthologs of Aurora B and INCENP have been identified. Combining molecular genetics, imaging, biochemistry, cross-linking IP-MS (IP-CLMS), and structural modeling, this manuscript reveals that two orphan kinesin-like proteins KIN-A and KIN-B act as localization modules of the CPC in Trypanosoma brucei. The IP-CLMS, AlphaFold2 structural predictions, and domain deletion analysis support the idea that (1) KIN-A and KIN-B form a heterodimer via their coiled-coil domain, (2) Two alpha helices of INCENP interact with the coiled-coil of the KIN-A-KIN-B heterodimer, (3) the conserved KIN-A C-terminal CD1 interacts with the heterodimeric KKT9-KKT11 complex, which is a submodule of the KKT7-KKT8 kinetochore complex unique to Trypanosoma, (4) KIN-A and KIN-B coiled-coil domains and the KKT7-KKT8 complex are required for CPC localization at the centromere, (5) CD1 and CD2 domains of KIN-A support its centromere localization. The authors further show that the ATPase activity of KIN-A is critical for spindle midzone enrichment of the CPC. The imaging data of the KIN-A rigor mutant suggest that dynamic KIN-A-microtubule interaction is required for metaphase alignment of the kinetochores and proliferation. Overall, the study reveals novel pathways of CPC localization regulation via KIN-A and KIN-B by multiple complementary approaches.

      Strengths:

      The major conclusion is collectively supported by multiple approaches, combining site-specific genome engineering, epistasis analysis of cellular localization, AlphaFold2 structure prediction of protein complexes, IP-CLMS, and biochemical reconstitution (the complex of KKT8, KKT9, KKT11, and KKT12).

      We thank the reviewer for her/his positive assessment of our manuscript.

      Weaknesses:

      • The predictions of direct interactions (e.g. INCENP with KIN-A/KIN-B, or KIN-A with KKT9-KKT11) have not yet been confirmed experimentally, e.g. by domain mutagenesis and interaction studies.

      Thank you for this point. It is true that we do not have evidence for a direct interaction between KIN-A and KKT9-KKT11. However, the interaction between INCENP and KIN-A/KIN-B is strongly supported by our cross-linking IP-MS of native complexes. Furthermore, we show that deletion of the INCENPCPC1 N-terminus predicted to interact with KIN-A:KIN-B abolishes kinetochore localization.

      • The criteria used to judge a failure of localization are not clearly explained (e.g., Figure 5F, G).

      As suggested by the reviewer in recommendation #14, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      • It remains to be shown that KIN-A has motor activity.

      We thank the reviewer for this important comment. Indeed, motor activity remains to be demonstrated using an in vitro system, which is beyond the scope of this study. What we show here is that the motor domain of KIN-A effectively co-sediments with microtubules and that spindle localization of KIN-A is abolished upon deletion of the motor domain. Moreover, mutation of a conserved Glycine residue in the Switch II region (G210) to Alanine (‘rigor mutation’, (Rice et al., 1999)) renders KIN-A incapable of translocating to the central spindle, suggesting that its ATPase activity is required for this process. To clarify this point in the manuscript, we have replaced ‘motor activity’ of KIN-A with ‘ATPase activity’ wherever we describe experiments performed using the KIN-A rigor mutant. In addition, we have included a Multiple Sequence Alignment (MSA) of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9 in Figure 6A and S6A, showing the conservation of key motifs required for ATP coordination and tubulin interaction. In the corresponding paragraph in the main text, we describe these data as follows:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      • The authors imply that KIN-A, but not KIN-B, interacts with microtubules based on microtubule pelleting assay (Fig. S6), but the substantial insoluble fractions of 6HIS-KINA and 6HIS-KIN-B make it difficult to conclusively interpret the data. It is possible that these two proteins are not stable unless they form a heterodimer.

      This is indeed a possibility. We are currently aiming at purifying full-length recombinant KIN-A and KIN-B (along with the other CPC components), which will allow us to perform in vitro interaction studies and to investigate biochemical properties of this complex (including the role of the motor domains of KIN-A and KIN-B) within the framework of an in-depth follow-up study. To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      • For broader context, some prior findings should be introduced, e.g. on the importance of the microtubule-binding capacity of the INCENP SAH domain and its regulation by mitotic phosphorylation (PMID 8408220, 26175154, 26166576, 28314740, 28314741, 21727193), since KIN-A and KIN-B may substitute for the function of the SAH domain.

      We have modified the introduction to include the following text and references mentioned by the reviewer: ‘The localization module comprises Borealin, Survivin and the N-terminus of INCENP, which are connected to one another via a three-helical bundle (Jeyaprakash et al., 2007, 2011; Klein et al., 2006). The two modules are linked by the central region of INCENP, composed of an intrinsically disordered domain and a single alpha helical (SAH) domain. INCENP harbours microtubule-binding domains within the N-terminus and the central SAH domain, which play key roles for CPC localization and function (Samejima et al., 2015; Kang et al., 2001; Noujaim et al., 2014; Cormier et al., 2013; Wheatley et al., 2001; Nakajima et al., 2011; Fink et al., 2017; Wheelock et al., 2017; van der Horst et al., 2015; Mackay et al., 1993).’

      Reviewer #2 (Public Review):

      How the chromosomal passenger complex (CPC) and its subunit Aurora B kinase regulate kinetochore-microtubule attachment, and how the CPC relocates from kinetochores to the spindle midzone as a cell transitions from metaphase to anaphase are questions of great interest. In this study, Ballmer and Akiyoshi take a deep dive into the CPC in T. brucei, a kinetoplastid parasite with a kinetochore composition that varies greatly from other organisms.

      Using a combination of approaches, most importantly in silico protein predictions using AlphaFold-Multimer and light microscopy in dividing T. brucei, the authors convincingly present and analyse the composition of the T. brucei CPC. This includes the identification of KIN-A and KIN-B, proteins of the kinesin family, as targeting subunits of the CPC. This is a clear advancement over earlier work, for example by Li and colleagues in 2008. The involvement of KIN-A and KIN-B is of particular interest, as it provides a clue for the (re)localization of the CPC during the cell cycle. The evolutionary perspective makes the paper potentially interesting for a wide audience of cell biologists, a point that the authors bring across properly in the title, the abstract, and their discussion.

      The evolutionary twist of the paper would be strengthened 'experimentally' by predictions of the structure of the CPC beyond T. brucei. Depending on how far the authors can extend their in-silico analysis, it would be of interest to discuss a) available/predicted CPC structures in well-studied organisms and b) structural predictions in other euglenozoa. What are the general structural properties of the CPC (e.g. flexible linkers, overall dimensions, structural differences when subunits are missing etc.)? How common is the involvement of kinesin-like proteins? In line with this, it would be good to display the figure currently shown as S1D (or similar) as a main panel.

      We thank the reviewer for her/his encouraging assessment of our manuscript and appreciation of the evolutionary relevance of our work. As suggested, we have moved the phylogenetic tree previously shown in Fig. S1D to the main Fig. 1F. Our AF2 analysis of CPC proteins and (sub)complexes from other kinetoplastids failed to predict reliable interactions among CPC proteins except for that between Aurora B and the IN box. It therefore remains unclear whether CPC structures are conserved among kinetoplastids. Because components of the CPC remain unknown in other euglenozoa (other than Aurora B and INCENP), we cannot perform structural predictions of the CPC in diplonemids or euglenids.

      It remains unclear how common the involvement of kinesin-like proteins with the CPC is in other eukaryotes, partly because we could not identify an obvious homolog of KIN-A/KIN-B outside of kinetoplastids. Addressing this question would require experimental approaches in various eukaryotes (e.g. immunoprecipitation and mass spectrometry of Aurora B) as we carried out in this manuscript using Trypanosoma brucei.

      Reviewer #3 (Public Review):

      Summary:

      The protein kinase, Aurora B, is a critical regulator of mitosis and cytokinesis in eukaryotes, exhibiting a dynamic localisation. As part of the Chromosomal Passenger Complex (CPC), along with the Aurora B activator, INCENP, and the CPC localisation module comprised of Borealin and Survivin, Aurora B travels from the kinetochores at metaphase to the spindle midzone at anaphase, which ensures its substrates are phosphorylated in a time- and space-dependent manner. In the kinetoplastid parasite, T. brucei, the Aurora B orthologue (AUK1), along with an INCENP orthologue known as CPC1, and a kinetoplastid-specific protein CPC2, also displays a dynamic localisation, moving from the kinetochores at metaphase to the spindle midzone at anaphase, to the anterior end of the newly synthesised flagellum attachment zone (FAZ) at cytokinesis. However, the trypanosome CPC lacks orthologues of Borealin and Survivin, and T. brucei kinetochores also have a unique composition, being comprised of dozens of kinetoplastid-specific proteins (KKTs). Of particular importance for this study are KKT7 and the KKT8 complex (comprising KKT8, KKT9, KKT11, and KKT12). Here, Ballmer and Akiyoshi seek to understand how the CPC assembles and is targeted to its different locations during the cell cycle in T. brucei.

      Strengths & Weaknesses:

      Using immunoprecipitation and mass-spectrometry approaches, Ballmer and Akiyoshi show that AUK1, CPC1, and CPC2 associate with two orphan kinesins, KIN-A and KIN-B, and with the use of endogenously expressed fluorescent fusion proteins, demonstrate for the first time that KIN-A and KIN-B display a dynamic localisation pattern similar to other components of the CPC. Most of these data provide convincing evidence for KIN-A and KIN-B being bona fide CPC proteins, although the evidence that KIN-A and KIN-B translocate to the anterior end of the new FAZ at cytokinesis is weak - the KIN-A/B signals are very faint and difficult to see, and cell outlines/brightfield images are not presented to allow the reader to determine the cellular location of these faint signals (Fig S1B).

      We thank the reviewer for their thorough assessment of our manuscript and the insightful feedback to further improve our study. To address the point above, we have acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      They then demonstrate, by using RNAi to deplete individual components, that the CPC proteins have hierarchical interdependencies for their localisation to the kinetochores at metaphase. These experiments appear to have been well performed, although only images of cell nuclei were shown (Fig 2A), meaning that the reader cannot properly assess whether CPC components have localised elsewhere in the cell, or if their abundance changes in response to depletion of another CPC protein.

      We chose to show close-ups of the nucleus to highlight the different localization patterns of CPC proteins under the different RNAi conditions. In none of these conditions did we observe mis-localization of CPC subunits to the cytoplasm. To clarify this point, we added the following sentence in the legend for Figure 2A:

      ‘A) Representative fluorescence micrographs showing the localization of YFP-tagged Aurora BAUK1, INCENPCPC1, KIN-A and KIN-B in 2K1N cells upon RNAi-mediated knockdown of indicated CPC subunits. Note that nuclear close-ups are shown here. CPC proteins were not detected in the cytoplasm. RNAi was induced with 1 μg/mL doxycycline for 24 h (KIN-B RNAi) or 16 h (all others). Cell lines: BAP3092, BAP2552, BAP2557, BAP3093, BAP2906, BAP2900, BAP2904, BAP3094, BAP2899, BAP2893, BAP2897, BAP3095, BAP3096, BAP2560, BAP2564, BAP3097. Scale bars, 2 μm.’

      Ballmer and Akiyoshi then go on to determine the kinetochore localisation domains of KIN-A and KIN-B. Using ectopically expressed GFP-tagged truncations, they show that coiled-coil domains within KIN-A and KIN-B, as well as a disordered C-terminal tail present only in KIN-A, but not the N-terminal motor domains of KIN-A or KIN-B, are required for kinetochore localisation. These data are strengthened by immunoprecipitating CPC complexes and crosslinking them prior to mass spectrometry analysis (IP-CLMS), a state-of-the-art approach, to determine the contacts between the CPC components. Structural predictions of the CPC structure are also made using AlphaFold2, suggesting that coiled coils form between KIN-A and KIN-B, and that KIN-A/B interact with the N termini of CPC1 and CPC2. Experimental results show that CPC1 and CPC2 are unable to localise to kinetochores if they lack their N-terminal domains consistent with these predictions. Altogether these data provide convincing evidence of the protein domains required for CPC kinetochore localisation and CPC protein interactions. However, the authors also conclude that KIN-B plays a minor role in localising the CPC to kinetochores compared to KIN-A. This conclusion is not particularly compelling as it stems from the observation that ectopically expressed GFP-NLS-KIN-A (full length or coiled-coil domain + tail) is also present at kinetochores during anaphase unlike endogenously expressed YFP-KIN-A. Not only is this localisation probably an artifact of the ectopic expression, but the KIN-B coiled-coil domain localises to kinetochores from S to metaphase and Fig S2G appears to show a portion of the expressed KIN-B coiled-coil domain colocalising with KKT2 at anaphase. It is unclear why KIN-B has been discounted here.

      As the reviewer points out, a small fraction of GFP-NLS-KIN-B317-624 is indeed detectable at kinetochores in anaphase, although most of the protein shows diffuse nuclear staining. There are various explanations for this phenomenon: It is conceivable that the KIN-B motor domain may contribute to microtubule binding and translocation of the CPC from kinetochores onto the spindle in anaphase. In our experiments, ectopically expressed KIN-B317-624 likely outcompetes a fraction of endogenous KIN-B for binding to KIN-A, which could interfere with this translocation process, leaving a population of CPC ‘stranded’ at kinetochores in anaphase. Another possibility, hinted at by the reviewer, is that the C-terminus of KIN-B interacts with receptors at the kinetochore/centromere. Although we do not discount this possibility, we nevertheless decided to focus on KIN-A in this study, because the anaphase kinetochore retention phenotype for both full-length GFP-NLS-KIN-A and -KIN-A309-862 is much stronger than for KIN-B317-624. Two additional reasons were that (i) KIN-A is highly conserved within kinetoplastids, whereas KIN-B orthologs are missing in some kinetoplastids, and (ii) no convincing interactions between KIN-B and kinetochore proteins were predicted by AF2.

      To address the reviewer’s point, we decided to include KIN-B in the title of this manuscript, which now reads: ‘Dynamic localization of the chromosomal passenger complex is controlled by the orphan kinesins KIN-A and KIN-B in the kinetoplastid parasite Trypanosoma brucei’.

      Moreover, we modified the corresponding paragraph in the results section as follows:

      ‘Intriguingly, unlike endogenously YFP-tagged KIN-A, ectopically expressed GFP fusions of both full-length KIN-A and KIN-A310-862 clearly localized at kinetochores even in anaphase (Figs. 2, F and H). Weak anaphase kinetochore signal was also detectable for KIN-B317-624 (Fig. S2F). GFP fusions of the central coiled-coil domain or the C-terminal disordered tail of KIN-A did not localize to kinetochores (data not shown). These results show that kinetochore localization of the CPC is mediated by KIN-A and KIN-B and requires both the central coiled-coil domain as well as the C-terminal disordered tail of KIN-A.’

      Next, using a mixture of RNAi depletion and LacI-LacO recruitment experiments, the authors show that kinetochore proteins KKT7 and KKT9 are required for AUK1 to localise to kinetochores (other KKT8 complex components were not tested here) and that all components of the KKT8 complex are required for KIN-A kinetochore localisation. Further, both KKT7 and KKT8 were able to recruit AUK1 to an ectopic locus in the S phase, and KKT7 recruited KKT8 complex proteins, which the authors suggest indicates it is upstream of KKT8. However, while these experiments have been performed well, the reciprocal experiment to show that KKT8 complex proteins cannot recruit KKT7, which could have confirmed this hierarchy, does not appear to have been performed. Further, since the LacI fusion proteins used in these experiments were ectopically expressed, they were retained (artificially) at kinetochores into anaphase; KKT8 and KIN-A were both able to recruit AUK1 to LacO foci in anaphase, while KKT7 was not. The authors conclude that this suggests the KKT8 complex is the main kinetochore receptor of the CPC - while very plausible, this conclusion is based on a likely artifact of ectopic expression, and for that reason, should be interpreted with a degree of caution.

      We previously showed that RNAi-mediated depletion of KKT7 disrupts kinetochore localization of KKT8 complex members, whereas kinetochore localization of KKT7 is unaffected by disruption of the KKT8 complex (Ishii and Akiyoshi, 2020). Moreover, in contrast to the KKT8 complex, KKT7 remains at kinetochores in anaphase (Akiyoshi and Gull, 2014). These data show that KKT7 is upstream of the KKT8 complex. In this context, the LacI-LacO tethering approach can be very useful to probe whether two proteins (or domains of proteins) could interact in vivo either directly or indirectly. However, a recruitment hierarchy cannot be inferred from such experiments because the data only show whether X can recruit Y to an ectopic locus (but not whether X is upstream of Y or vice versa). Regarding the retention of Aurora BAUK1 at kinetochores in anaphase upon ectopic expression of GFP-KKT8-LacI, we agree with the reviewer that these data need to be carefully interpreted. Nevertheless, the notion that the KKT7-KKT8 complex recruits the CPC to kinetochores is also strongly supported by IP-MS, RNAi experiments, and AF2 predictions. For clarification and to address the reviewer’s point, we re-formulated the corresponding paragraph in the main text:

      ‘We previously showed that KKT7 lies upstream of the KKT8 complex (Ishii and Akiyoshi, 2020). Indeed, GFP-KKT72-261-LacI recruited tdTomato-KKT8, -KKT9 and -KKT12 (Fig. S4E). Expression of both GFP-KKT72-261-LacI and GFP-KKT8-LacI resulted in robust recruitment of tdTomato-Aurora BAUK1 to LacO foci in S phase (Figs. 4, E and F). Intriguingly, we also noticed that, unlike endogenous KKT8 (which is not present in anaphase), ectopically expressed GFP-KKT8-LacI remained at kinetochores during anaphase (Fig. 4F). This resulted in a fraction of tdTomato-Aurora BAUK1 being trapped at kinetochores during anaphase instead of migrating to the central spindle (Fig. 4F). We observed a comparable situation upon ectopic expression of GFP-KIN-A, which is retained on anaphase kinetochores together with tdTomato-KKT8 (Fig. S4F). In contrast, Aurora BAUK1 was not recruited to LacO foci marked by GFP-KKT72-261-LacI in anaphase (Fig. 4E).’

      Further IP-CLMS experiments, in combination with recombinant protein pull-down assays and structural predictions, suggested that within the KKT8 complex, there are two subcomplexes of KKT8:KKT12 and KKT9:KKT11, and that KKT7 interacts with KKT9:KKT11 to recruit the remainder of the KKT8 complex. The authors also assess the interdependencies between KKT8 complex components for localisation and expression, showing that all four subunits are required for the assembly of a stable KKT8 complex and present AlphaFold2 structural modelling data to support the two subcomplex models. In general, these data are of high quality and convincing with a few exceptions. The recombinant pulldown assay (Fig. 4H) is not particularly convincing as the 3rd eluate gel appears to show a band at the size of KKT11 (despite the labelling indicating no KKT11 was present in the input) but no pulldown of KKT9, which was present in the input according to the figure legend (although this may be mislabeled since not consistent with the text). The text also states that 6HIS-KKT8 was insoluble in the absence of KKT12, but this is not possible to assess from the data presented.

      We thank the reviewer for pointing out an error in the text: ‘Removal of both KKT9 and KKT11 did not impact formation of the KKT8:KKT12 subcomplex’ should read ‘Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex’. Regarding the very faint band perceived to be KKT11 in the 3rd eluate: This band runs slightly lower than KKT11 and likely represents a bacterial contaminant (which we have seen also in other preps in the past). We have made a note of this in the corresponding legend (new Fig. 4I). Moreover, we provide the estimated molecular weights for each subunit, as suggested by the reviewer in recommendation #14 (see below):

      ‘(I) Indicated combinations of 6HIS-tagged KKT8 (~46 kDa), KKT9 (~39 kDa), KKT11 (~29 kDa) and KKT12 (~23 kDa) were co-expressed in E. coli, followed by metal affinity chromatography and SDS-PAGE. The asterisk indicates a common contaminant.’

      The corresponding paragraph in the results section now reads:

      ‘To validate these findings, we co-expressed combinations of 6HIS-KKT8, KKT9, KKT11 and KKT12 in E. coli and performed metal affinity chromatography (Fig. 4I). 6HIS-KKT8 efficiently pulled down KKT9, KKT11 and KKT12, as shown previously (Ishii and Akiyoshi, 2020). In the absence of KKT9, 6HIS-KKT8 still pulled down KKT11 and KKT12. Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex. In contrast, 6HIS-KKT8 could not be recovered without KKT12, indicating that KKT12 is required for formation of the full KKT8 complex. These results support the idea that the KKT8 complex consists of KKT8:KKT12 and KKT9:KKT11 subcomplexes.’

      It is also surprising that data showing the effects of KKT8, KKT9, and KKT12 depletion on KKT11 localisation and abundance are not presented alongside the reciprocal experiments in Fig S4G-J.

      YFP-KKT11 is delocalized upon depletion of KKT8 and KKT9 (see below). Unfortunately, we were unsuccessful in our attempts at deriving the corresponding KKT12 RNAi cell line, rendering this set of data incomplete. Because these data are not of critical importance for this study, we decided not to invest more time in attempting further transfections.

      Author response image 1.

      The authors also convincingly show that AlphaFold2 predictions of interactions between KKT9:KKT11 and a conserved domain (CD1) in the C-terminal tail of KIN-A are likely correct, with CD1 and a second conserved domain, CD2, identified through sequence analysis, acting synergistically to promote KIN-A kinetochore localisation at metaphase, but not being required for KIN-A to move to the central spindle at anaphase. They then hypothesise that the kinesin motor domain of KIN-A (but not KIN-B which is predicted to be inactive based on non-conservation of residues key for activity) determines its central spindle localisation at anaphase through binding to microtubules. In support of this hypothesis, the authors show that KIN-A, but not KIN-B can bind microtubules in vitro and in vivo. However, ectopically expressed GFP-NLS fusions of full-length KIN-A or KIN-A motor domain did not localise to the central spindle at anaphase. The authors suggest this is due to the GFP fusion disrupting the ATPase activity of the motor domain, but they provide no evidence that this is the case. Instead, they replace endogenous KIN-A with a predicted ATPase-defective mutant (G209A), showing that while this still localises to kinetochores, the kinetochores were frequently misaligned at metaphase, and that it no longer concentrates at the central spindle (with concomitant mis-localisation of AUK1), causing cells to accumulate at anaphase. From these data, the authors conclude that KIN-A ATPase activity is required for chromosome congression to the metaphase plate and its central spindle localisation at anaphase. While potentially very interesting, these data are incomplete in the absence of any experimental data to show that KIN-A possesses ATPase activity or that this activity is abrogated by the G209A mutation, and the conclusions of this section are rather speculative.

      Thank you for this important comment, which relates to a similar point raised by Reviewer 1 (see above). Indeed, ATPase and motor activity of KIN-A remain to be demonstrated biochemically using recombinant proteins, which is beyond the scope of this study. We generated MSAs of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9, which are now presented in Figure 6A and S6A. These clearly show that key motifs required for ATP or tubulin binding in other kinesins are highly conserved in KIN-A (but not KIN-B). This includes the conserved glycine residue in the Switch II helix (G234 in human Kinesin-1, G210 in T. brucei KIN-A), which forms a hydrogen bond with the γ-phosphate of ATP, and upon mutation has been shown to impair ATPase activity and trap the motor head in a strong microtubule-binding (‘rigor’) state (Rice et al., 1999; Sablin et al., 1996). The prominent rigor phenotype of KIN-AG210A is consistent with KIN-A having ATPase activity. In addition to the data in Fig. 6A and S6A, we made the following changes to the main text:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).

      Ectopically expressed GFP-KIN-A and -KIN-A2-309 partially localized to the mitotic spindle but failed to concentrate at the midzone during anaphase (Figs. 2, F and G), suggesting that N-terminal tagging of the KIN-A motor domain may interfere with its function. To address whether the ATPase activity of KIN-A is required for central spindle localization of the CPC, we replaced one allele of KIN-A with a C-terminally YFP-tagged G210A ATP hydrolysis-defective rigor mutant (Fig. 6A) (Rice et al., 1999) and used an RNAi construct directed against the 3’UTR of KIN-A to deplete the untagged allele. The rigor mutation did not affect recruitment of KIN-A to kinetochores (Figs. S6, C and D). However, KIN-AG210A-YFP marked kinetochores were misaligned in ~50% of cells arrested in metaphase, suggesting that ATPase activity of KIN-A promotes chromosome congression to the metaphase plate (Figs. S6, E-H).’

      Impact:

      Overall, this work uses a wide range of cutting-edge molecular and structural predictive tools to provide a significant amount of new and detailed molecular data that shed light on the composition of the unusual trypanosome CPC and how it is assembled and targeted to different cellular locations during cell division. Given the fundamental nature of this research, it will be of interest to many parasitology researchers as well as cell biologists more generally, especially those working on aspects of mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the reviewer for his/her feedback and thoughtful and thorough assessment of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Why did the authors omit KIN-B from the title?

      We decided to add KIN-B in the title. Please see our response to Reviewer #3 (public review).

      (2) Abstract, line 28, "Furthermore, the kinesin motor activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset." This must be revised - see public review.

      We changed this section of the abstract as follows:

      ‘Furthermore, the ATPase activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset. Thus, KIN-A constitutes a unique ‘two-in-one’ CPC localization module in complex with KIN-B, which directs the CPC to kinetochores (from S phase until metaphase) via its C-terminal tail, and to the central spindle (in anaphase) via its N-terminal kinesin motor domain.’

      (3) Line 87-90. The findings by Li et al., 2008 (KIN-A and KIN-B interacting with Aurora B and epistasis analysis) should be introduced more comprehensively in the Introduction section.

      We added the following sentence in the introduction:

      ‘In addition, two orphan kinesins, KIN-A and KIN-B, have been proposed to transiently associate with Aurora BAUK1 during mitosis (Li et al., 2008; Li, 2012).’

      (4) Figure 1B. The way the Trypanosoma cell cycle is defined should be briefly explained in the main text, rather than just referring to the figure.

      The ‘KN’ annotation of the trypanosome cell cycle is explained in the Figure 1 legend. We have now also added a brief description in the main text:

      ‘We next assessed the localization dynamics of fluorescently tagged KIN-A and KIN-B over the course of the cell cycle (Figs. 1, B-E). T. brucei possesses two DNA-containing organelles, the nucleus (‘N’) and the kinetoplast (‘K’). The kinetoplast is an organelle found uniquely in kinetoplastids, which contains the mitochondrial DNA and replicates and segregates prior to nuclear division. The ‘KN’ configuration serves as a good cell cycle marker (Woodward and Gull, 1990; Siegel et al., 2008).’

      (5) Line 118. Throughout the paper, it is not clear why GFP-NLS fusion was used instead of GFP fusion. Please justify the fusion of NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus, presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. Since then, we have consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (6) Line 121, "Unexpectedly". It is not clear why this was unexpected.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      (7) Line 127-129. Defining homologs and orthologs is tricky - there are many homologs and paralogs of kinesin-like proteins. The method to define the presence or absence of KIN-A/KIN-B homologs should be described in the Materials and Methods section.

      Due to the difficulty in defining true orthologs for kinesin-like proteins, we took a conservative approach: reciprocal best BLAST hits. We first searched for KIN-A homologs using BLAST in the TriTryp database or with hmmsearch using manually prepared hmm profiles. When the top hit in a given organism returned T. brucei KIN-A in a reciprocal BLAST search against the T. brucei proteome, we considered the hit a true ortholog. We modified the Materials and Methods section as below.

      ‘Searches for homologous proteins were done using BLAST in the TriTryp database (Aslett et al., 2010) or using hmmsearch using manually prepared hmm profiles (HMMER version 3.0; Eddy, 1998). The top hit was considered as a true ortholog only if the reciprocal BLAST search returned the query protein in T. brucei.’
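The reciprocal-best-hit criterion described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: it assumes the BLAST results have already been parsed into (query, subject, bit score) tuples, it ranks hits by bit score (the response does not state which score was used), and all identifiers and scores below are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, for example
    parsed from tabular BLAST output. On ties, the first hit seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Keep query/subject pairs that are each other's top hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical identifiers and scores, for illustration only.
# forward: T. brucei query searched against another organism's proteome.
forward = [("Tb_KIN-A", "orgX_p1", 250.0), ("Tb_KIN-A", "orgX_p2", 120.0)]
# reverse: the top hit searched back against the T. brucei proteome.
reverse = [("orgX_p1", "Tb_KIN-A", 240.0), ("orgX_p1", "Tb_other", 90.0)]
orthologs = reciprocal_best_hits(forward, reverse)
```

In this toy input, `orgX_p1` is retained as the candidate ortholog because the reverse search returns the original query as its top hit; a candidate whose reverse search returns a different T. brucei protein would be dropped.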

      (8) Line 156. For non-experts of Trypanosoma cell biology, it is not clear how the nucleolar localization is defined.

      The nucleolus in T. brucei is discernible as a DAPI-dim region in the nucleus.

      (9) Fig.2G and Fig.S2F. These data imply that the coiled-coil and C-terminal tail domains of KIN-A/KIN-B are important for anaphase spindle midzone enrichment. However, it is odd that this was not mentioned. This reviewer recommends that the authors quantify the midzone localization data of these constructs and discuss the role of the coiled-coil domains.

      One possibility is that KIN-A and KIN-B need to form a complex (via their coiled-coil domains) to localize to the spindle midzone. Another likely possibility, which is discussed in the manuscript, is that N-terminal tagging of KIN-A impairs motor activity. This is supported by the fact that the central spindle localization is also disrupted in full-length GFP-KIN-A. We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      (10) Line 288-289, "pLDDT scores improved significantly for KIN-A CD1 in complex with KKT9:KKT11 (>80) compared to KIN-A CD1 alone (~20) (Figs. S3, A and B)." I can see that pLDDT score is about 20 at KIN-A CD1 from Figs S3A, but the basis of pLDDT > 80 upon inclusion of KKT9:KKT11 is missing.

      We added the pLDDT and PAE plots for the AF2 prediction of KIN-A700-800 in complex with KKT9:KKT11 in Fig. S5B.

      (11) Fig. 5A. Since there is no supporting biochemical data for KIN-A-KKT9-KKT11 interaction, it is important to assess the stability of AlphaFold-based structural predictions of the KIN-A-KKT9-KKT11 interaction. Are there significant differences among the top 5 prediction results, and do these interactions remain stable after the "simulated annealing" process used in the AlphaFold predictions? Are predicted CD1-interacting regions/amino residues in KKT9 and KKT11 evolutionarily conserved?

      See above. The interaction was predicted in all 5 predictions as shown in Fig. S5B. Conservation of the CD1-interacting regions in KKT9 and KKT11 is shown below:

      Author response image 2.

      KKT9 (residues ~53 – 80 predicted to interact with KIN-A in T. brucei)

      Author response image 3.

      KKT11 (residues 61-85 predicted to interact with KIN-A in T. brucei)

      (12) Line 300, Fig. S5D and E, "failed to localize at kinetochores". From this resolution of the microscopy images, it is not clear if these proteins fail to localize at kinetochores as the KKT and KIN-A310-716 signals overlap. Perhaps, "failed to enrich at kinetochores" is a more appropriate statement.

      We changed this sentence according to the reviewer’s suggestion.

      (13) Line 309 and Fig 5D and F, "predominantly localized to the mitotic spindle". From this image shown in Fig 5D, it is not clear if KIN-A∆CD1-YFP and Aurora B are predominantly localized to the spindle or if they are still localized to centromeres that are misaligned on the spindle. Without microtubule staining, it is also not clear how microtubules are distributed in these cells. Please clarify how the presence or absence of kinetochore/spindle localization was defined.

      As shown in Fig. S5E and S5F, deletion of CD1 clearly impairs kinetochore localization of KIN-A (kinetochores marked by tdTomato-KKT2). Moreover, misalignment of kinetochores, as observed upon expression of the KIN-AG210A rigor mutant, would result in an increase in 2K1N cells and proliferation defects, which is not the case for the KIN-A∆CD1 mutant (Fig. 5H, Fig. S5I). KIN-A∆CD1-YFP appears to localize diffusely along the entire length of the mitotic spindle, whereas we still observe kinetochore-like foci in the rigor mutant. Unfortunately, we do not have suitable antibodies that would allow us to distinguish spindle microtubules from the vast subpellicular microtubule array present in T. brucei and hence need to rely on tagging spindle-associated proteins such as MAP103.

      (14) Fig. 5F, G, S5F. Along the same lines, it would be helpful to show example images for each category - "kinetochores", "kinetochores + spindle", and "spindle".

      As suggested by the reviewer, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      (15) Line 332 and Fig. S6A. The experiment may be repeated in the presence of ATP or nonhydrolyzable ATP analogs.

      We thank the reviewer for the suggestion. We envisage such experiments for an in-depth follow-up study.

      (16) Line 342, "motor activity of KIN-A". Until KIN-A is shown to have motor activity, the result based on the rigor mutant does not show that the motor activity of KIN-A promotes chromosome congression. The result suggests that the ATPase activity of KIN-A is important.

      We changed that sentence as suggested by the reviewer.

      (17) Line 419 -. The authors base their discussion on the speculation that KIN-A is a plus-end directed motor. Please justify this speculation.

      Indeed, the notion that KIN-A is a plus-end directed motor remains a hypothesis, which is based on sequence alignments with other plus-end directed motors and the observation that the KIN-A motor domain is involved in translocation of the CPC to the central spindle in anaphase. We have modified the corresponding section in the discussion as follows:

      ‘It remains to be investigated whether KIN-A truly functions as a plus-end directed motor. The role of KIN-B in this context is equally unclear. Since KIN-B does not possess a functional kinesin motor domain, we deem it unlikely that the KIN-A:KIN-B heterodimer moves hand-over-hand along microtubules as do conventional (kinesin-1 family) kinesins. Rather, the KIN-A motor domain may function as a single-headed unit and drive processive plus-end directed motion using a mechanism similar to the kinesin-3 family kinesin KIF1A (Okada and Hirokawa, 1999).’

      (18) Line 422-423, "plus-end directed motion using a mechanism similar to kinesin-3 family kinesins (such as KIF1A)." Please cite a reference supporting this statement.

      See above. We cited the paper by Okada and Hirokawa (1999).

      Reviewer #2 (Recommendations For The Authors):

      Please provide a quantification of data shown in Figure 2F-H and described in lines 151-166.

      We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      It appears as if the paper more or less follows a chronological order of the experiments that were performed before AF multimer enabled the insightful and compelling structural analysis. That is a matter of style, but in some cases, the writing could be updated, shortened, or re-arranged into a more logical order. Concrete examples:

      (i) Line 144: "we did not include CPC2 for further analysis in this study", although CPC2 features at a prominent and interesting position in the predicted structures of the kinetoplastid CPC, shown in later main figures.

      We attempted RNAi-mediated depletion of CPC2 using two different shRNA constructs. However, we cannot exclude the possibility that the knockdown of CPC2 was less efficient compared with the other CPC subunits. For this reason, we decided to remove all the data on CPC2 from Fig. S2.

      (ii) The work with the KIN-A motor domain only and KIN-A ∆motor domain (Fig 2) raises the question of a more subtle mutation to interfere with the motor domain, which is ultimately presented in Fig 6. I think that the final paragraph and Figure 6 follow naturally after Figure 2.

      We appreciate the suggestion. However, we would like to keep Figure 6 there.

      (iii) The high-confidence structural predictions in Fig 3 and Fig 4 are insightful. The XL-MS descriptions that precede them are not so helpful (Fig 3A and 4G and in the text). To emphasize their status as experimental support for the predicted structures, which is very important, it would be good to discuss the XL-MS after presenting the models.

      As suggested, we have re-arranged the text and/or figures such that the AF2 predictions are discussed first and the CLMS data are brought in afterwards.

      Figure 1A prominently features an arbitrary color code and a lot of protein IDs without a legend. That is not a very convincing start. Figure S1 is more informative, containing annotated protein names and results of the KIN-A and KIN-B IPs. Please improve Figure 1A, for example by presenting a modified version of Figure S1. In all these types of figures, please list both protein names and gene IDs.

      We agree with the reviewer that the IP-MS data in Fig. S1 is more informative and hence decided to swap the heatmaps in Fig. 1A and Fig. S1A. We further annotated the heatmap corresponding to the Aurora BAUK1 IP-MS (now presented in Fig. S1) as suggested by the reviewer.

      The visualization of the structural predictions is not consistent among figures:

      (i) The structure in Fig 4I is important and could be displayed larger. The pLDDT scores, and especially those of the non-displayed models, do not add much information and should not be a main panel. If the authors want to display the pLDDT scores, I recommend a panel (main or supplement) of the structure colored for local prediction confidences, as in Fig 5A.

      (ii) In Figure 5A itself, it is hard to follow the chains in general, and KIN-A in particular, since the structure is pLDDT-coloured. Please present an additional panel colored by chain (consistent with Fig 4I, as mentioned above).

      (iii) The summarizing diagram, currently displayed as Fig 4J, should be placed after Fig 5A and take the discovered KIN-A - KKT9-11 connection into account. Ideally, it also covers the suspected importance of the motor domain and serves as a summarising diagram.

      We thank the reviewer for the constructive comments. For each structure prediction, we now present two images side by side; one coloured by chain and one colored by pLDDT. We recently re-ran AF2 for the full CPC and also for the KKT7N-KKT8 complex, and got improved predictions. Hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 angstrom distance constraints were fulfilled or not in the AF2 prediction. We also increased the size of the structures shown in Fig. 4. Furthermore, we decided to remove the summarizing diagram from Fig. 4 and instead made a new main Fig. 7, which shows a more detailed schematic, which also takes into account the proposed function of the KIN-A motor domain, as suggested by the reviewer, and other points addressed in the Discussion.
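As an illustration of how cross-links can be classified against a 30 angstrom distance constraint in a predicted model, the following Python sketch computes the distance between two residues and flags whether it falls within the cutoff. It is a minimal sketch under stated assumptions, not the authors' analysis: it assumes Cα coordinates have already been extracted from the model into a dictionary, it measures Cα-Cα distances (the response does not specify which atoms were used), and the chain labels, residue numbers, and coordinates are hypothetical.

```python
import math


def classify_crosslinks(ca_coords, crosslinks, max_dist=30.0):
    """Check each cross-link against a distance cutoff (in angstroms).

    ca_coords:  {(chain, residue_number): (x, y, z)} of C-alpha positions.
    crosslinks: list of ((chain, residue), (chain, residue)) pairs.
    Returns a list of (site_a, site_b, distance, satisfied) tuples;
    distance and satisfied are None if a residue is absent from the model.
    """
    results = []
    for a, b in crosslinks:
        if a not in ca_coords or b not in ca_coords:
            results.append((a, b, None, None))
            continue
        dist = math.dist(ca_coords[a], ca_coords[b])
        results.append((a, b, dist, dist <= max_dist))
    return results


# Hypothetical coordinates and cross-linked sites, for illustration only.
coords = {("A", 10): (0.0, 0.0, 0.0),
          ("B", 20): (3.0, 4.0, 0.0),
          ("B", 99): (0.0, 0.0, 40.0)}
links = [(("A", 10), ("B", 20)),   # 5 angstroms apart: within the cutoff
         (("A", 10), ("B", 99)),   # 40 angstroms apart: exceeds the cutoff
         (("A", 10), ("C", 1))]    # residue not present in the model
results = classify_crosslinks(coords, links)
```

Cross-links involving residues that were not included in the predicted construct are reported separately rather than counted as violations, since they cannot be evaluated against the model.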

      The methods section for the structural predictions lacks essential information. Predictions can only be reproduced if the version of AF2 multimer v2.x is specified and key parameters are mentioned.

      As suggested, we have added the details in the Materials and Methods section as follows.

      ‘Structural predictions of KIN-A/KIN-B, KIN-A310-862/KIN-B317-624, CPC1/CPC2/KIN-A300-599/KIN-B 317-624, and KIN-A700-800/KKT9/KKT11 were performed using ColabFold version 1.3.0 (AlphaFold-Multimer version 2), while those of AUK1/CPC1/CPC2/KIN-A1-599/KIN-B, KKT71-261/KKT9/KKT11/KKT8/KKT12, KKT9/KKT11/KKT8/KKT12, and KKT71-261/KKT9/KKT11 were performed using ColabFold version 1.5.3 (AlphaFold-Multimer version 2.3.1) using default settings, accessed via https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.3.0/AlphaFold2.ipynb and https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.3/AlphaFold2.ipynb.’

      Line 121, please explain the "Unexpectedly" by including a reference to the work from Li and colleagues. A statement with some details would be useful, as the difference between both studies appears to be crucial for the novelty of this paper. Alternatively, refer to this being covered in the discussion.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      Line 285 refers to "conserved" regions in the C-terminal part of KIN-A, referring to Figure 5. Please expand the MSA in Figure 5B to get an idea about the conservation/variation outside CD1 and CD2.

      We now present the full MSA for KIN-A proteins in kinetoplastids in Fig. S5A.

      Please specify what is meant by Line 367-369 for someone who is not familiar with the work by Komaki et al. 2022. Either clarify in the text or clarify in the text with data to support it.

      We updated the corresponding section in the discussion as follows:

      ‘Komaki et al. recently identified two functionally redundant CPC proteins in Arabidopsis, Borealin Related Interactor 1 and 2 (BORI1 and 2), which engage in a triple helix bundle with INCENP and Borealin using a conserved helical domain but employ an FHA domain instead of a BIR domain to read H3T3ph (Komaki et al., 2022).’

      Data presented in Figure S6A, the microtubule co-sedimentation assay, is not convincing since a substantial amount of KIN-A/B is pelleted in the absence of microtubules. Did the authors spin the proteins in BRB80 before the assay to continue with soluble material and reduce sedimentation in the absence of microtubules? If the authors want to keep the wording in lines 331-332, the MT-binding properties of KIN-A and KIN-B need to be investigated in more detail, for example with a titration and a quantification thereof. Otherwise, they should change the text and replace "confirms" with "is consistent with". In any case, the legend needs to be expanded to include more information.

      To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      We have also updated the main text in the results section:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      Details:

      The readability of the pAE plots could be improved by arranging sequences according to their position in the structure. For example in Fig4I, KKT8 could precede KKT12. If it is easy to update this, the authors might want to do so.

      We re-ran the AF2 predictions for the KKT7N – KKT8 complex in Fig. 4/S4 and changed the order according to the reviewer’s suggestion (KKT9:KKT11:KKT8:KKT12).

      The same paper is referred to as Je Van Hooff et al. 2017 and as Van Hooff et al. 2017

      Thank you for pointing this out. We have corrected the citation.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please state at the end of the introduction/start of the results section that this work was performed in procyclic trypanosomes. Given that the cell cycles of procyclic and bloodstream forms differ, this is important.

      We added this information at the end of the introduction:

      ‘Here, by combining biochemical, structural and cell biological approaches in procyclic form T. brucei, we show that the trypanosome CPC is a pentameric complex comprising Aurora BAUK1, INCENPCPC1, CPC2 and the two orphan kinesins KIN-A and KIN-B.’

      (2) Please define NLS at first use (line 118), and for clarity, explain the rationale for using GFP with an NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. We have since consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (3) Lines 148-150 - it would strengthen this claim if KIN-A/B protein levels were assessed by Western blot.

      We now present a Western blot in Fig. S2C, showing that bulk KIN-B levels are clearly reduced upon KIN-A RNAi. The same is true also to some extent for KIN-A levels upon KIN-B RNAi, although this is less obvious, possibly due to the lower efficiency of KIN-B compared to KIN-A RNAi as judged by fluorescence microscopy (quantified in Fig. 2D and 2E).

      (4) Line 253 - the text mentions the removal of both KKT9 and KKT11, which is not consistent with the figure (Fig 4H) - do you mean the removal of either KKT9 or KKT11?

      Yes, we thank the reviewer for pointing out this mistake in the text, which has now been corrected.

      (5) Line 337 - please include a reference for the G209A ATPase-defective rigor mutant - has this been shown to result in KIN-A being inactive previously?

      Please see above our answer in public review.

      (6) It is not always obvious when fluorescent fusion proteins are being expressed endogenously or ectopically, or when they are being expressed in an RNAi background or not without tracing the cell lines in Table S1 - please ensure this is clearly stated throughout the manuscript.

      We now made sure that this is clearly stated in the main text as well as in the figure legends.

      (7) Line 410 - 'KIN-A C-terminal tail is stuffed full of conserved CDK1CRK3 sites' - what does 'stuffed full' really mean (this is rather imprecise) and what are the consensus sites - are these CDK1 consensus sites that are assumed to be conserved for CRK3? I'm not aware of consensus sites for CRK3 having been determined, but if they have, this should be referenced.

      We have modified the corresponding section in the discussion as follows:

      ‘In support of this, the KIN-A C-terminal tail harbours many putative CRK3 sites (10 sites matching the minimal S/T-P consensus motif for CDKs) and is also heavily phosphorylated by Aurora BAUK1 in vitro (Ballmer et al. 2024). Finally, we speculate that the interaction of KIN-A motor domain with microtubules, coupled to the force generating ATP hydrolysis and possibly plus-end directed motion, eventually outcompetes the weakened interactions of the CPC with the kinetochore and facilitates the extraction of the CPC from chromosomes onto spindle microtubules during anaphase. Indeed, deletion of the KIN-A motor domain or impairment of its motor function through N-terminal GFP tagging causes the CPC to be trapped at kinetochores in anaphase. Central spindle localization is additionally dependent on the ATPase activity of the KIN-A motor domain as illustrated by the KIN-A rigor mutant.’
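
      The count of putative sites quoted above rests on the minimal S/T-P consensus (a serine or threonine immediately followed by a proline). As an illustrative sketch only, the following shows how such a scan could be done; the function name and the toy sequence are hypothetical and are not taken from the manuscript or from the KIN-A sequence.

```python
import re

# Minimal CDK consensus as described in the text: S or T directly followed by P.
SP_TP_MOTIF = re.compile(r"[ST]P")


def find_sp_tp_sites(sequence: str) -> list[int]:
    """Return 1-based positions of S/T residues that are followed by P.

    Hypothetical helper for illustration; a two-residue motif ending in P
    cannot overlap with the next match, so a plain scan finds every site.
    """
    return [m.start() + 1 for m in SP_TP_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up toy sequence, NOT the KIN-A C-terminal tail.
    toy_tail = "ASPKTPLLSAP"
    sites = find_sp_tp_sites(toy_tail)
    print(len(sites), sites)
```

      A match to this minimal motif only marks a candidate site; whether CRK3 actually phosphorylates it would still need experimental support.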

      (8) Lines 412-416: this proposal is written rather definitively - given no motor activity has been demonstrated for KIN-A, please make clear that this is still just a theory.

      See above.

      (9) Fig 1: KKT2 is not highlighted in Fig 1A - given this has been used for colocalization in Fig 1C-E, was it recovered, and if not, why not? Fig 1B-E: the S phase/1K1N terminology is somewhat misleading. Not all S phase cells will have elongated kinetoplasts - usually an asterisk is used to signify replicated DNA, not kinetoplast shape. If it is to be used here for elongation, then for consistency, N should be used for G2/mitotic cells.

      Fig. 1A (now Fig. S1A) only shows the top 30 hits. KKT2 was indeed recovered with Aurora BAUK1 (see Table S2) and is often used as a kinetochore marker in trypanosomes by our lab and others since the signal of fluorescently tagged KKT2 is relatively bright and KKT2 localizes to centromeres throughout the cell cycle.

      (10) A general comment for all image figures is that these do not have accompanying brightfield images and it is therefore difficult to know where the cell body is, or sometimes which nuclei and kinetoplasts belong to which cell where DNA from more than one cell is within the image. It would be beneficial if brightfield images could be added, or alternatively, the cell outlines were traced onto DAPI or merged images. Also, brightfield images would allow the stage of cytokinesis (pre-furrowing/furrowing/abscission) in anaphase cells to be determined.

      Since this study primarily addresses the recruitment mechanism of the CPC to kinetochores and to the central spindle from S phase to metaphase and in anaphase, respectively, and CPC proteins are not observed outside of the nucleus during these cell cycle stages, we did not present brightfield images in the figures. However, this point is particularly valid for discerning the localization of KIN-A and KIN-B to the new FAZ tip from late anaphase onwards. Hence, we acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      (11) Fig 2A: legend should state that the micrographs show the localisation of the proteins within the nucleus as whole cells are not shown. 2C: can INCENP not be split into 2 lines - the 'IN' looks like 1N at first glance, which is confusing.

      We have applied the suggested change in Fig. 2.

      (12) Fig 3 (and other AF2 figures): Could the lines for satisfied & not satisfied in the key be thicker so they more closely resemble the lines in the figure and are less likely to be confused with the disordered regions of the CPC components?

      We have now made those lines thicker.

      (13) Why were different E value thresholds used in Fig 3 and Fig 4?

      The CLMS data in Fig. 3 and Fig. 4 now both use the same E value threshold of E-3 (previously E-4 was used in Fig. 4). To determine a sensible significance threshold, we included some yeast protein sequences (‘false positives’) in the database used in pLink2 for identification of crosslinked peptides. Note that we recently also re-ran AF2 for the full CPC and for the KKT7N-KKT8 complex and got improved predictions. Hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 angstrom distance constraints were fulfilled or not in the AF2 prediction.

      (14) Fig 4H legend - please give the expected sizes of these recombinant proteins & check the 3rd elution panel (see public review comments).

      See above response in public review.

      (15) Fig 4I - please explain what the colours of the PAE plot and the values in the key signify, as well as how the Scored Residue values are arrived at. Please also define the pLDDT in the legend.

      We have cited DeepMind’s 2021 methods paper, in which the outputs of AlphaFold are explained in detail. We also added a short description of the pLDDT and PAE scores and the corresponding colour coding in the legends of Fig. 3 and Fig. 4, respectively.

      From figure 3 legend:

      ‘(B) Cartoon representation showing two orientations of the trypanosome CPC, coloured by protein on the left (Aurora BAUK1: crimson, INCENPCPC1: green, CPC2: cyan, KIN-A: magenta, and KIN-B: yellow) or according to their pLDDT values on the right, assembled from AlphaFold2 predictions shown in Figure S3. The pLDDT score is a per-residue estimate of the confidence in the AlphaFold prediction on a scale from 0 – 100. pLDDT > 70 (blue, cyan) indicates a reasonable accuracy of the model, while pLDDT < 50 (red) indicates a low accuracy and often reflects disordered regions of the protein (Jumper et al., 2021). BS3 crosslinks in (B) were mapped onto the model using PyXlinkViewer (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å) (Schiffrin et al., 2020).’
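
      The pLDDT cut-offs given in this legend can be read as a simple banding rule. The sketch below is illustrative only: the function name and the label for the 50 to 70 range are assumptions, since the legend describes only the bands above 70 and below 50.

```python
def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT score (0-100) to a coarse confidence label.

    Thresholds follow the legend: > 70 reasonable accuracy, < 50 low accuracy.
    The "intermediate" label for 50-70 is an assumption made for this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError("pLDDT is defined on a scale from 0 to 100")
    if score > 70:
        return "reasonable"
    if score < 50:
        return "low"
    return "intermediate"


if __name__ == "__main__":
    # Made-up example scores, not values from the models in this study.
    for example_score in (85.0, 60.0, 42.0):
        print(example_score, plddt_band(example_score))
```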

      From Figure 4 legend:

      ‘(G) AlphaFold2 model of the KKT7 – KKT8 complex, coloured by protein (KKT71-261: green, KKT8: blue, KKT12: pink, KKT9: cyan and KKT11: orange) (left) and by pLDDT (center). BS3 crosslinks in (H) were mapped onto the model using PyXlinkViewer (Schiffrin et al., 2020) (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å). Right: Predicted Aligned Error (PAE) plot of model shown on the left (rank_2). The colour indicates AlphaFold’s expected position error (blue = low, red = high) at the residue on the x axis if the predicted and true structures were aligned on the residue on the y axis (Jumper et al., 2021).’
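
      Both legends classify each BS3 crosslink by whether the Cα-Cα Euclidean distance in the model falls within 30 Å. A minimal sketch of that check is given below; it is not the PyXlinkViewer implementation, the function names and coordinates are hypothetical, and treating a distance of exactly 30 Å as satisfied is an assumption.

```python
import math

# Cα-Cα cut-off quoted in the figure legends.
DISTANCE_THRESHOLD_ANGSTROM = 30.0


def ca_distance(ca_a, ca_b) -> float:
    """Euclidean distance between two Cα coordinates given as (x, y, z) in Å."""
    return math.dist(ca_a, ca_b)


def classify_crosslink(ca_a, ca_b, threshold=DISTANCE_THRESHOLD_ANGSTROM) -> str:
    """Label a crosslink 'satisfied' if the Cα-Cα distance is within the threshold.

    Assumption for this sketch: a distance exactly equal to the threshold
    counts as satisfied.
    """
    return "satisfied" if ca_distance(ca_a, ca_b) <= threshold else "violated"


if __name__ == "__main__":
    # Made-up coordinates, not taken from any model in this study.
    print(classify_crosslink((0.0, 0.0, 0.0), (0.0, 0.0, 25.0)))
    print(classify_crosslink((0.0, 0.0, 0.0), (30.0, 40.0, 0.0)))
```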

      (16) Fig 6 legend - Line 730 should say (F) not (C).

      Thank you for pointing out this typo.

      (17) Fig S1A - a key is missing for the colours. Fig S1B/C - cell outlines or a brightfield image are really needed here - see earlier comment. Fig S1D - there doesn't seem to be a method for how this tree was generated.

      See above response in public review regarding Fig. S1A and S1B/C. The tree in Fig. S1D is based on (Butenko et al., 2020).

      (18) Fig S2: A: how was protein knockdown validated (especially for CPC2 where there was little obvious phenotype)? Fig S2B: the y-axis should read proportion of cells, not percentage. Fig S2E - NLS should be labelled.

      Thank you for pointing out the mistake in the labelling.

      (19) Fig S3: PAE plots should be labelled with protein names, not A-E. Similarly, the pLDDT plots should be labelled as in Fig 4I.

      We have corrected the labelling in Fig. S3.

      (20) Fig S5A-D - cell cycle stage labels are missing from images.

      Thank you for pointing out the missing cell cycle stage labels.

      Addition by editor:

      In line 126 the statement that KIN-A and KIN-B "associate with Aurora-AUK1, INCENP-CPC1 and CPC2 throughout the cell cycle" seems too strong. There is no direct evidence for this. Please re-phrase as "likely associate" or "suggest... that ... may...".

      We have modified that sentence according to the editor’s suggestion.

      References:

      Akiyoshi, B., and K. Gull. 2014. Discovery of Unconventional Kinetochores in Kinetoplastids. Cell. 156. doi:10.1016/j.cell.2014.01.049.

      Butenko, A., F.R. Opperdoes, O. Flegontova, A. Horák, V. Hampl, P. Keeling, R.M.R. Gawryluk, D. Tikhonenkov, P. Flegontov, and J. Lukeš. 2020. Evolution of metabolic capabilities and molecular features of diplonemids, kinetoplastids, and euglenids. BMC Biol. 18:1–28. doi:10.1186/S12915-020-0754-1.

      Cormier, A., D.G. Drubin, and G. Barnes. 2013. Phosphorylation regulates kinase and microtubule binding activities of the budding yeast chromosomal passenger complex in vitro. J Biol Chem. 288:23203–23211. doi:10.1074/JBC.M113.491480.

      Endow, S.A., F.J. Kull, and H. Liu. 2010. Kinesins at a glance. J Cell Sci. 123:3420. doi:10.1242/JCS.064113.

      Fink, S., K. Turnbull, A. Desai, and C.S. Campbell. 2017. An engineered minimal chromosomal passenger complex reveals a role for INCENP/Sli15 spindle association in chromosome biorientation. J Cell Biol. 216:911–923. doi:10.1083/JCB.201609123.

      van der Horst, A., M.J.M. Vromans, K. Bouwman, M.S. van der Waal, M.A. Hadders, and S.M.A. Lens. 2015. Inter-domain Cooperation in INCENP Promotes Aurora B Relocation from Centromeres to Microtubules. Cell Rep. 12:380–387. doi:10.1016/J.CELREP.2015.06.038.

      Ishii, M., and B. Akiyoshi. 2020. Characterization of unconventional kinetochore kinases KKT10/19 in Trypanosoma brucei. J Cell Sci. doi:10.1242/jcs.240978.

      Jeyaprakash, A.A., C. Basquin, U. Jayachandran, and E. Conti. 2011. Structural Basis for the Recognition of Phosphorylated Histone H3 by the Survivin Subunit of the Chromosomal Passenger Complex. Structure. 19:1625–1634. doi:10.1016/J.STR.2011.09.002.

      Jeyaprakash, A.A., U.R. Klein, D. Lindner, J. Ebert, E.A. Nigg, and E. Conti. 2007. Structure of a Survivin–Borealin–INCENP Core Complex Reveals How Chromosomal Passengers Travel Together. Cell. 131. doi:10.1016/j.cell.2007.07.045.

      Jumper, J., R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S.A.A. Kohl, A.J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A.W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596:583–589. doi:10.1038/s41586-021-03819-2.

      Kang, J.S., I.M. Cheeseman, G. Kallstrom, S. Velmurugan, G. Barnes, and C.S.M. Chan. 2001. Functional cooperation of Dam1, Ipl1, and the inner centromere protein (INCENP)-related protein Sli15 during chromosome segregation. J Cell Biol. 155:763–774. doi:10.1083/JCB.200105029.

      Klein, U.R., E.A. Nigg, and U. Gruneberg. 2006. Centromere targeting of the chromosomal passenger complex requires a ternary subcomplex of Borealin, Survivin, and the N-terminal domain of INCENP. Mol Biol Cell. 17:2547–2558. doi:10.1091/MBC.E05-12-1133.

      Komaki, S., E.C. Tromer, G. De Jaeger, N. De Winne, M. Heese, and A. Schnittger. 2022. Molecular convergence by differential domain acquisition is a hallmark of chromosomal passenger complex evolution. Proc Natl Acad Sci U S A. 119. doi:10.1073/PNAS.2200108119/-/DCSUPPLEMENTAL.

      Li, Z. 2012. Regulation of the Cell Division Cycle in Trypanosoma brucei. Eukaryot Cell. 11:1180. doi:10.1128/EC.00145-12.

      Li, Z., J.H. Lee, F. Chu, A.L. Burlingame, A. Günzl, and C.C. Wang. 2008. Identification of a Novel Chromosomal Passenger Complex and Its Unique Localization during Cytokinesis in Trypanosoma brucei. PLoS One. 3. doi:10.1371/journal.pone.0002354.

      Mackay, A.M., D.M. Eckley, C. Chue, and W.C. Earnshaw. 1993. Molecular analysis of the INCENPs (inner centromere proteins): separate domains are required for association with microtubules during interphase and with the central spindle during anaphase. J Cell Biol. 123:373–385. doi:10.1083/JCB.123.2.373.

      Marchetti, M.A., C. Tschudi, H. Kwon, S.L. Wolin, and E. Ullu. 2000. Import of proteins into the trypanosome nucleus and their distribution at karyokinesis. J Cell Sci. 113 ( Pt 5):899–906. doi:10.1242/JCS.113.5.899.

      Nakajima, Y., A. Cormier, R.G. Tyers, A. Pigula, Y. Peng, D.G. Drubin, and G. Barnes. 2011. Ipl1/Aurora-dependent phosphorylation of Sli15/INCENP regulates CPC-spindle interaction to ensure proper microtubule dynamics. J Cell Biol. 194:137–153. doi:10.1083/JCB.201009137.

      Noujaim, M., S. Bechstedt, M. Wieczorek, and G.J. Brouhard. 2014. Microtubules accelerate the kinase activity of Aurora-B by a reduction in dimensionality. PLoS One. 9. doi:10.1371/JOURNAL.PONE.0086786.

      Okada, Y., and N. Hirokawa. 1999. A processive single-headed motor: Kinesin superfamily protein KIF1A. Science. 283:1152–1157. doi:10.1126/SCIENCE.283.5405.1152.

      Rice, S., A.W. Lin, D. Safer, C.L. Hart, N. Naber, B.O. Carragher, S.M. Cain, E. Pechatnikova, E.M. Wilson-Kubalek, M. Whittaker, E. Pate, R. Cooke, E.W. Taylor, R.A. Milligan, and R.D. Vale. 1999. A structural change in the kinesin motor protein that drives motility. Nature. 402:778–784. doi:10.1038/45483.

      Sablin, E.P., F.J. Kull, R. Cooke, R.D. Vale, and R.J. Fletterick. 1996. Crystal structure of the motor domain of the kinesin-related motor ncd. Nature. 380:555–559. doi:10.1038/380555a0.

      Samejima, K., M. Platani, M. Wolny, H. Ogawa, G. Vargiu, P.J. Knight, M. Peckham, and W.C. Earnshaw. 2015. The Inner Centromere Protein (INCENP) Coil Is a Single α-Helix (SAH) Domain That Binds Directly to Microtubules and Is Important for Chromosome Passenger Complex (CPC) Localization and Function in Mitosis. J Biol Chem. 290:21460–21472. doi:10.1074/JBC.M115.645317.

      Schiffrin, B., S.E. Radford, D.J. Brockwell, and A.N. Calabrese. 2020. PyXlinkViewer: A flexible tool for visualization of protein chemical crosslinking data within the PyMOL molecular graphics system. Protein Sci. 29:1851–1857. doi:10.1002/PRO.3902.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Sang et al. proposed that a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels.

      Strengths:

      The experiments were thorough and well designed. The results are compelling and support the main claim. The development and use of the DrosoX two-choice assay allow a more quantitative, automated and unbiased assessment of ingestion volume and preference.

      Weaknesses:

      There are a few inconsistencies with respect to the exact role by which IR60b neurons limit high salt consumption and the contribution of external (labellar) high-salt sensors in regulating high salt consumption. These weaknesses do not significantly impact the main conclusion, however.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Sang et al. set out to identify gustatory receptors involved in salt taste sensation in Drosophila melanogaster. In a two-choice assay screen of 30 Ir mutants, they identified that Ir60b is required for avoidance of high salt. In addition, they demonstrate that activation of Ir60b neurons is sufficient for gustatory avoidance using either optogenetics or TRPV1 to specifically activate Ir60b neurons. Then, using tip recordings of labellar gustatory sensory neurons and proboscis extension response behavioral assays in Ir60b mutants, the authors demonstrate that Ir60b is dispensable for labellar taste neuron responses to high salt and the suppression of proboscis extension by high salt. Since external gustatory receptor neurons (GRNs) are not implicated, they look at Poxn mutants, which lack external chemosensory sensilla but have intact pharyngeal GRNs. High salt avoidance was reduced in Poxn mutants but was still greater than Ir60b mutants, suggesting that pharyngeal gustatory sensory neurons alone are sufficient for high salt avoidance. The authors use a new behavioral assay to demonstrate that Ir60b mutants ingest a higher volume of sucrose mixed with high salt than control flies do, suggesting that the action of Ir60b is to limit high salt ingestion. Finally, they identify that Ir60b functions within a single pair of gustatory sensory neurons in the pharynx, and that these neurons respond to high salt but not bitter tastants.

      Strengths:

      A great strength of this paper is that it rigorously corroborates previously published studies that have implicated specific Irs in salt taste sensation. It further introduces a new role for Ir60b in limiting high salt ingestion, demonstrating that Ir60b is necessary and sufficient for high salt avoidance and convincingly tracing the action of Ir60b to a particular subset of gustatory receptor neurons. Overall, the authors have achieved their aim by identifying a new gustatory receptor involved in limiting high salt ingestion. They use rigorous genetic, imaging, and behavioral studies to achieve this aim, often confirming a given conclusion with multiple experimental approaches. They have further done a great service to the field by replicating published studies and corroborating the roles of a number of other Irs in salt taste sensation. An aspect of this study that merits further investigation is how the same gustatory receptor neurons and Ir in the pharynx can be responsible for regulating the ingestion of both appetitive (sugar) and aversive tastants (high salt).

      A previous report published in eLife from John Carlson’s lab (Joseph et al, 2017) showed that the Ir60b GRN in the pharynx responds to sucrose resulting in sucrose repulsion. Thus, stimulation of this pharyngeal GRN results in gustatory avoidance only, not both attraction and avoidance. (lines 205-207)

      Weaknesses:

      There are several weaknesses that, if addressed, could greatly improve this work.

      (1) The authors combine the results and discussion but provide a very limited interpretation of their results. More discussion of the results would help to highlight what this paper contributes, how the authors interpret their results, and areas for future study.

      We agree and have now separated the Results and Discussion, and in so doing have greatly expanded discussion of the results.

      (2) The authors rename previously studied populations of labellar GRNs to arbitrary letters, which makes it difficult to understand the experiments and results in some places. These GRN populations would be better referred to according to the gustatory receptors they are known to express.

      One of the corresponding authors (Craig Montell) introduced this alternative GRN nomenclature in a 2021 review: Montell, C. Drosophila sensory receptors—a set of molecular Swiss Army Knives. Genetics 217, 1-34 (Montell, 2021). We are not fans of referring to different classes of GRNs based on the receptors that they express since it is not obvious which receptors to use. For example, the GRNs that respond to bitter compounds all express multiple GR co-receptors. The same is true for the GRNs that respond to sugars. The former system of referring to GRNs simply as sugar, bitter, salt and water GRNs is also not ideal since the repertoire of chemicals that stimulates each class is complex. For example, the Class A GRNs (formerly sugar GRNs) are also activated by low Na+, glycerol, fatty acids, and acetic acid, while the B GRNs (formerly bitter GRNs) are also stimulated by high Na+, acids, polyamines, and tryptophan. In addition, there are five classes of GRNs. At first mention of the Class A–E GRNs, we mention the most commonly used former nomenclature of sugar, bitter, salt and water GRNs. In addition, for added clarity, we now also include a mention of one of the receptors that mark each class. (lines 51-59)

      (3) The conclusion that GRNs responsible for high salt aversion may be inhibited by those that function in low salt attraction is not well substantiated. This conclusion seems to come from the fact that overexpression of Ir60b in salt attraction and salt aversion sensory neurons still leads to salt aversion, but there need not be any interaction between these two types of sensory neurons if they act oppositely on downstream circuits.

      We did not make this claim.

      (4) The authors rely heavily on a new Droso-X behavioral apparatus that is not sufficiently described here or in the previous paper the authors cite. This greatly limits the reader's ability to interpret the results.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      Reviewer #3 (Public Review):

      Summary:

      Sang et al. successfully demonstrate that a set of single sensory neurons in the pharynx of Drosophila promotes avoidance of food with high salt concentrations, complementing previous findings on Ir7c neurons with an additional internal sensing mechanism. The experiments are well-conducted and presented, convincingly supporting their important findings and extending the understanding of internal sensing mechanisms. However, a few suggestions could enhance the clarity of the work.

      Strengths:

      The authors convincingly demonstrate the avoidance phenotype using different behavioral assays, thus comprehensively analyzing different aspects of the behavior. The experiments are straightforward and well-contextualized within existing literature.

      Weaknesses:

      Discussion

      While the authors effectively relate their findings to existing literature, expanding the discussion on the surprising role of Ir60b neurons in both sucrose and salt rejection would add depth. Additionally, considering Yang et al. 2021's (https://doi.org/10.1016/j.celrep.2021.109983) result that Ir60b neurons activate feeding-promoting IN1 neurons, the authors should discuss how this aligns with their own findings.

      Yang et al. demonstrated that the activation of Ir60b neurons can trigger the activation of IN1 neurons akin to pharyngeal multimodal (PM) neurons, potentially leading to enhanced feeding (Yang et al, 2021). However, our research reveals a specific pattern of activation for Ir60b neurons. Instead of being generalists, they are specialized for certain sugars, such as sucrose, and for high salt. Consequently, while Ir60b GRNs activate IN1 neurons, we contend that there are other neurons in the brain responsible for inhibiting feeding. (lines 412-417)

      Lines 187: The discussion primarily focuses on taste sensillae outside the labellum, neglecting peg-type sensillae on the inner surface. Clarification on whether these pegs contribute to the described behaviors and if the Poxn mutants described also affect the pegs would strengthen the discussion.

      We added the following to the Discussion section. “We also found that the requirement for Ir60b appears to be different when performing the binary liquid capillary assay (DrosoX) versus solid food binary feeding assays. When we employed the DrosoX assay to test mutants that were missing salt aversive GRNs in labellar bristles but still retained functional Ir60b GRNs, the flies behaved the same as wild-type flies (e.g. Figure 3J and 3L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al, 2015), displayed repulsion to high salt food that was intermediate between control flies and the Ir60b mutant (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), and these hairless taste organs become exposed to food only when the labial palps open. We suggest that there are high-salt sensitive GRNs associated with taste pegs, which are accessed when the labellum contacts a solid substrate, but not when flies drink from the capillaries used in DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B).” (lines 430-444)

      In line 261 the authors state: "We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, similar to what was observed with Ir56b 8; however, this did not generate a salt receptor (Figures S6A)"

      An obvious explanation would be that these neurons are missing the identified necessary co-receptors Ir76b and Ir25a. The authors should discuss here if the Gr33a neurons they target also express these co-receptors, if yes this would strengthen their conclusion that an additional receptor might be missing.

      We clarified this point in the Discussion section as follows: “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b under control of Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al, 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al, 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.” (lines 464-477)

      Methods

      The description of the Droso-X assay seems to be missing some details. Currently, it is not obvious how the two-choice is established. Only one capillary is mentioned, I assume there were two used? Also, the meaning of the variables used in the equation (DrosoX and DrosoXD) are not explained.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      The description of the ex-vivo calcium imaging prep. is unclear in several points:

      (1) It is lacking information on how the stimulus was applied (was it manually washed in? If so how was it removed?).

      We expanded the description of the apparatus in the ex vivo calcium imaging section of the Materials and Methods. (lines 682-716)

      (2) The authors write: "A mild swallow deep well was prepared for sample fixation." I assume they might have wanted to describe a "shallow well"?

      We deleted the word “deep.”. (line 691)

      (3) "...followed by excising a small portion of the labellum in the extended proboscis region to facilitate tastant access to pharyngeal organs." It is not clear to me how one would excise a small portion of the labellum, the labellum depicts the most distal part of the proboscis that carries the sensillae and pegs. Did the authors mean to say that they cut a part of the proboscis?

      Yes. We changed the sentence to “…followed by excising a small portion of the extended proboscis to facilitate tastant access to the pharyngeal organs.”. (lines 693-695)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In this manuscript, Sang et al. proposed that a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels. In general, I find the collective evidence presented by the authors convincing. But I feel the MS can benefit from having a discussion section and a few simple experiments. Below I listed some inconsistencies I hope the authors can address or at least discuss.

      We have now added a Discussion section, and expanded the discussion.

      (1) The role of IR60b neurons on suppressing PER appeared inconsistent. On the one hand, optogenetic activation of these neurons suppressed PER (Fig 1D), on the other hand, IR60b mutants were as competent to suppress PER in response to high salt as WT (Fig 2G). Are pharyngeal neurons expected to modulate PER? It might be worth including a retinal-free or genotype control to ascertain the PER suppression exhibited by IR60b>CsChrimson is genuine.

      Please note that Figure 2G is now Figure 2H.

      Our interpretation is that activation of aversive GRNs by high salt either in labellar bristles or in the pharynx is sufficient to inhibit repulsion to high salt. Consistent with this conclusion, optogenetic activation of Ir60b GRNs, which are specific to the pharynx, is sufficient to reduce the PER to sucrose containing food (Figure 1D). However, mutation of Ir60b has no impact on the PER to sucrose plus high (300 mM) NaCl since the high-salt activated GRNs in labellar bristles are not impaired by the Ir60b mutation. In contrast, Ir25a and Ir76b are required in both labellar bristles and in the pharynx to reject high salt. As a consequence, mutation of either Ir25a or Ir76b impairs the repulsion to high salt. Thus, there is no inconsistency between the optogenetics and PER results. We clarified this point in the Discussion section. In terms of controls for IR60b>CsChrimson, we show that UAS-CsChrimson alone or UAS-CsChrimson in combination with the Gr5a driver has no impact on the PER (Figure 1D). In addition, we now include a retinal free control (Figure 1D). These findings provide the key genetic controls and are described in the Results section. (lines 167-170)

      (2) The role of labellar high-salt sensors in regulating salt intake appeared inconsistent. On the one hand, they appeared to have a role in limiting high salt consumption because poxn mutants were significantly more receptive to high salt than WT (Fig. 2J). On the other hand, selectively restoring IR76b or IR25a in only the IR60b neurons in these mutants - thus leaving the labellar salt sensors still defective - reverted the flies to behave like WT when given a choice between sucrose vs. sucrose+high salt (Fig 3J, L).

      We now offer an explanation for these seemingly conflicting results in the Discussion section. When we employed the DrosoX assay with mutants that retained functional Ir60b GRNs but were missing salt aversive GRNs in labellar bristles, the flies behaved the same as control flies (e.g. Figure 3J and L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al., 2015), display aversion to high salt food that is intermediate between control and Ir60b mutant flies (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), which are exposed to food substrates only when the labial palps open. We suggest that the taste pegs harbor high salt sensitive GRNs, and they may be exposed to solid substrates, but not to the liquid in capillary tubes used in the DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B). (lines 433-444)

      (3) The behavior sensitivity of IR60b mutant to high salt again appeared somewhat inconsistent when assessed in the two different choice assays. IR60b mutant flies were indifferent to 300 mM NaCl when assayed with DrosoX (Fig 3A, B) but were clearly still sensitive to 300 mM NaCl when assayed with "regular" assay - they showed much reduced preference for 5 mM sucrose over 1 mM sucrose when the 5 mM sucrose was adulterated with 300 mM NaCl (Fig 1B).

      The explanation provided above may also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but not when selecting between 300 mM NaCl and 5 mM sucrose versus 1 mM sucrose in the solid food binary assay (Figure 1B). Alternatively, the different behavioral responses might be due to the variation in sucrose concentrations in each of these two assays, which employed 5 mM sucrose in the solid food binary assay, as opposed to 100 mM sucrose in the DrosoX assay. This disparity in attractive valence between these two concentrations of sucrose might consequently impact feeding amount and preference. This point is now also included in the Discussion section. (lines 441-449)

      (4) Given the IR60b neurons exhibited clear IR60b/IR25a/IR76b-dependent sucrose sensitivity, too, I am curious how the various mutant animals behave when given a choice between 100 mM sorbitol vs. 100 mM sorbitol + 300 mM NaCl, a food choice assay not complicated by the presence of sucrose. Similarly, I am curious if the Ca2+ response of IR60 neurons differs significantly when presented with 100 mM sucrose vs. when presented with 100 mM sucrose + 300 mM NaCl. In principle, the magnitude for the latter should be significantly larger than the former as animals appeared to be capable of discriminating these two choices solely relying on their IR60b neurons.

      To investigate the aversion induced by high salt in the absence of a highly attractive sugar, such as sucrose, we combined 300 mM salt with 100 mM sorbitol, which is a tasteless but nutritive sugar (Burke & Waddell, 2011; Fujita & Tanimura, 2011). Using two-way choice assays, we found that the Ir25a, Ir60b, and Ir76b mutants exhibited substantial reductions in high salt avoidance (Figure 3—figure supplement 2A). In addition, we performed DrosoX assays using 100 mM sorbitol alone, or sorbitol mixed with 300 mM NaCl. Sorbitol alone provoked less feeding than sucrose since it is a tasteless sugar (Figure 3—figure supplement 2B and C). Nevertheless, addition of high salt to the sorbitol reduced food consumption (Figure 3—figure supplement 2B and C). (lines 300-308)

      We also conducted a comparative analysis of the Ca2+ responses within the Ir60b GRN, examining its reaction to various stimuli, including 100 mM sucrose alone, 300 mM NaCl alone, and a combination of 100 mM sucrose and 300 mM NaCl. We found that the Ca2+ responses were significantly higher when we exposed the Ir60b GRN to 300 mM NaCl alone, compared with the response to 100 mM sucrose alone (Figure 4—figure supplement 1D). However, the GCaMP6f responses were not higher when we presented 100 mM sucrose with 300 mM NaCl, compared with the response to 300 mM NaCl alone (Figure 4—figure supplement 1D). (lines 360-367)

      Minor issues

      (1) The labels of sucrose concentration on Figure 2D were flipped.

      This has been corrected.

      (2) The phrasing of the sentence that begins in line 196 (i.e., "This suggests the internal sensor ...") is not as optimal.

      We changed the sentence to, “We found that the aversive behavior to high salt was reduced in the Poxn mutants relative to the control (Figure 2J), consistent with previous studies demonstrating roles for GRNs in labellar bristles in high salt avoidance (Jaeger et al, 2018; McDowell et al, 2022; Zhang et al, 2013).”. (lines 217-219)

      (3) In Line 231, I am not sure why the authors think ectopic expressing IR60b in labellar neurons would allow them to become activated by Na+. It seems highly unlikely to me, especially given IR60b also plays a role in sensing sugar.

      We added the following paragraph to the Discussion addressing this point, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al., 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al., 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.”. (lines 464-477)

      Reviewer #2 (Recommendations For The Authors):

      Line 41, acutely excessive salt ingestion can lead to death, not just health issues

      We now state that, “consumption of excessive salt can contribute to various health issues in mammals, including hypertension, osteoporosis, gastrointestinal cancer, autoimmune diseases, and can lead to death.”. (lines 41-43)

      Line 46, delete the comma after flies

      Done. (line 47)

      Lines 51-56: This description is unnecessarily confusing and does not cite proper sources. Renaming these GRNs arbitrarily can only create confusion, plus this description lacks nuance. If E GRNs are Ir94e positive, this description is out of date. Furthermore, If D GRNs are ppk23 and Gr66a positive then they will respond to both bitter and high salt.

      Papers to consult: https://elifesciences.org/articles/37167 10.1016/j.cell.2023.04.038

      We have now added citations. We prefer the A—E nomenclature, which was introduced in a 2021 Genetics review by one of the authors of this manuscript (Montell) (Montell, 2021) since naming different classes of GRNs on the basis of markers or as sweet, bitter, salt and water GRNs is misleading and an oversimplification. We cite the Genetics 2021 review, and for added clarity include both types of former names (markers and sweet, bitter, salt and water). Class D GRNs are not marked by Gr66a. The eLife reference cited above provided the initial rationale for stating that Class E GRNs are marked by Ir94e and activated by low salt. According to the Taisz et al reference (Cell 2023), the Class E GRNs, which are marked by Ir94e, are also activated by pheromones, which we now mention (Taisz et al, 2023). (lines 51-59)

      Line 62, E GRNs are not required for low salt behaviors

      We do not state that E GRNs are required for low salt behaviors, only that they sense low Na+ levels. (line 58)

      Line 70-81 - Great deal of emphasis on labellar GRNs but then no mention of how pharyngeal GRNs fit into categories A-E

      We devote the following paragraph to pharyngeal GRNs. We do not mention how they fit in with the A—E categories because it is not clear.

      “In addition to the labellum and taste bristles on other external structures, such as the tarsi, fruit flies are endowed with hairless sensilla on the surface of the labellum (taste pegs), and three internal taste organs lining the pharynx, the labral sense organ (LSO), the ventral cibarial sense organ (VCSO), and the dorsal cibarial sense organ (DCSO), which also function in the decision to keep feeding or reject a food (Chen & Dahanukar, 2017, 2020; LeDue et al., 2015; Nayak & Singh, 1983; Stocker, 1994). A pair of GRNs in the LSO express a member of the gustatory receptor family, Gr2a, and knockdown of Gr2a in these GRNs impairs the avoidance to slightly aversive levels of Na+ (Kim et al, 2017). Pharyngeal GRNs also promote the aversion to bitter tastants, Cu2+, L-canavanine, and bacterial lipopolysaccharides (Choi et al, 2016; Joseph et al., 2017; Soldano et al, 2016; Xiao et al, 2022). Other pharyngeal GRNs are stimulated by sugars and contribute to sugar consumption (Chen & Dahanukar, 2017; Chen et al, 2021; LeDue et al., 2015). Remarkably, a pharyngeal GRN in each of the two LSOs functions in the rejection rather than the acceptance of sucrose (Joseph et al., 2017).”. (lines 74-89)

      Line 89, aversive --> aversion

      We changed this part.

      Line 90, gain of aversion capsaicin avoidance suggests they are sufficient for avoidance, not essential for avoidance.

      We changed “essential” to “sufficient.”. (line 100)

      Line 104, what are you recording from here? Labellar or pharyngeal GRNs

      We added “S-type and L-type sensilla” to the sentence. (line 119)

      Line 107, How are A GRNS marked with tdTomato? It is important to mention how you are defining A GRNs.

      We modified the sentence as follows: “Using Ir56b-GAL4 to drive UAS-mCD8::GFP, we also confirmed that the reporter was restricted to a subset of Class A GRNs, which were marked with LexAop-tdTomato expressed under the control of the Gr64f-LexA (Figure 1—figure supplement 1D—F).”. (lines 120-123)

      Line 124, should read "concentrated as sea water."

      We made the change. (line 142)

      Line 125, I am not sure what is meant by "alarm neurons"

      We changed “additional pain or alarm neurons” to “nociceptive neurons.”. (line 144)

      Line 141, Are you defining A GRNs as only labellar GRNs, i.e. the Gr5a-GAL4 pattern with labellar plus few pharyngeal GRNs? Or are you defining them as Gr64f-GAL4 (i.e. labellar plus many pharyngeal GRNs)?

      We refer to the Class A—E GRNs as labellar GRNs. Therefore, in this instance, we removed the reference to A GRNs and B GRNs, and simply mention the drivers that we used (Gr5a-GAL4 and Gr66a-GAL4) to express UAS-CsChrimson. The modified sentence is, “As controls we drove UAS-CsChrimson under control of either the Gr5a-GAL4 or the Gr66a-GAL4.”. (lines 51-59, 160-161)

      Line 180, labellar hairs--> labellar taste bristles

      We made the change. (line 204)

      Line 190, possess only --> only possess

      We made the change. (line 216)

      Line 202, Should this read increased?

      Yes. We changed “reduced” to “increased.”. (line 225)

      Line 206, The information provided here and in reference 47 was not sufficient for me to understand how the Droso-X system works and whether it has been validated. Better diagrams and much more description is required for the reader to understand this system and assess its validity

      We now explain that the DrosoX “system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary is then monitored automatically over the course of 6 hours and recorded on a computer.”. (lines 238-243)

      Line 218-219, It would be helpful to expand on this to explain how the previous paper detected no difference. Is this because the contact time with the food is the same but the rate of ingestion is slower?

      Yes. This is correct. We now clarify this point by stating that, “In a prior study, it was observed that the repulsion to high salt exhibited by the Ir60b mutant was indistinguishable from wild-type (Joseph et al., 2017). Specifically, the flies were presented with a drop of liquid (sucrose plus salt) at the end of a probe, and the Ir60b mutant flies fed on the food for the same period of time as control flies (Joseph et al., 2017). However, this assay did not discern whether or not the volume of the high salt-containing food consumed by the Ir60b mutant flies was reduced relative to control flies. Therefore, to assess the volume of food ingested, we used the DrosoX system, which we recently developed (Figure 3—figure supplement 1A) (Sang et al, 2021). This system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary was then monitored automatically over the course of 6 hours and recorded on a computer. We found that control flies consumed approximately four times more of the 100 mM sucrose than the sucrose mixed with 300 mM NaCl (Figure 3A). In contrast, the Ir25a, Ir60b, and Ir76b mutants consumed approximately two-fold less of the sucrose plus salt (Figure 3A). Consequently, they ingested similar amounts of the two food options (Figure 3B; ingestion index). Thus, while the Ir60b mutant and control flies spend similar amounts of time in contact with high salt-containing food when it is the only option (Joseph et al., 2017), the mutant consumes considerably less of the high salt food when presented with a sucrose option without salt.”. (lines 226-251)

      Lines 231-235, Is this evidence for this, that Ir60b expression in the Ir25a or Ir76b pattern will induce high salt responses in the labellum? You should elaborate on this to clearly state what you mean rather than implying it. I do not think that overexpression of one Ir is enough evidence for this sweeping conclusion.

      We agree. We eliminated this point. (lines 227-232)

      Lines 261-263, Please elaborate here, how did you target the I-type sensilla and where are these neurons? So they already express Ir76b and Ir25a?

      We now explain in the Results that, “We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. Gr33a is co-expressed with Gr66a (Moon et al., 2009), which has been shown to be co-expressed with Ir25a and Ir76b (Li et al., 2023). When we performed tip recordings from I7 and I10 sensilla, we did not observe a significant increase in action potentials in response to 300 mM NaCl (Figure 4—figure supplement 1A), indicating that ectopic expression of Ir60b in combination with Ir25a and Ir76b is not sufficient to generate a high salt receptor.”. (lines 324-330)

      Lines 300-303, The discussion needs to be greatly expanded. What is the proposed mechanism by which the same neurons/receptors can inhibit sucrose and high salt feeding? What is the author's interpretation of what this study adds to our understanding of taste aversion?

      We have now added a Discussion section and greatly expanded the discussion.

      Reviewer #3 (Recommendations For The Authors):

      In line 73 there is a typo in "esophagus"

      We changed this part.

      In line 331, the use of a mixture of sucrose and "saponin" seems to be a mistake; "NaCl" is likely intended.

      We made the correction. (lines 546 and 640)

      On several occasions, the authors refer to the pharynx as a taste organ (for example, the 1st sentence of the abstract). I am not sure this is correct; the actual pharyngeal taste organs are the LSO, DCSO, and VCSO, which are located in the pharynx.

      We made the corrections. (lines 24, 90, 92, 93, and 356)

      In line 155 the authors refer to Ir25a and Ir76b as "broadly tuned". I think it is not correct to refer to co-receptors this way, I'd suggest to just call them co-receptors.

      We made the correction. (lines 177-178)

      In line 182, stating "Gr2a is also expressed in the proboscis" is unclear. Clarify whether it refers to sensillae, pharyngeal taste organs, etc.

      We clarified it refers to pharyngeal taste organs. (lines 206-207)

      Line 253: "These finding imply that all three Irs are coexpressed in the pharynx." "The pharynx" is very unspecific, did the authors mean to say "the same neuron"?

      We now clarify by saying “in the Ir60b GRN in the pharynx.”. (line 317)

      Figures & Legends

      I found it confusing that the same color scale is being reused for different panels with different meanings repeatedly and in inconsistent ways. For example in Figure 2, red and blue are being used for Ir25a² mutants, while blue is also being used for Gr64f-Gal4 and S type sensilla. It is also not easily visible nor mentioned in the caption which of the 3 color scales presented belong to which panels.

      We modified the colors in the figures so that they are used in a consistent way. We now also define the colors in the legends.

      In Figure 2 F-I, indicating the stimulus sequence in each panel would enhance clarity. The color scale in Figure 3 could benefit from explicit explanations of different shades in the caption for easier interpretation.

      For example: "The ingestion of (a, dark color) 100 mM sucrose alone and (b, light color) in combination with 300 mM"

      We made the suggested modification.

      In Figure 4a the authors highlight that Ir76b and Ir25a label 2 neurons in the LSO. Did the imaging in 4c also capture the second cell, and if so did it respond to their stimulation?

      No, the focal plane differs, and the signal in Figure 4C is considerably weaker compared to the immunohistochemistry shown in Figure 4A. Notably, the other neuron did not exhibit a response to NaCl.

      In Figure 4f a legend for the color scale is missing, or the color might not be necessary at all. Also, the asterisks seem to be shifted to the right.

      We fixed the shifted asterisks and eliminated the color.

      Figure 4i is mislabeled 4f

      We made the correction.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      Geng et al. explore the molecular mechanisms underlying the role of KIF1C in RNA transport, focusing on how it interacts with RNA. KIF1C is shown to form dynamic puncta when overexpressed in COS-7 cells that do not appear to colocalise with organelle markers. An IDR in the tail of the kinesin is necessary and sufficient for the formation of these structures and FRAP experiments show that they can exchange their contents with proteins in the cytosol and that their formation can be reversibly modulated by hypotonic shock, consistent with LLPS. In vitro, the IDR and flanking regions can undergo phase separation at physiologically relevant concentrations and salt conditions. In cells, KIF1C puncta enrich for RNAs and support their transport, and depletion of RNA modulates KIF1C LLPS properties. A model is proposed whereby KIF1C mediated RNA transport to the cell periphery promotes the formation of a protein-RNA condensate that may act to fine tune local RNA activity.

      Major comments

      In general, the claims made here are well-supported by the data. However, I think that some exploration of the extent of LLPS at different KIF1C expression levels in cells is important but missing. The authors carefully estimate the endogenous concentration of KIF1C in COS-7 cells (at around 25 nM), but it isn't clear how this compares to that observed in transient transfection experiments. Although this is partly addressed in the in vitro assays, I am still left with some questions over the extent of this phenomenon in a cellular context. Can the authors provide some experimental evidence to support the proposition that LLPS occurs (perhaps in a more localised fashion?, as Fig.9) at lower KIF1C expression levels? One way to address this might be a GFP-knock-in (although how feasible this is may depend on the genomic context); alternatively, the authors could generate cell lines that express KIF1C-GFP from a very weak promoter, demonstrate LLPS using their established assays, and show that this is comparable to endogenous expression.

      Response: We thank the reviewer for this suggestion. We have carried out additional experiments to explore the extent of KIF1C LLPS at endogenous levels. We used an antibody against KIF1C to stain WT and KIF1C knockout (KO) cells. Although the antibody shows a high background of non-specific signal in the cytoplasm and nucleoplasm of both WT and KO cells, we were able to observe small puncta of KIF1C at the periphery of WT but not KO cells (new Figure 8). This finding supports our hypothesis that endogenous KIF1C undergoes LLPS upon reaching a high local concentration at the periphery of cells. Two lines of evidence support that these puncta of endogenous KIF1C protein are RNA-containing biomolecular condensates formed by LLPS (new Figure 8). First, these small puncta of endogenous KIF1C incorporate RAB13 mRNA, suggesting that they are RNA granules. Second, the puncta do not form in cells stably expressing KIF1C ΔIDR at near-endogenous levels.

      Minor comments

      Lines 107-109 and Figure 1B on localisation of other kinesin-3s. The authors state that they localise to certain organelles but don't show co-staining for those organelles.

      Response: The localization of other kinesin-3s to certain organelles has been shown in the cited literature. In response to the reviewer's request, we now verify these findings by staining cells expressing the other kinesin-3s for specific organelles (new Figure S1 A).

      Lines 172-183 and Figure 3. Evidence is provided through FRAP experiments that KIF1C puncta exchange with the cytosolic pool. However, the extent of recovery appears to saturate at

      Response: We agree that the data suggest the existence of an immobile pool of KIF1C within the condensates. We have added this information to the main text (lines 178-182). We note that these findings are consistent with recent studies demonstrating membrane-less organelles with at least partially solid-like properties, including nucleoli and stress granules as well as microtubule associated proteins (see references, reviewed in Van Treeck & Parker 2019).

      Line 238 - Fig. S5C is cited as data on endogenous concentration of KIF1C - this should be Fig. S6C.

      Response: Thank you. We have corrected this (now Fig S8 C).

      Line 331-332 - I did not fully follow the logic of how the RNAse A injection experiment supports the idea that KIF1C interaction with RNA is sequence selective. Could the authors expand on this?

      Response: We thank the reviewer for this comment. We have rewritten the text (lines 235-238, 246-248).

      Reviewer #1 (Significance (Required)):

      This study introduces a new and exciting concept to motor protein biology: that some cytoskeletal motors and motor-cargo complexes can undergo phase separation, and that this is important for their function. The experiments are logical, progressive, and form a clear and compelling case. The main limitation is that demonstration of LLPS in cells is limited to over-expressed protein. Some exploration/demonstration of LLPS properties of KIF1C in cells at near to endogenous expression levels would enhance the study.

      The work should be of interest to a broad range of readers, from the cytoskeletal motor community, those interested in mRNA regulation, as well as scientists studying phase separation more generally.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper investigates mRNA transport by the kinesin Kif1C and tests the hypothesis that liquid condensation of the disordered C terminal region is important for mRNA recruitment. It is based on prior work from other labs showing that Kif1C recruits and transports a set of mRNAs to the periphery of cells. The mechanism of the Kif1C-mRNA interaction was not investigated in the prior work, so the proposal that a liquid condensate is involved is novel. It is also topical, since there is intense current interest in transport and regulation of mRNAs by condensate-mediated mechanisms. The most useful part of this paper to the field may be the identification of IDR2 as required for mRNA binding in Fig 7.

      Major comments

      A major concern is reliance on expression of tagged Kif1C in Cos cells in several figures. The expression level in these experiments probably far exceeds normal, though this comparison is not reported. It is possibly justified to use over-expression to reveal a condensate mechanism, but it is concerning and the authors need to strongly qualify their conclusions. One way to moderate this concern would be to examine condensation as a function of expression level.

      Response: We thank the reviewer for this suggestion. We have carried out additional experiments to explore the extent of KIF1C LLPS at endogenous levels. We used an antibody against KIF1C to stain WT and KIF1C knockout (KO) cells. Although the antibody shows a high background of non-specific signal in the cytoplasm and nucleoplasm of both WT and KO cells, we were able to observe small puncta of KIF1C at the periphery of WT but not KO cells (new Figure 8). This finding supports our hypothesis that endogenous KIF1C undergoes LLPS upon reaching a high local concentration at the periphery of cells. Two lines of evidence support that these puncta of endogenous KIF1C protein are RNA-containing biomolecular condensates formed by LLPS (new Figure 8). First, these small puncta of endogenous KIF1C incorporate RAB13 mRNA, suggesting that they are RNA granules. Second, the puncta do not form in cells stably expressing KIF1C ΔIDR at near-endogenous levels.

      Another significant concern is that the biochemical reconstitution figure tests protein alone, not protein + RNA. Disordered RNA binding proteins usually phase separate better in the presence of RNA. The best reconstitution papers evaluate specificity of RNA recruitment to condensates. Specificity testing in a reconstituted system may not be required for a first paper, but testing the effect of some kind of RNA seems important.

      Response: The purified CC4+IDR and IDR constructs form condensates at low μM concentrations and in the absence of RNA or crowding agents, thus we did not test whether they would phase separate better in the presence of RNA. In response to the reviewer's comments, we now evaluate the specificity of RNA recruitment to the KIF1C condensates. We utilized the purified CC4+IDR protein and added the same GU-rich and polyA RNAs used in cells (now Fig 4 B) at different concentrations. Interestingly, there is selective incorporation of GU-rich oligos in condensates at low RNA concentrations, incorporation of both RNAs into condensates at medium concentrations, and an inhibition of condensate formation at high RNA concentrations (new Fig 7 E,F).

      A final concern is that the specificity of mRNA recruitment to Kif1C puncta in cells is not critically evaluated. Among endogenous mRNAs, only one (Rab13) is tested. The paper would be stronger with a second positive mRNA and a negative control mRNA.

      Response: We have now tested whether the specificity of mRNA recruitment to KIF1C puncta applies to additional mRNAs. We carried out single-molecule FISH (smFISH) experiments for two additional mRNAs. Based on the literature showing KIF1C-dependent localization of specific RNAs, we chose NET1 as a second positive mRNA and CAM1 as a negative control mRNA (Pichon et al., 2021). We first show that NET1 mRNA is mislocalized in KIF1C KO cells whereas CAM1 mRNA is not (new Fig S7 C,D). We then rescued the KO cells with FL or ΔIDR constructs and show that the FL protein rescues NET1 mRNA localization to the cell periphery whereas the ΔIDR construct does not (new Fig S7 E,F).

      Reviewer #2 (Significance (Required)):

      The mechanism of the Kif1C-mRNA interaction was not investigated in the prior work, so the proposal that a liquid condensate is involved is novel. It is also topical, since there is intense current interest in transport and regulation of mRNAs by condensate-mediated mechanisms. The most useful part of this paper to the field may be the identification of IDR2 as required for mRNA binding.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      KIF1C is a member of the kinesin-3 family, which is responsible for fast organelle transport in cells. The cargos of KIF1C are diverse, such as Golgi apparatus, Rab6 vesicles, exon junction complex (EJC), integrins, and RNA. Mutations in the KIF1C coding sequence lead to neurodegenerative diseases, such as hereditary spastic paraparesis (HSP). In addition, as an RNA transporter, KIF1C transports various types of mRNAs (e.g., APC-dependent mRNAs, KIF1C's own mRNA) along the microtubules and clusters them to cytoplasmic protrusions to fulfill certain biological functions.

      In the current manuscript, Gen et al. investigated the intracellular behaviors of the kinesin-3 member KIF1C. The study revealed that KIF1C can form dynamic condensates both in cells and in vitro via an unstructured domain within the tail of the motor. KIF1C was also found to interact with synthesized RNA and other RNA granules in cells. In addition, the authors show that KIF1C participates in intracellular transport of an endogenous mRNA, Rab13 mRNA, and identify a 47-aa fragment in the KIF1C IDR that is critical for the KIF1C-Rab13 mRNA interaction. Finally, as with other prion-like proteins, the LLPS of KIF1C is buffered by the non-specific RNA pool in the cytoplasm.

      In summary, this is an interesting work in the field, and reveals novel results about the mechanisms of motor protein transport that will be broadly interesting. The assays are generally well performed, and the results and discussion are well described, but some descriptions in the article should be more rigorous and objective. The article is very long, and I think it would benefit from streamlining and reducing the number of figures to make it more accessible for non-specialists in the field.

      Here are some concerns:

      Major:

      Fig. 1A shows the domain organization of all kinesin-3 members, but Figure 1B only represents KIF1Bβ, KIF13B and KIF16B as controls. Generally, the KIF1Bα has the highest sequence similarity with KIF1C in kinesin-3 family (very high sequence similarity before aa 992 in KIF1C, which locates in IDR, probably contains IDR2a from Fig. S10A). In addition, both KIF1C and KIF1Bα contain a PLD from the prediction in this paper (Figure S2C). Although the authors show the phenotype of KIF1Bα in the Fig. S9, it might be better to put some descriptions up front, as readers may consider why the authors did not use KIF1Bα as a control. Actually, I kept thinking about this concern before I got to the discussion.

      Response: We thank the reviewer for this suggestion. We have moved the descriptions of KIF1Bα phenotypes to earlier in the manuscript. We show that KIF1Bα forms puncta in cells but, unlike KIF1C, the KIF1Bα puncta do not colocalize with the known RNA granules P-bodies or stress granules (now in Fig S5 B,C). We show that, unlike KIF1C, the KIF1Bα puncta do not incorporate GU-rich or polyA RNA (now in Fig S6 B).

      It would be better if the authors can combine the Fig. 2B and 2C, since the article did not mention Fig. 2B at all. In addition, Fig. S3 does not help this article too much. Probably it would be better if the authors could take the ΔIDR-mNG data from the Fig. S3 and put into the Fig. 2. as a negative control, especially for Fig. 2D an 2F. As for whether the phenotype of the ΔIDR-mNG construct is "similar to a constitutively active KIF1C construct containing only the motor domain (amino acids 1-348) (Fig. S3 C)", I do not think it is important here, since in this part, the authors are aiming to confirm the IDR is critical for KIF1C phase separation.

      Response: We have combined Figures 2B and 2C as suggested. We prefer to leave Figure S3 intact since, as the reviewer mentioned, the article is already long and these data are not critical for the story.

      The description "the condensate properties can be modulated by adjacent coiled-coil segments" in the abstract and the sentence "However, the coiled-coil segments in the stalk domain appear to facilitate puncta formation as the addition of increasing amounts of coiled coil resulted in increased KIF1C enrichment in puncta as compared to the IDR alone" in the article are not accurate, since there is no direct evidence in this manuscript that shows that. In Fig. 2D, as well as Fig. 6A and Fig. S7, it is manifest at a glance there are lots of IDR-mNG localized in nucleus, which decreases the concentration of this construct in cytoplasm which in turn may lower its capability to form puncta. This is important, as the results in Fig.4 show that the concentration of protein directly affects the formation of phase separated puncta in cells. From my view, the words "modulate", "tune" ... usually describe active processes, and these words may be confusing unless there are enough evidence support direct regulation. But the data presented in this article suggests to us that it is likely a passive process, such as the coiled coil region preventing the CC4-IDR construct from entering the nucleus (Fig. 2D, Fig. 6A and Fig. S7). Moreover, CC4 does affect the critical concentration of IDR in vitro (Fig. 5E), but that could be attributed to the coiled coil domain increasing its solubility. I like the word "influence" used in a subtitle in the discussion portion.

      Response: We have removed this from the text.

      In addition, the in vitro study of this paper in Fig. 5 did not show any significant difference in puncta formation between IDR-mNG and CC4-IDR-mNG (Diameter: 0.43 ± 0.22 μm (mean ± STD) for IDR-mNG vs 0.48 ± 0.27 μm (mean ± STD) for CC4-IDR-mNG. Roundness: no value was shown in the article). So, a stricter assay or a more accurate description is required here to avoid misleading the readers.

      Response: We now include p values showing that the differences in diameter and roundness are statistically significant (data moved to Fig S8 B).

      The description for Fig. 5 "At 2 uM protein concentration and 100 mM NaCl, the KIF1C(IDR) droplets were smaller [diameter 0.43 ± 0.22 μm (mean ± STD)] than KIF1C(CC4+IDR) droplets [0.48 ± 0.27 μm (mean ± STD)] (Fig. 5 C)" does not appear accurate either, since there is no significant difference between the value 0.43 ± 0.22 μm and the value 0.48 ± 0.27 μm, so it should not be described as "smaller". In addition, the article mentioned that "The KIF1C(IDR) puncta were also less round than those of KIF1C(CC4+IDR) (Fig. 5 C)", but there is no corresponding value from the quantification showing that the KIF1C(IDR) puncta are less round.

      Response: We now include p values showing that the differences in diameter and roundness are statistically significant (data moved to Fig S8 B).
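Whether a 0.05 μm difference in mean diameter reaches significance depends strongly on how many droplets were measured, which this exchange does not report. The following is a minimal sketch, assuming a Welch t-test computed from the summary statistics quoted above, purely hypothetical droplet counts, and a normal approximation for the p value (only reasonable for large samples). The function name is illustrative and not taken from the manuscript.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and approximate two-sided p value from summary stats.

    The p value uses a normal approximation to the t distribution,
    which is only reasonable when both samples are large.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    t = (m1 - m2) / se
    p = math.erfc(abs(t) / math.sqrt(2))     # two-sided tail of a standard normal
    return t, p


# Means and SDs (in μm) are those quoted by the reviewer.
# The per-group droplet counts n are HYPOTHETICAL, chosen only for illustration.
for n in (20, 1000):
    t, p = welch_from_summary(0.43, 0.22, n, 0.48, 0.27, n)
    print(f"n = {n:4d} per group: t = {t:.2f}, p ~ {p:.2g}")
```

Under these assumptions the same means and standard deviations are far from significant with a few dozen droplets and clearly significant with around a thousand, which is one way the reviewer's reading and the authors' p values could both hold.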

      The description in the sentence "We thus tested whether ... LLPS is mutually exclusive (Fig. S5 A)" may not be accurate. Results in Fig. S5 only show that there is no direct interaction between KIF1C and CLIP-170, or that these two proteins do not colocalize. As I understand it, "mutually exclusive" means that two proteins compete with each other at the same location.

      Response: We have replaced the words "mutually exclusive" with "no colocalization" (line 204).

      In addition, is it necessary to put Fig. S5 into this article? Since from my side, it does not help too much for the whole story. In cells, kinesin motors are autoinhibited in the cytoplasm. For this KIF1C, most of motors appear autoinhibited as well, even when the authors removed the IDR based on Fig. S3C (ΔIDR-mNG vs. MD-mNG). In this case, it is hard to investigate the potential interaction between the KIF1C (or its ΔIDR mutant) with the microtubules or with the tubulin due to the autoinhibition of constructs used in Fig. S5. It would be better to use other active versions of KIF1C, such as ΔP (Soppina et. al., PNAS, 2014) or other mutants (Ren et. al., PNAS, 2018; Wang et. al., Nat. Commun., 2022) if the authors want to show this part in the article.

      Response: We agree that this data is not essential for the story, however, it may be of interest and benefit to others in the field studying LLPS of microtubule-associated proteins and we prefer to leave Figure S5 (now Figure S4) in the supplementary information.

      The conclusion "This result suggests that the IDR- driven LLPS of KIF1C does not depend on mRNA incorporation, but is strongly affected by it" may not be accurate, there is no direct evidence that shows that mRNA, at least Rab13mRNA incorporation strongly affects the IDR- driven LLPS of KIF1C. Perhaps a knock out of Rab13mRNA would alter the formation of condensates, which would support a direct effect on LLPS.

      Response: We have changed the text (line 306).

      In addition, the sentence "These results also show that the LLPS is resistant to truncations of large portions of IDR" may not be accurate. In my view, apart from IDR2a, the rest of the IDR may not participate in or contribute much to the formation of puncta, but that does not mean LLPS is resistant to the truncation of these portions of the IDR; these are different arguments. The quantification in Fig. 7E also shows no statistically significant difference between ST and the truncations other than ΔIDR2 and ΔIDR2a, such as ST (21.8 ± 12.0 puncta per cell, 2.06 ± 0.83 μm diameter), ΔPLD (20.1 ± 13.7 puncta per cell, 1.62 ± 0.94 μm diameter), ΔIDR1 (23.1 ± 14.3 puncta per cell, 2.13 ± 1.04 μm diameter), ΔIDR3 (18.4 ± 8.1 puncta per cell, 1.71 ± 0.98 μm diameter).

      Response: We have changed the text (line 307).

      I am not sure I agree with the author's interpretation of their FRAP data in Fig. 3. It appears to me that there is a large immobile population of molecules, as the bleached areas recover less than 50% of their initial intensity. However, the authors conclude that there is rapid exchange of molecules in the puncta. The authors need to further analyze and discuss both the exchange rate of the population of molecules that exchange, but also the fraction of apparently immobile molecules that do not recover in their experiments. These data appear to suggest that a large percentage of the molecules in the KIF1C puncta in fact do not exchange with the cytoplasm and undermine their argument for a liquid-like phase of the puncta.

      Response: We agree that the data suggest the existence of an immobile pool of KIF1C within the condensates. We have added this information to the main text (lines 178-182). We note that these findings are consistent with recent studies demonstrating membrane-less organelles with at least partially solid-like properties, including nucleoli and stress granules as well as microtubule associated proteins (see references, reviewed in Van Treeck & Parker 2019).
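The mobile and immobile fractions discussed here can be estimated from a normalized FRAP recovery trace. The following is a minimal sketch, assuming a single-exponential recovery and a synthetic trace; the intensities, rate constant, and function name are illustrative and are not values or methods from the manuscript.

```python
import math


def mobile_fraction(pre_bleach, post_bleach, plateau):
    """Fraction of the bleached fluorescence that recovers at the plateau."""
    return (plateau - post_bleach) / (pre_bleach - post_bleach)


# SYNTHETIC trace (all values hypothetical): intensity normalized to 1.0
# before bleaching, dropping to 0.2 and recovering to 0.55 with rate K.
F_PRE, F0, F_INF, K = 1.0, 0.2, 0.55, 0.2   # K in 1/s
times = list(range(0, 61))                   # seconds after bleaching
trace = [F0 + (F_INF - F0) * (1 - math.exp(-K * t)) for t in times]

# Estimate the plateau from the last few time points of the trace.
plateau = sum(trace[-5:]) / 5
mobile = mobile_fraction(F_PRE, trace[0], plateau)
immobile = 1 - mobile

# For a single-exponential recovery the half-time follows from the rate.
half_time = math.log(2) / K

print(f"mobile fraction   ~ {mobile:.2f}")
print(f"immobile fraction ~ {immobile:.2f}")
print(f"recovery t1/2     ~ {half_time:.1f} s")
```

Reporting both numbers separates the two questions raised by the reviewer: how fast the exchanging pool turns over (the half-time) and how much of the pool does not exchange at all (the immobile fraction).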

      Minor:

      As mentioned above, Fig. 2 F needs a negative control, since the values of FL and IDR are lower than other constructs, maybe use the Δ IDR-mNG protein is better. In addition, from my view, the lower value of IDR construct does not represent this construct has lower capability to form puncta, but more likely because most of this protein localizes in nucleus, thus dramatically lowering the cytoplasmic concentration.

      Response: We have changed the text as suggested (lines 152-154).

      Fig. 6A probably needs a negative control as well; perhaps the ΔIDR construct used in Fig. S7 would be suitable.

      Response: We have now included KIF1Bα as a negative control (Fig S6 B).

      Although I guess the reason for using hTERT-RPE1 cells in Rab13mRNA rescue assay (Fig. 6D-G) probably is easier to get KIF1C knock out cells (if I am correct), it would be better if there is a brief introduction for the reason to use hTERT-RPE1 here, since all previous assay in the article used COS-7 cells.

      Response: You are correct and we have added text introducing the use of hTERT-RPE1 cells (line 269).

      Is there any specific reason to use the construct ST in Fig. 7? In Fig. 6, the authors used full-length KIF1C; if the authors want to avoid any effects caused by the motor domain, the construct CC4-IDR could also be a simpler candidate.

      Response: No specific reason other than to be consistent as most experiments that we carried out in cells used the ST construct (e.g. FRAP assay in Fig 3, hypotonic assay in Fig 3, RNaseA injection in Fig 4, RNA incorporation in Fig 4). (Note that Fig 7 is now Fig 6).

      This article is a great case for motor-cargo interaction, since the RNA binding site of KIF1C is within its tail domain. This left me curious about if the interaction between the KIF1C and the membrane-less RNA granule is sufficient to release the KIF1C motor from autoinhibition? I guess the binding of RNA is not enough to release the KIF1C from autoinhibition. From Fig. S3C and Fig. 6D, seems the motor still in autoinhibition, even remove the Rab13mRNA binding region.

      Response: We believe the question of whether the RNA binding relieves autoinhibition of KIF1C is beyond the scope of this manuscript and we plan to address this in the future with recombinant full-length KIF1C and RAB13 mRNAs.

      There are some grammar mistakes, e.g., There should be a "is" between "IDR" and "critical" in the title "A subregion of the KIF1C IDR critical for enrichment of Rab13mRNA in condensates".

      Response: Thank you. We have corrected this (line 289).

      There should be a definition for the full names of the abbreviate "RBD" mentioned in the article although the readers may guess that is an RNA binding domain, if possible, it would be better but not necessary if the authors could show the residues or the region in IDR.

      Response: RBD is defined at the beginning to the section "KIF1C condensates display properties of RNA granules" (line 219) but in response to the reviewer's comment, we now include this definition a second time in the Discussion section (line 420).

      In the results (line 126), the authors refer to the KIF1C IDR without first defining this region in the introduction. I would re-word this sentence for clarity by first defining what an IDR is and how it's assessed in the current study.

      Response: The IDR is defined at the end of the Introduction (lines 94-95).

      What is the significance of the roundness measurement in Fig. 5? This should be described for the reader.

      Response: Roundness refers to the shape of the droplet and this is now included in the text (line 323, data moved to Fig S8 B).

      The authors state several times that this is the first kinesin shown to undergo LLPS. However, is this true? What about the recent work showing that the yeast Tea2 kinesin undergoes LLPS with other +TIP components (Maan et al. NCB 2023).

      Response: We thank the reviewer for this comment. The recent work from the Dogterom lab (Maan et al., 2023) demonstrates that the end binding (EB) protein Mal3 forms condensates alone and with the kinesin-7 family member Tea2 and its cargo Tip1 for enrichment at microtubule plus ends. The authors show images of Mal3 droplets and the requirement of the IDR domain and the crowding agent polyethylene glycol for droplet formation. The authors state that "Tea2 and Tip1 formed condensates under similar crowding conditions and concentrations on their own (Extended Data Fig. 5)." However, Extended Data Fig 5 reports on the fluorescence intensity of Mal3-EGFP colocalizing with Tea2 or Tip1. No images of Tea2-only droplets are shown and no information is provided on the Tea2 and/or PEG concentrations required for droplet formation or the liquid nature of Tea2 droplets. Thus, we do not feel comfortable stating that Tea2 on its own undergoes LLPS. We do reference the Maan et al., 2023 work in the Discussion when listing microtubule-associated proteins shown to undergo LLPS (line 403) and when comparing the μM concentrations of KIF1C required for LLPS to the μM concentrations of these other microtubule-associated proteins (line 417).

      The authors don't discuss KIF5A, but their analysis reveals it also contains a low complexity region that may undergo LLPS (Fig. S2D). This would fit with recent reports that KIF5A tends to oligomerize more than other KIF5 isoforms, and that mutations in KIF5A that impact the tail domain may lead to aberrant oligomerization. I feel that it would be useful to the field for the authors to discuss these results in light of their own.

      Response: We thank the reviewer for this suggestion. Although it is intriguing that KIF5A is predicted to contain an IDR, there is, however, no data to suggest that KIF5A undergoes LLPS. Rather, the current literature suggests that KIF5A undergoes higher-order oligomerization and accumulation at the cell periphery, especially for the isoform lacking exon 27 (Nakano et al., 2022, Baron et al., 2022, Pant et al., 2023, Soustelle et al., 2023). It thus does not seem prudent for us to speculate on whether or not KIF5A undergoes LLPS.

      Reviewer #3 (Significance (Required)):

      The study is novel and interesting and will be impactful for the cytoskeletal and RNA biology communities. The experiments are of high quality and controls are appropriate. The finding that motor proteins can participate in LLPS will be of high interest for a variety of fields and provides a very interesting advance over current knowledge in the field.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      KIF1C is a member of the kinesin-3 family, which is responsible for fast organelle transport in cells. The cargos of KIF1C are diverse, such as Golgi apparatus, Rab6 vesicles, exon junction complex (EJC), integrins, and RNA. Mutations in the KIF1C coding sequence lead to neurodegenerative diseases, such as hereditary spastic paraparesis (HSP). In addition, as an RNA transporter, KIF1C transports various types of mRNAs (e.g., APC-dependent mRNAs, KIF1C's own mRNA) along the microtubules and clusters them to cytoplasmic protrusions to fulfill certain biological functions.

      In the current manuscript, Gen et al. investigated the intracellular behaviors of the kinesin-3 member KIF1C. The study revealed that KIF1C can form dynamic condensates both in cells and in vitro via an unstructured domain within the tail of the motor. KIF1C was also found to interact with synthesized RNA and other RNA granules in cells. In addition, the authors show that KIF1C participates in intracellular transport of an endogenous mRNA, Rab13 mRNA, and identify a 47-aa fragment in the KIF1C IDR that is critical for the KIF1C-Rab13 mRNA interaction. Finally, as with other prion-like proteins, the LLPS of KIF1C is buffered by the non-specific RNA pool in the cytoplasm.

      In summary, this is an interesting work in the field, and reveals novel results about the mechanisms of motor protein transport that will be broadly interesting. The assays are generally well performed, and the results and discussion are well described, but some descriptions in the article should be more rigorous and objective. The article is very long, and I think it would benefit from streamlining and reducing the number of figures to make it more accessible for non-specialists in the field.

      Here are some concerns:

      Major:

      1. Fig. 1A shows the domain organization of all kinesin-3 members, but Figure 1B only represents KIF1Bβ, KIF13B and KIF16B as controls. Generally, the KIF1Bα has the highest sequence similarity with KIF1C in kinesin-3 family (very high sequence similarity before aa 992 in KIF1C, which locates in IDR, probably contains IDR2a from Fig. S10A). In addition, both KIF1C and KIF1Bα contain a PLD from the prediction in this paper (Figure S2C). Although the authors show the phenotype of KIF1Bα in the Fig. S9, it might be better to put some descriptions up front, as readers may consider why the authors did not use KIF1Bα as a control. Actually, I kept thinking about this concern before I got to the discussion.
      2. It would be better if the authors can combine the Fig. 2B and 2C, since the article did not mention Fig. 2B at all. In addition, Fig. S3 does not help this article too much. Probably it would be better if the authors could take the ΔIDR-mNG data from the Fig. S3 and put into the Fig. 2. as a negative control, especially for Fig. 2D an 2F. As for whether the phenotype of the ΔIDR-mNG construct is "similar to a constitutively active KIF1C construct containing only the motor domain (amino acids 1-348) (Fig. S3 C)", I do not think it is important here, since in this part, the authors are aiming to confirm the IDR is critical for KIF1C phase separation.
      3. The description "the condensate properties can be modulated by adjacent coiled-coil segments" in the abstract and the sentence "However, the coiled-coil segments in the stalk domain appear to facilitate puncta formation as the addition of increasing amounts of coiled coil resulted in increased KIF1C enrichment in puncta as compared to the IDR alone" in the article are not accurate, since there is no direct evidence in this manuscript that shows that. In Fig. 2D, as well as Fig. 6A and Fig. S7, it is manifest at a glance there are lots of IDR-mNG localized in nucleus, which decreases the concentration of this construct in cytoplasm which in turn may lower its capability to form puncta. This is important, as the results in Fig.4 show that the concentration of protein directly affects the formation of phase separated puncta in cells. From my view, the words "modulate", "tune" ... usually describe active processes, and these words may be confusing unless there are enough evidence support direct regulation. But the data presented in this article suggests to us that it is likely a passive process, such as the coiled coil region preventing the CC4-IDR construct from entering the nucleus (Fig. 2D, Fig. 6A and Fig. S7). Moreover, CC4 does affect the critical concentration of IDR in vitro (Fig. 5E), but that could be attributed to the coiled coil domain increasing its solubility. I like the word "influence" used in a subtitle in the discussion portion.

      In addition, the in vitro study of this paper in Fig. 5 did not show any significant difference in puncta formation between IDR-mNG and CC4-IDR-mNG (Diameter: 0.43 ± 0.22 μm (mean ± STD) for IDR-mNG vs 0.48 ± 0.27 μm (mean ± STD) for CC4-IDR-mNG. Roundness: no value was shown in the article). So, a stricter assay or a more accurate description is required here to avoid misleading the readers.

      The description for Fig. 5 "At 2 uM protein concentration and 100 mM NaCl, the KIF1C(IDR) droplets were smaller [diameter 0.43 ± 0.22 μm (mean ± STD)] than KIF1C(CC4+IDR) droplets [0.48 ± 0.27 μm (mean ± STD)] (Fig. 5 C)" does not appear accurate either, since there is no significant difference between the value 0.43 ± 0.22 μm and the value 0.48 ± 0.27 μm, so it should not be described as "smaller". In addition, the article mentioned that "The KIF1C(IDR) puncta were also less round than those of KIF1C(CC4+IDR) (Fig. 5 C)", but there is no corresponding value from the quantification showing that the KIF1C(IDR) puncta are less round.

      4. The description in the sentence "We thus tested whether ... LLPS is mutually exclusive (Fig. S5 A)" may not be accurate. Results in Fig. S5 only show that there is no direct interaction between KIF1C and CLIP-170, or that these two proteins do not colocalize. As I understand it, "mutually exclusive" means that two proteins compete with each other at the same location.

      In addition, is it necessary to put Fig. S5 into this article? Since from my side, it does not help too much for the whole story. In cells, kinesin motors are autoinhibited in the cytoplasm. For this KIF1C, most of motors appear autoinhibited as well, even when the authors removed the IDR based on Fig. S3C (ΔIDR-mNG vs. MD-mNG). In this case, it is hard to investigate the potential interaction between the KIF1C (or its ΔIDR mutant) with the microtubules or with the tubulin due to the autoinhibition of constructs used in Fig. S5. It would be better to use other active versions of KIF1C, such as ΔP (Soppina et. al., PNAS, 2014) or other mutants (Ren et. al., PNAS, 2018; Wang et. al., Nat. Commun., 2022) if the authors want to show this part in the article.

      5. The conclusion "This result suggests that the IDR- driven LLPS of KIF1C does not depend on mRNA incorporation, but is strongly affected by it" may not be accurate, there is no direct evidence that shows that mRNA, at least Rab13mRNA incorporation strongly affects the IDR- driven LLPS of KIF1C. Perhaps a knock out of Rab13mRNA would alter the formation of condensates, which would support a direct effect on LLPS.

      In addition, the sentence "These results also show that the LLPS is resistant to truncations of large portions of IDR" may not be accurate. In my view, apart from IDR2a, the rest of the IDR may not participate in or contribute much to the formation of puncta, but that does not mean LLPS is resistant to the truncation of these portions of the IDR; these are different arguments. The quantification in Fig. 7E also shows no statistically significant difference between ST and the truncations other than ΔIDR2 and ΔIDR2a, such as ST (21.8 ± 12.0 puncta per cell, 2.06 ± 0.83 μm diameter), ΔPLD (20.1 ± 13.7 puncta per cell, 1.62 ± 0.94 μm diameter), ΔIDR1 (23.1 ± 14.3 puncta per cell, 2.13 ± 1.04 μm diameter), ΔIDR3 (18.4 ± 8.1 puncta per cell, 1.71 ± 0.98 μm diameter).

      6. I am not sure I agree with the author's interpretation of their FRAP data in Fig. 3. It appears to me that there is a large immobile population of molecules, as the bleached areas recover less than 50% of their initial intensity. However, the authors conclude that there is rapid exchange of molecules in the puncta. The authors need to further analyze and discuss both the exchange rate of the population of molecules that exchange, but also the fraction of apparently immobile molecules that do not recover in their experiments. These data appear to suggest that a large percentage of the molecules in the KIF1C puncta in fact do not exchange with the cytoplasm and undermine their argument for a liquid-like phase of the puncta.

      Minor:

      1. As mentioned above, Fig. 2 F needs a negative control, since the values of FL and IDR are lower than other constructs, maybe use the Δ IDR-mNG protein is better. In addition, from my view, the lower value of IDR construct does not represent this construct has lower capability to form puncta, but more likely because most of this protein localizes in nucleus, thus dramatically lowering the cytoplasmic concentration.
      2. Fig. 6A probably needs a negative control as well; perhaps the ΔIDR construct used in Fig. S7 would be suitable.
      3. Although I guess the reason for using hTERT-RPE1 cells in Rab13mRNA rescue assay (Fig. 6D-G) probably is easier to get KIF1C knock out cells (if I am correct), it would be better if there is a brief introduction for the reason to use hTERT-RPE1 here, since all previous assay in the article used COS-7 cells.
      4. Is there any specific reason to use the construct ST in Fig. 7? In Fig. 6, the authors used full-length KIF1C; if the authors want to avoid any effects caused by the motor domain, the construct CC4-IDR could also be a simpler candidate.
      5. This article is a great case for motor-cargo interaction, since the RNA binding site of KIF1C is within its tail domain. This left me curious about if the interaction between the KIF1C and the membrane-less RNA granule is sufficient to release the KIF1C motor from autoinhibition? I guess the binding of RNA is not enough to release the KIF1C from autoinhibition. From Fig. S3C and Fig. 6D, seems the motor still in autoinhibition, even remove the Rab13mRNA binding region.
      6. There are some grammar mistakes, e.g., There should be a "is" between "IDR" and "critical" in the title "A subregion of the KIF1C IDR critical for enrichment of Rab13mRNA in condensates".
      7. There should be a definition for the full names of the abbreviate "RBD" mentioned in the article although the readers may guess that is an RNA binding domain, if possible, it would be better but not necessary if the authors could show the residues or the region in IDR.
      8. In the results (line 126), the authors refer to the KIF1C IDR without first defining this region in the introduction. I would re-word this sentence for clarity by first defining what an IDR is and how it's assessed in the current study.
      9. What is the significance of the roundness measurement in Fig. 5? This should be described for the reader.
      10. The authors state several times that this is the first kinesin shown to undergo LLPS. However, is this true? What about the recent work showing that the yeast Tea2 kinesin undergoes LLPS with other +TIP components (Maan et al. NCB 2023).
      11. The authors don't discuss KIF5A, but their analysis reveals it also contains a low complexity region that may undergo LLPS (Fig. S2D). This would fit with recent reports that KIF5A tends to oligomerize more than other KIF5 isoforms, and that mutations in KIF5A that impact the tail domain may lead to aberrant oligomerization. I feel that it would be useful to the field for the authors to discuss these results in light of their own.

      Significance

      The study is novel and interesting and will be impactful for the cytoskeletal and RNA biology communities. The experiments are of high quality and controls are appropriate. The finding that motor proteins can participate in LLPS will be of high interest for a variety of fields and provides a very interesting advance over current knowledge in the field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript by DeHaro-Arbona et al., the authors wish to understand how a signaling pathway (Notch) is dynamically decoded to elicit a specific transcriptional output. In particular, they investigate the kinetic properties of Notch-responsive nuclear complexes (the DNA binding factor CSL and its co-activator Mastermind (mam) along with several candidate interacting partners). Their experimental model is the polytene chromosome of the Drosophila salivary gland, in which the naturally inactive Notch can be artificially induced through the expression of a constitutively active form of Notch.

      The authors develop a series of CRISPR and transgenic lines enabling the live imaging of these complexes at a specific locus and in various backgrounds (genetic perturbations/drug treatments). This quantitative live imaging data suggests that Notch nuclear complexes form hubs, and the authors characterize their binding dynamics. Interestingly, they elegantly demonstrate that the content of these hubs and their kinetic properties can evolve, even within Notch ON cells. Hence, they propose the existence of distinct hubs, distinguishing an open (CSL), engaged (CSL-Mam), or active (CSL-Mam-Med-PolII) configuration in Notch ON cells and an inactive hub state (in Notch OFF cells previously exposed to Notch) that would explain the surprising transcriptional memory that the authors observe hours after Notch withdrawal.

      We thank the reviewer for this constructive summary of our work.

      Reviewer #2 (Public Review):

      The manuscript from deHaro-Arbona et al, entitled "Dynamic modes of Notch transcription hubs conferring memory and stochastic activation revealed by live imaging the co-activator Mastermind", uses single molecule microscopy imaging in live tissues to understand the dynamics and molecular determinants of transcription factor recruitment to the E(spl)-C locus in Drosophila salivary gland cells under Notch-ON and -OFF conditions. Previous studies have identified the major players that are involved in transcription regulation in the Notch pathway, as well as the importance of general transcriptional coregulators, such as CBP/P300 and the Mediator CDK module, but the detailed steps and dynamics involved in these processes are poorly defined. The authors present a wealth of single molecule data that provides significant insights into Notch pathway activation, including:

      (1) Activation complexes, containing CSL and Mam, have slower dynamics than the repressor complexes, containing CSL and Hairless.

      (2) Contribution of CSL, NICD, and Mam IDRs to recruitment.

      (3) CSL-Mam slow-diffusing complexes are recruited and form a hub of high protein concentrations around the target locus in Notch-ON conditions.

      (4) Mam recruitment is not dependent on transcription initiation or RNA production.

      (5) CBP/P300 or its associated HAT activity is not required for Mam recruitment.

      (6) Mediator CDK module and CDK8 activity are required for Mam recruitment, and vice-versa, but not CSL recruitment.

      (7) Mam is not required for chromatin accessibility but is dependent on CSL and NICD.

      (8) CSL recruitment and increased chromatin accessibility persist after NICD removal and loss of Mam, which confers a memory state that enables rapid re-activation in response to subsequent Notch activation.

      (9) Differences in the proportions of nuclei with both Pol II and with Mam enrichment, which results in transcription being probabilistic/stochastic. These data demonstrate that the presence of Mam complexes is not sufficient to drive all the steps required for transcription in every Notch-ON nucleus.

      (10) The switch from more stochastic to robust transcription initiation was elicited when ecdysone was added.

      Overall, the manuscript is well written, concise, and clear, and makes significant contributions to the Notch field, which are also important for a general understanding of transcription factor regulation and behavior in the nucleus. I recommend that the authors address my relatively minor criticisms detailed below.

      We thank the reviewer for their thorough and constructive summary of our work. We are glad that they overall found it insightful and interesting. Below we have addressed the points they have raised.

      Page 7, bottom. The authors speculate, "It is possible therefore that, once recruited, Mam can be retained at target loci independently of CSL by interactions with other factors so that it resides for longer." Is it possible that another interpretation of that data is that Mam is a limiting factor?

      As indicated, our comment is a speculation and is based on the observations summarized in the paragraph. We are not entirely sure what the reviewer is proposing as an alternate model. However, if it relates to the relative concentrations of the different factors, this would not account for the differences in trajectory durations. For most aspects of our analysis, K_off has the most profound influence on the results. Furthermore, differences persist even when CSL levels are considerably reduced (as in conditions with Hairless RNAi).

      Page 9. The authors write, "A very low level of enrichment was evident for... for the CSL C-terminus..". The recruitment of CSL ct IDR does not appear to be statistically significant or there is no apparent difference (Figure S2C), suggesting the CSL ct IDR does not play a role in enrichment.

      We agree with the comments of the reviewer and have adjusted the text on page 9 accordingly.

      Page 9. The authors write, "Notably, MamnIDR::GFP fusion was present in droplets, suggesting it can self-associate when present in a high local concentration (Figure S2B)." Is this result only valid for Mam nIDR or does full-length Mam also localize into droplets, as has been previously observed for full-length mammalian Maml1 in transfected cells?

      We agree that the observed foci of MamL1 that have been detected in mammalian cells are interesting. We have not tried to replicate those data because the large size of Mam has made it challenging to produce a full-length form in over-expression. We note however that another portion of Mam, MamIDR, does not make droplets when over-expressed despite it containing a large section of the disordered region of the Drosophila Mam. We have now included a comment about the mammalian data in the text (page 9) to put our findings in context.

      Previous studies in mammalian cells suggest that Maml1 is a high-confidence target for phosphorylation by CDK8, see Poss et al 2016 Cell Reports https://doi.org/10.1016/j.celrep.2016.03.030. By sequence comparison, does fly Mam have similar potential phosphorylation sites, and might these be critical for Mam/CDK module recruitment?

      We thank the reviewer for highlighting this point. Indeed, we were very excited when we learnt that MamL1 was found to be a high confidence CDK8 target and we looked hard in the Mam sequence for potential phosphorylation sites. Sadly, there is very little conservation between the fly and the mammalian proteins beyond the helical region that contacts CSL and NICD. Furthermore, there are no identifiable putative CDK8 phosphorylation sites based on conventional motifs. It therefore remains to be established whether or not Mam is a direct target of the CDK8 kinase activity. We have added an explanatory comment in the text (page 11).

      Page 11: The authors write, "The differences in the effects on Mam and CSL imply that the CDK module is specifically involved in retaining Mam in the hub, and that in its absence other CSL complexes "win-out", either because the altered conditions favour them and/or because they are the more abundant." Are the "other" complexes the authors are referring to Hairless-containing complexes? With the reagents the authors have in hand, couldn't this be explicitly shown for CSL complexes rather than speculated upon?

      The reviewer is correct that CSL complexes containing Hairless are good candidates to be recruited in these conditions. We have compared the levels of Hairless at E(spl)-C following treatments with Senexin and have not detected a difference. However, it appears that the high proportion of unbound Hairless makes it difficult to detect/quantify the enrichment at E(spl)-C. We have therefore taken a different strategy, which is to measure the recruitment of a mutant form of CSL that is compromised for Hairless binding. Recruitment of the mutant CSL is detected in Notch-ON conditions, but is significantly reduced/absent following Senexin treatment. These data favour the model proposed by the reviewer that in the absence of CDK8 activity, the CSL-Hairless complexes win out. These new data have been added in new Supplementary Figure S3F and S3G (and see text page 11).

      Page 12/13: The authors write, "Based on these results we propose that, after Notch activity decays, the locus remains accessible because when Mam-containing complexes are lost they are replaced by other CSL complexes (e.g. co-repressor complexes)." Again, why not actually test this hypothesis rather than speculate? The dynamics of Hairless complexes following the removal of Notch would be very interesting and build upon previously published results from the Bray lab.

      We thank the reviewer for this comment and we agree it’s possible that the proportion of Hairless complexes increases after Notch withdrawal. However, for the reasons outlined above, it is difficult to quantify changes in Hairless (and our preliminary experiment did not reveal any large-scale effect), and because of the complexity of the genetics we cannot straightforwardly extend the experiment to analyze the behaviour of the mutant CSL as above. Therefore, at present, we cannot say whether the loss of Mam is compensated by an increase in Hairless. We hope in future to investigate the characteristics of the memory in more depth.

      Page 13: The authors write, "As Notch removal leads to a loss of Mam, but not CSL, from the hub, it should recapitulate the effects of MamDN." While the data in Figure 5B seem to support this hypothesis, it's not clear to me that the loss of Mam and MamDN should phenocopy each other, because in the case of MamDN, NICD would still be present.

      We apologise that this sentence was a bit misleading. We have now rewritten it to improve accuracy (page 13) “As Notch removal leads to a loss of Mam, but not CSL, from the hub, we hypothesised it would recapitulate the effects of MamDN on chromatin accessibility and transcription of targets.”

      The temporal dynamics for Mam recruitment using the temperature- and optogenetic-paradigms are quite different. For example, in the optogenetic time course experiments, the preactivated cells are in the dark for 4 hours, while in the temperature-controlled experiments, there is still considerable enrichment of Mam at 4 hours. For the preactivated optogenetic experiments, how sure are the authors that Mam is completely gone from the locus, and alternatively, can the optogenetic experimental results be replicated in the temperature-controlled assays? My concern is whether the putative "memory" observation is just due to incomplete Mam removal from the previous activation event.

      We appreciate the concerns of the reviewer. However, we are confident that the 4-hour optogenetic inactivation is much more effective than the equivalent time for temperature shifts. The temperature-sensitive experiment involves a longer decay, because not only the protein but also the mRNA has to decay to fully remove NICD activity. The optogenetic experiments involve only protein decay and so are more acute. Furthermore, we have tested (and we show in Figure 5H) that Mam is fully depleted after 4 hours “Off” in the optogenetic experiments.

      In order to further strengthen the evidence in favour of the memory hub, we have extended the time-frame further to show that CSL is retained at the locus even after 24 hours “Notch OFF” in both the temperature and the optogenetic paradigm. We have also measured the effects on transcription after a 24hr OFF period using the optogenetic paradigm and seen that robust transcription is initiated in cells that have experienced a previous activation (preactivated) compared to those that have not (naïve). These new data have been added to new Figure 5 C-F and strongly support the memory model.

      Reviewer #3 (Public Review):

      Summary:

      DeHaro-Arbona and colleagues investigate the in vivo dynamics of Notch-dependent transcriptional activation with a focus on the role of the Mastermind (MAM) transcriptional co-activator. They use GFP and HALO-tagged versions of the CSL DNA-binding protein and MAM to visualize the complex, and Int/ParB to visualize the site of Notch-dependent E(Spl)-C transcription. They make several conclusions. First, MAM accumulates at E(Spl)-C when Notch signaling is active, just like CSL. Second, MAM recruits the CDK module of Mediator but does not initiate chromatin accessibility. Third, after signaling is turned off, MAM leaves the site quickly but CSL and chromatin accessibility are retained. Fourth, RNA pol II recruitment, Mediator recruitment, and active transcription were similar and stochastic. Fifth, ecdysone enhances the probability of transcriptional initiation.

      Strengths:

      The conclusions are well supported by multiple lines of extensive data that are carefully executed and controlled. A major strength is the strategic combination of Drosophila genetics, imaging, and quantitative analyses to conduct compelling and easily interpretable experiments. A second major strength is the focus on MAM to gain insights into the dynamics of transcriptional activation specifically.

      We thank the reviewer for their positive comments about the strengths of our work.

      Weaknesses:

      Weaknesses are minor. There were no p-values reported for data presented in Figure S1D and no indication of how variable measurements were. In addition, the discussion of stochasticity was not integrated optimally with relevant literature.

      We thank the reviewer for noting these points. The statistical tests have now been included for Figure S1D (now Figure S1F). We have amplified the discussion about stochasticity to include more reference to the literature and to make clear the distinction from transcriptional bursting (pages 19, 20).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have an elegant series of manipulations that provide strong evidence for their hypotheses and conclusions. Their exploitation of a unique biological system amenable to imaging in the larval salivary gland is well-considered and well-performed. Most of the conclusions are supported by the data. I only have the concerns below.

      (1) One of the main findings is the composition of Notch nuclear complexes and their interactions within a 'hub'. Yet most of the data showing hubs focus on labeling one protein component (+ the locus or transcription), and multi-color imaging is rarely used to show how CSL-Mam, Mam-Med... protein signals coalesce to form a hub. Given the powerful tool developed, it would be important to show these multi-state hubs. Related to this, if the authors expect that hubs are formed independently of transcription or Notch pathway activation, do the authors see clustering at other non-specific loci in the nucleus? If not, can the authors comment on why they think that is the case? If so, do they demonstrate consistent residence time profiles with the tracked E(spl) locus?

      We apologise that it was not evident from the data shown that the proteins co-localize. First, we stress that all the experiments are multicolor and most rely on very powerful methods to measure co-recruitment at a chromosomal locus, something that is very rarely achieved by others studying hubs. Second, we have in all cases confirmed that the proteins do colocalize. We have modified the diagram of our analysis pipeline to make clearer that this relies on multi-colour imaging, and adjusted all the figure labels to indicate the position of E(spl)-C. We have also added panels to new supplementary Figure S1C with examples of the co-localization between CSL and Mam and a plot confirming their levels of recruitment are correlated across multiple nuclei.

      We would like to clarify that our data show that the hubs do require Notch activation for their establishment. Other regions of enrichment are detected in Notch-ON conditions, but these are less prominent and, with no independent method for identifying them, can’t be compared between nuclei. In SPT experiments, other clusters with consistent residence are detected as reported in our recent paper which expanded on the SPT data (Baloul et al, 2023). We also detect co-localizations and “hubs” in other tissues, but those analyses are ongoing and beyond the scope of this paper.

      (2) The authors convincingly show that Notch hub complexes exhibit a memory. While the data showing rapid hub reformation upon Notch withdrawal are solid and convincing (Figure 5, in particular, F), the claim that this memory fosters rapid transcriptional reactivation is less clear. Yet in order to invoke transcriptional memory, it's necessary to solidify this transcriptional response angle. The authors should consider quantifying the changes in transcription activity (at the TS and not in the cytoplasm as currently shown), as well as the timing of transcriptional reactivation (with the MS2 system or smFISH). Manipulating the duration of the activation and dark recovery periods could help to draw a better correlation between the timing of hub reformation and that of transcriptional response and would also help determine how persistent this phenomenon is.

      We thank the reviewer for these suggestions. We have carried out several new experiments to probe further the persistence of memory and to show the effects on transcription when Notch is inactivated/reactivated. First, we have extended the time period for Notch inactivation by temperature control and show that the CSL hub persists even at 24 hours and that no transcription from the target E(spl)m3 is detected, neither at the transcription start-site nor in the cytoplasm. Second, we have extended the Notch OFF time period to 24 hours using the optogenetic approach and show that transcription is robustly reinitiated in preactivated nuclei when Notch is re-activated with 30 mins light treatment, while little if any E(spl)m3 transcription is detected in naïve nuclei with the same treatment. These new data are included in new Figure 5 C-F and see page 13-14. Both these new experiments substantiate the model that the nuclei retain transcriptional memory.

      (3) The manuscript ends with the finding that the presence of a Mam hub does not always correlate with transcription. They conclude that transcription is initially stochastic. The authors find this surprising and even state that this could not be observed without their in vivo live imaging approaches. I don't understand why this result is surprising or unexpected, as we now know that transcription is generally a stochastic process and that most (if not all) loci are transcribed in a bursting manner. The fact that E(spl)-C locus is bursty is already obvious from the smFISH data. The fact that active nascent transcription does not correlate with local TF hubs was already observed in early Drosophila embryos (with Zelda hubs and two MS2 reporters, hb-MS2, sna-MS2). If, in spite of the inherent stochasticity of transcription (bursting), the data are surprising for other reasons, the authors should explain it better.

      We apologise that we had not made clear the reasons why the results were unexpected. We have substantially rewritten this section, and the discussion section, to clarify. We have also moderated the language used to better reflect the overall context of our results. We briefly summarise here. As the reviewer correctly states, it is well known that transcription is inherently bursty. Indeed, the MS2 transcription profiles in “ON” nuclei are bursty, which likely reflects the switching of the promoter. However, in other contexts where we have monitored transcription, although it is bursty, it has nevertheless been initiated synchronously in response to Notch in all nuclei in a manner that was fully penetrant. What we observe in our current conditions is that some nuclei never initiate transcription over the time-course of our experiments (2-3 hours), and those that are ON rarely switch off. This implies that there is another rate-limiting step. Supplying a second signal can modulate this so that it occurs with much higher frequency/penetrance. We consider this to be a second tier of regulation above the fundamental transcriptional bursting.

      The fact that Mam is recruited in all nuclei, whether or not they are actively transcribing, was surprising because recruitment of the activation complex has been considered the limiting step. This is somewhat different from Zelda, which is thought to be permissive and needed at an early step to prime genes for later activation rather than to be the last step needed to fire transcription. We note also that we are not monitoring the position of the hub with respect to the promoter, as in the Zelda experiments (Zelda hubs may still persist, but they are not overlapping with the nascent RNA); we are monitoring the presence or absence of a Mam hub in proximity to a genomic region.

      Minor suggestions:

      (1) The genotypes of the samples should be indicated in the figure legends.

      We thank the reviewer for this suggestion. We have provided a table (new Table S3) where all of the genetic combinations are provided in detail for each figure. We considered that this approach would be preferable because it would be quite cumbersome to have the genotypes in each legend as they would become very long and repetitive.

      (2) While the schematic Fig. 1A explains how the locus is detected, the presence of ParS/ParB is never indicated in subsequent panels and figures. I assume that all panels depicting enrichment profiles use a given radius from the ParS/ParB dot to determine the zero of the x-axis (grey zone). This should be clearly stated in all panels/figure legends concerned.

      We apologise if this was not made explicit. Yes, all panels depicting enrichment profiles use the immunofluorescence signal from ParA/ParB recruitment to determine the zero of the x-axis. We have now marked this more clearly in all figures (grey bar, grey shading or labelled 0). All images where the locus is indicated by an arrowhead, by a coloured bar above the intensity plots or by grey shading in the graphs have been captured with dual colour and the signal from ParA/B recruitment used to define its location. This is now clearly stated in the analysis methods and in the legend. We have also modified the diagram in new supplementary Figure S1B, showing our analysis pipeline, to make that more explicit.

      (3) FRAP/SPT experiments: the authors should provide more details. How many traces? Are traces showing bleaching removed?

      P7: does the statement ' The residences are likely an underestimation because bleaching and other technical limitations also affect track durations' imply that traces showing bleaching have not been removed from the analysis?

      The authors could justify the choice of the model for fitting FRAP/SPT experiments and be cautious about their interpretation. For example, interpreting a kinetic behavior as a DNA-specific binding event can be accurate only if backed up with measurements with a mutant version of the DNA binding domain.

      We apologise if some of this information was not evident. The number of trajectories is provided in new Figure S1F, which indicates the number of trajectories analyzed for each condition in Figure 1.

      We have now added also the numbers of trajectories analyzed for the ring experiments.

      The comments on page 7 about bleaching refer to the technical limitations of the SPT approach. However, as bleached particles cannot be distinguished from those that leave the plane of imaging, they have not been filtered or removed. We have not sought to make claims about absolute residence times for that reason. Rather the point is to make a comparison between the different molecules. As the same fluorescent ligand and imaging conditions are used in all the experiments, all the samples are equivalently affected by bleaching. We subdivide trajectories according to their properties and infer that those which are essentially stationary are bound to chromatin, as is common practice in the field. We note that we have previously shown that a DNA binding mutant of CSL does not produce a hub at E(spl)-C in Notch-ON conditions and has a markedly more rapid recovery in FRAP experiments (Gomez-Lamarca et al, 2018) consistent with the slow recovery being related to DNA binding. This point has been added to the text (page 8).

      (4) The authors should quantify their RNAi efficiency for Hairless-RNAi, Med13-RNAi, white-RNAi, yellow-RNAi, CBP-RNAi, and CDK8-RNAi.

      We thank the reviewer for this comment. We have made sure that we are using well validated RNAis in all our experiments and have included the references in Table S2 where they have been used. We have now evaluated the knock-down in the precise conditions used in our experiments by quantitative RT-PCR and added those data, which show efficient knock-down is occurring, to new Supplementary Figure S1D and Figure S3J. We note also that the RNAi experiments are complemented by experiments inhibiting the complexes with specific drugs and that these yield similar results.

      (5) Figure 3A: could the authors show that transcription is indeed inhibited upon triptolide treatment with smFISH (with, for example, m3 probes)? Why not use alpha-amanitin?

      We thank the reviewer for this suggestion. We had omitted the smFISH data from this experiment in error. These data have now been added to new Supplementary Figure S3A and clearly show that transcription is inhibited following 1 hour exposure to triptolide. Triptolide is a very fast acting and very efficient inhibitor of transcription that acts at a very early step in transcription initiation. In our experience it is much more efficient than alpha-amanitin and is now the inhibitor of choice in many transcription studies.

      (6) Figure 4 typo: panel B should be D and vice versa. Accessibility panels are referred to as Figure 4D, D' in the text but presented as panel B in the Figure.

      We thank the reviewer for noting this mistake, it is now changed in the main text.

      (7) The authors must add their optogenetic manipulation protocol to their methods section.

      The method is described in detail in a recently published paper that reports its design and use. We have now also added a section explaining the paradigm in the methods (Page 31) as requested.

      (8) Figure 3G needs a Y-axis label.

      Our apologies, this has now been added.

      (9) The authors should note why there was a change of control in Figure 3D compared to 3E and G (yellow RNAi vs white RNAi).

      This is a pragmatic choice that relates to the chromosomal site of the RNAis being tested. Controls were chosen according to the chromosome that carries the UAS-RNAi: for the second chromosome this was yellow RNAi and for the third white RNAi. This is explained in the methods.

      (10) Figure 1 would benefit from a diagram describing the genomic structure of the E(spl) locus and the relative position of the labelled locus within it.

      We thank the reviewer for this suggestion and have added a diagram to Supplementary Figure S1A.

      Reviewer #2 (Recommendations For The Authors):

      Minor criticisms and typos:

      Pet peeve: in some of the figure panels they are labeled Notch ON or OFF, but in others they are not, albeit that info is included in the figure legend. For the ease of the reader/reviewer, would it be possible to label all relevant figure panels either Notch ON or OFF for clarity?

      We thank the reviewer for this suggestion and have modified the figures accordingly.

      Page 7, top. "In comparison to their average distribution across the nucleus, both CSL and Mam trajectories were significantly enriched in a region of approximately 0.5 μm around the target locus in Notch-ON conditions, reflecting robust Notch dependant recruitment to this gene complex." Are the authors referring to Figure 1D here?

      Thank you, this figure call-out has been added in the text.

      Page 9. "...reported to interact with p300 and other factors (Figure S2B)." I believe the authors mean Figure S2C and not S2B.

      Thank you, this has been corrected in the text.

      Page 9. There is no Figure S2D.

      Apologies, this was referring to Figure S1D, and is now corrected in the text.

      Page 11: "...were at very reduced levels in nuclei co-expressing MamDN (Figure 4B).." Should be Figure 4CD.

      Thank you, this has been corrected in the text.

      Page 12: "...which was maintained in the presence of MamDN (Figure 4D, D')." Should be Figure 4B.

      Thank you, this has been corrected in the text.

      Reviewer #3 (Recommendations For The Authors):

      In the Results section on Hub, the paragraph starting with "Third, we reasoned . ." the callout to Figure S2D should be Fig S1D.

      Thank you, this has been corrected in the text.

      Figures: The font size in the Figures is so small that most words and numbers cannot be read on a printout. One has to go to the electronic version and increase the size to read it. This reviewer found that inconvenient and often annoying.

      We apologise for this oversight, the font size has now been adjusted on all the graphs etc.

      Figure legends: the legends are terse and in some cases leave explanations to the imagination (e.g. "px" in Figure 2E). It would be useful to go through them and make sure those who are not a Drosophila Notch person and not a transcription biochemist can make sense of them.

      Our apologies for the lack of clarity in the legends. We have gone over them to make them more accessible and less succinct.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2023-01938R

      Corresponding author(s): Ilan, Davis

      1. General Statements

      We thank all four reviewers for their helpful and constructive comments. We have gone through each and every comment and proposed how we would address each point raised by the reviewers. We are confident our proposed revisions are feasible within a reasonable and expected time frame. Some of the comments regarding minor typo/aesthetics and extra references have already been addressed in the transferred manuscript. The changes are highlighted in yellow in the transferred manuscript.

      2. Description of the planned revisions

      Reviewer #1

      Major points:

      1. The presented work itself (Figures 1-4) does not need significant adjustments prior to publication, in my view, with only a few points to address. However, the work in Figure 5- doesn't really support the claims the authors make on its own, and would require some additional experiments or at the very least discussion of the caveats to its current form.

      We thank the reviewer for these comments and will follow the reviewer’s suggestion by discussing the caveats regarding the interpretation of Figure 5. We will also add to the discussion to suggest future research approaches beyond the scope of this manuscript that would address the functional importance of localised mRNA translation. We will briefly mention in the discussion methods such as the quantification of the mRNA foci and the disruption of the mRNA localisation signals to disrupt localised translation and the use of techniques such as Sun-Tag (Tanenbaum et al, 2014) and FLARIM (Richer et al, 2021) to visualise local translation directly.


      Tanenbaum et al, 2014 DOI: 10.1016/j.cell.2014.09.039

      Richer et al, 2021 DOI: 10.1101/2021.08.13.456301

      Localized glia transcripts: are they "glial/CNS/PNS" significant, or are they similar to other known datasets of protrusion transcriptomes? The authors compared their 4801 "total" localized to a local transcriptome dataset from the Chekulaeva lab, finding that a significant fraction are localized in both. As the authors note, this is in good agreement with a recent paper from the Taliaferro lab showing conservation of localization of mRNAs across different cell types. What the authors haven't done here is further test this by looking at other non-neuronal projection transcriptomic datasets (for example Mardakheh Developmental Cell 2015, among others). If the predicted glia-localized processes are similar to non-neuronal process transcriptomes, this would further strengthen this claim and rule out some level of CNS/PNS-derived lineage driving the similarities between glia and neuronal localized transcripts.

      This is a good point and we thank the reviewer for pointing out this interesting cancer data set. We will do as the reviewer suggests and intersect our data with Mardakheh Dev Cell 2015 to test the further generality of localisation in neurons and glia in other cell types. Specifically, we plan to intersect both the glial (this study) and neuronal (von Kuegelgen & Chekulaeva, 2020) datasets with that of protrusive breast cancer cells (Mardakheh et al, 2015).


      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      Mardakheh et al, 2015 DOI: 10.1016/j.devcel.2015.10.005

      The presentation/discussion around Figure 3 is a bit weaker than other parts of the manuscript, and it doesn't really contribute to the story in its current form. Notably, there is no discussion about the significance of glia in neurological disorders until the very end of the manuscript (page 21), meaning that when it's first brought up it just sits there as a one-off side point. The authors might consider strengthening/tightening up the discussion here, if they really want to keep it as a solo main figure rather than integrating it somewhere else/putting it into supplemental. In my view, Figures 2 & 3 should be merged into something a bit more streamlined.

      This is a good point. We plan to strengthen the presentation of Figure 3 and discussion of the significance of glia in neurological disorders by adding a description of the Figure in the Results section and highlighting the significance of glia in nervous system disorders in the Discussion section.

      Why aren't there more examples of different mRNAs in Figure 4? Seems a waste to kick them all to supplemental.

      We agree that it could be helpful to show different expression patterns in the main figure. To address this point we will add Pdi (Fig. S4D), which shows mRNA expression in both the glia and the surrounding muscle cell. This pattern is in contrast to Gs2, which is highly specific to glial cells. We will also note that although pdi mRNA is present in both the glia and muscle, Pdi protein is only abundant in the glia, suggesting that translation of pdi mRNA to protein is regulated in a cell-specific manner.

      The plasticity experiments, while creative, I think need to be approached far more cautiously in their interpretation. Given that the siRNAs will completely deplete these mRNAs, it really needs to be stressed that any/all of the effects seen could just be the result of "defective" or "altered" states in this glial population, which has spill-over effects on plasticity at the NMJ. Without directly visualizing if these mRNAs are locally translated in these processes and assessing if their translation is modulated by their plasticity paradigm, all these experiments can say is that these RNAs are needed in glia to modulate ghost bouton formation in axons. This represents the weakest part of this manuscript, and the part that I feel does not actually back up the claims currently being made. Without any experiments to A. quantify how much of these transcripts are localized vs in the cell body of these glia, B. visualize/quantify the translation of these mRNAs during baseline and during plasticity; the authors cannot use these data to claim that localized mRNAs are required for synaptic plasticity.

      We are grateful to the reviewer for pointing out that we were not precise enough in defining our interpretation of the structural plasticity assay. We did not intend to claim that our results show that local translation of these transcripts is necessary for plasticity, only that these transcripts are localized and are required in the glia for plasticity in the adjacent neuron (in which the transcript levels are not disrupted in the experiment). Definitively proving that these transcripts are required locally and translated in response to synaptic activity would require genetic/chemical perturbations and imaging assays that would require a year or more to complete, so are beyond the scope of this manuscript. To address this point, we will clarify that the results do not show that localized transcripts are required, only that the transcripts are required somewhere specifically in the glial cell (without affecting the neuron level), and we can indeed show in an independent experiment that there are localized transcripts.

      Reviewer #2

      Major points:

      1. The authors analyse the 1700 shortlisted genes for Gene Ontology and associations with autism spectrum disorder, leading to interesting results. However, it is not clear to what extent the enrichments they observe are driven by their presumptive localization or if the associations are driven to a significant extent by the presence of these genes in the selected cell types in the Fly Cell Atlas. One way to address this would be to perform the GO and SFARI analysis on genes that are expressed in the same cells in the Fly Cell Atlas but were not shortlisted from the mammalian cell datasets - the results could then be compared to those obtained with the 1700 localized transcripts.

      This is a fair point raised by the reviewer, as genes involved in neurological diseases such as Autism Spectrum Disorder may be enriched in CNS/PNS cell types. We will follow the reviewer’s suggestion to perform GO and SFARI gene enrichment analysis on genes that were not shortlisted for presumptive glial localisation.

      Although the authors attempt to justify its inclusion, I'm not convinced why it was important to use the whole cell transcriptome of perisynaptic Schwann cells as part of the selection process for localizing transcripts. Including this dataset may reduce the power of the pipeline by including mRNAs that are not localized to protrusions. How many of the shortlisted 1700 genes, and how many of the 11 glial localized mRNAs in Table 5, would be lost if the whole cell transcriptome were excluded. More generally, what is the distribution of the 11 validated localizing transcripts in each dataset in Table 4? This information might be valuable for determining which dataset(s), if any, has the best predictive power in this context.

      We thank the reviewer for raising this point, which we will address with further analysis and adding to the discussion. We propose to address the criticism by running our analysis pipeline without the inclusion of the dataset using Perisynaptic Schwann Cells (PSCs) and then intersect with the PSCs-expressed genes, since their functional similarity with polarised Drosophila glial cells is highly relevant. We also agree with the reviewer that it would be a useful control for us to assess the ‘predictive power’ of each glial dataset by calculating their contribution to the shortlisted 1,700 glial localised transcripts and to the 11 experimentally validated transcripts via in situ hybridisation. To address this point, we plan to add this information in the revised manuscript.

      Did the authors check if any of the RNAi constructs are reducing levels of the target mRNA or protein? Doing so would strengthen the confidence in these important results significantly. In any case, the authors should also mention the caveat of potential off-target effects of RNAi.

      We thank the reviewer for their useful comment and agree that the extent to which the RNAi expression reduces the levels of mRNA is not specifically known. We will add a FISH experiment on lac, pdi and gs2 RNAi showing very strong reduction in mRNA levels. We will also add an explanation of the caveats of the use of the RNAi system to the discussion.

      Methods: what is the justification for assuming that if the RNAi cross caused embryonic or larval lethality then the 'next most suitable' RNAi line is reporting on a phenotype specific to the gene. If the authors want to claim the effect is associated with different degrees of knockdown they should show this experimentally. An alternative explanation is that the line used for phenotypic analysis in glia is associated with an off-target effect.

      We thank the reviewer for this comment. We agree that off target effects cannot in principle be completely ruled out without considerable additional experimental analysis beyond the scope of this manuscript. To address the criticism we will remove the expression data of the lines that cause lethality and revise the discussion to explain that the level of knockdown in each line is unknown, and would require further experimental exploration.

      Minor points:

      1. It would be helpful to have in the Introduction (rather than the Results, as is currently the case) an operational definition of mRNA localization in the context of the study. And is it known whether or not localization in protrusions is the norm in mammalian glia or the Drosophila larval glia? I ask because it may be that almost all mRNAs diffuse into the protrusion, so this is not a selective process. One interesting approach to test this idea might be to test if the 1700 shortlisted transcripts have a significant underrepresentation of 'housekeeping' functions.

      We thank the reviewer for this excellent suggestion. To address the comment, we will move our explanation of the operational definition of mRNA localization to the Introduction. We will also perform enrichment analysis of housekeeping genes within the 1,700 shortlisted transcripts compared to the transcriptome background, as the reviewer suggested.
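      As an illustration of the kind of test the reviewer describes, under- or over-representation of a gene category within a shortlist is often assessed with a hypergeometric test. The sketch below is not the authors' pipeline; the function name `hypergeom_tails` and all counts are hypothetical and chosen only to show the calculation.

```python
from math import comb


def hypergeom_tails(k, N, K, n):
    """Return (P(X <= k), P(X >= k)) for a hypergeometric draw.

    N: genes in the background, K: background genes in the category,
    n: genes in the shortlist, k: shortlisted genes in the category.
    """
    total = comb(N, n)
    # Number of ways to obtain exactly i category genes in the shortlist.
    ways = [comb(K, i) * comb(N - K, n - i) for i in range(n + 1)]
    p_under = sum(ways[: k + 1]) / total  # under-representation tail
    p_over = sum(ways[k:]) / total        # over-representation tail
    return p_under, p_over


# Hypothetical counts, for illustration only (not taken from the manuscript).
N = 10000  # expressed genes in the background transcriptome
K = 2000   # background genes annotated as 'housekeeping'
n = 1700   # shortlisted transcripts
k = 300    # shortlisted transcripts annotated as 'housekeeping'

p_under, p_over = hypergeom_tails(k, N, K, n)
print(f"expected {n * K / N:.0f}, observed {k}")
print(f"P(X <= k) = {p_under:.3g}, P(X >= k) = {p_over:.3g}")
```

      A small `p_under` would indicate fewer housekeeping genes in the shortlist than expected by chance, which is the pattern the reviewer suggests would argue for selective localisation.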

      Reviewer #3

      Major points:

      1. The authors have pooled data from different studies across different types of glial cells, performed from in vitro to in vivo. While pooling datasets may reveal common transcripts enriched in processes, this may not be the best approach considering these are completely different types of glial cells with distinct functions in neuronal physiology.

      We thank the reviewer for highlighting the need for us to further justify why we pooled datasets. We will revise the manuscript to better emphasise that the overarching goal of our study was to try to discern a common set of localised transcripts shared between the cells. The problem with analysing and comparing individual data sets is that much of the variation may be due to differences in the methods used and amount of material, rather than differences in the type of cells used. We will revise the discussion to make this point and plan to explain that our approach corresponds well with a previous publication pooling localised mRNA datasets in neurons (von Kuegelgen & Chekulaeva, 2020).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      It is important to note the limitations of the study. For example, DeSeq2 is biased for highly expressed transcripts. How robust was the prediction for low abundance transcripts?

      The presented 1,700 transcripts were shortlisted based on their presence and expression level (TPM) in glial protrusions rather than their relative enrichment. Nevertheless, the reviewer makes a valid criticism of our use of DESeq2, where we compared enriched transcripts in glial and neuronal protrusions in Figure 1D. To address this point we will discuss this caveat in the relevant section.

      The issue raised regarding low abundance transcript prediction raises an important question: does the likelihood of localisation to cell extremities correlate with mRNA abundance? We have already partially addressed this point, since our analysis of the fraction of localised transcripts per expression level quantiles shows only limited correlation. To address this comment, we will add these results in the revised manuscript as a supplementary figure.
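      For readers unfamiliar with this kind of summary, a minimal sketch of computing the fraction of localised transcripts per expression-level quantile follows. It is an illustration under stated assumptions, not the authors' analysis: the function name, the gene identifiers and the TPM values are all hypothetical.

```python
from statistics import quantiles


def localised_fraction_by_quantile(tpm, localised, n_bins=4):
    """Bin genes by expression quantile and report the localised fraction.

    tpm: dict mapping gene id -> expression level (e.g. TPM).
    localised: set of gene ids classed as localised.
    Returns a list of (bin_index, n_genes, fraction_localised).
    """
    cuts = quantiles(tpm.values(), n=n_bins)  # n_bins - 1 cut points
    bins = [[] for _ in range(n_bins)]
    for gene, value in tpm.items():
        # The bin index is the number of cut points the value exceeds.
        bins[sum(value > c for c in cuts)].append(gene)
    summary = []
    for i, genes in enumerate(bins):
        frac = sum(g in localised for g in genes) / len(genes) if genes else float("nan")
        summary.append((i, len(genes), frac))
    return summary


# Hypothetical toy data, for illustration only.
toy_tpm = {f"gene{i}": float(i) for i in range(1, 9)}
toy_localised = {"gene2", "gene4", "gene6", "gene8"}

for bin_index, n_genes, frac in localised_fraction_by_quantile(toy_tpm, toy_localised):
    print(bin_index, n_genes, frac)
```

      If the localised fraction is similar across bins, as in this toy example, localisation shows little dependence on abundance; a steady rise across bins would instead suggest an abundance bias.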

      The authors identify 1,700 transcripts that they classify as "predicted to be present" in the projections of the Drosophila PNS glia. This was based on the comparison to all the mammalian glial transcripts. Since the authors have access to a transcriptomic study from Perisynaptic Schwann cells (PSCs), the nonmyelinating glia associated with the NMJ isolated from mice; it would be more convincing to then validate the extent of overlap between Drosophila peripheral glial with the mammalian PSCs. This may reveal conserved features of localized transcripts in the PNS, particularly associated with the NMJ function.

      Thank you for the valuable suggestion. A similar point was also raised by [Reviewer #2 - Major point 2], namely to re-run our pipeline excluding the PSCs dataset and intersect with the PSC transcriptome post-hoc. Please see the above section for our detailed response.

      Fig 2: What is the extent of overlap between the translating fractions versus the localized fraction? It will be informative to perform the functional annotation of the translating glial transcripts as identified from Fig 1D.

      This is an interesting question. To address this point, we plan to: (i) compare transcripts that are translated vs. localised in glial protrusions, and (ii) perform functional annotation enrichment analysis on the translated fraction of genes.

      "We conclude predicted group of 1,700 are highly likely to be peripherally localized in Drosophila cytoplasmic glial projections". To validate their predictions, the authors test some of these candidates in only one glial cell type. It might be worthy to extend this for other differentially expressed genes localized in another glial type as well.

      The presented in vivo analyses made use of the repo-GAL4 driver, which is active in all glial subtypes, including subperineurial, perineurial and wrapping glia that make distal projections to the larval neuromuscular junction. We agree that subtype-specific analysis would be highly informative, but we believe this is outside the scope of the current work, where we aimed to identify conserved localised transcriptomes across all glial subtypes. Nevertheless, to address the comment, we plan to further clarify our use of the pan-glial repo-GAL4 driver in the Results and Methods sections of the revised manuscript.

      Figure 5: The authors perform KD of candidate transcripts to test the effect on synapse formation. However, these are KD with RNAi that spans across the entire cell. To make the claim about the importance of "target" RNA localization in glia stronger, ideally, they should disrupt the enrichment specifically in the glial protrusions and test the impact on bouton formation. Do these three RNAs have any putative localization elements?

      We agree with the reviewer that we would ideally test the effect of disrupting mRNA localization (and therefore localised translation). However, we feel these experiments are beyond the scope of the current study, as they would require a long process of defining localisation signals that are small enough to disrupt without affecting other functions. To address this comment we will revise the Discussion section to mention those difficulties explicitly, and clarify the limitations of the approach used in our study for greater transparency.

      Reviewer #4

      Major points:

      1. The authors use FISH to validate the glial expression of their target genes, though these experiments are not quantified, and no controls are shown. The authors should provide a supplemental figure with "no probe" controls, and/or validate the specificity of the probe via glial knockdown of the target gene (see point 2). Furthermore, these data should be quantified (e.g. number of puncta colocalized with NMJ glia membranes).

      Thank you for requesting further information regarding the YFP smFISH probes. We have validated the specificity and sensitivity of the YFP probe in our recent publication (Titlow et al, 2023, Figure 1 and S1). Specifically, we demonstrated the lack of YFP probe signal from wild-type untagged biosamples and showed colocalization of YFP spots with additional probes targeting the endogenous exon of the transcript. Nevertheless, we will address this comment by adding control image panels of smFISH in wild-type (OrR) neuromuscular junction preparations.

      Titlow et al, 2023 DOI: 10.1083/jcb.202205129

      For the most part, the authors only use one RNAi line for their functional studies, and they only show data for one line, even if multiple were used. To rule out potential false negatives, the authors should leverage their FISH probes to show the efficacy of their knockdowns in glia. This would serve the dual purpose of validating the new probes (see point 1).

      Thank you for the suggestion. This point was also raised by [Reviewer #2 - Major point 3]. Please see above for our detailed response.

      In Figure 5 E, given the severe reduction in size in the stimulated Pdi KD animals, the authors should show images of the unstimulated nerve as well. Do the nerve terminals actually shrink in size in these animals following stimulation, rather than expand? The NMJ looks substantially smaller than a normal L3 NMJ, though their quantification of neurite size in F suggests they're normal until stimulation.

      We share the same interpretation of the data with the reviewer that the neurite area is reduced post-potassium stimulation in pdi knockdown animals. We will follow the reviewer’s suggestion and add an image showing unstimulated neuromuscular junctions.

      Minor points:

      The authors claim that there is an enrichment of ASD-related genes in their final list of ~1400 genes that are enriched in glial processes. It is well-appreciated that synaptically-localized mRNAs are generally linked to ASDs. Can the authors comment on whether the transcripts localized to glial processes are even more linked to ASDs and neurological disorders than transcripts known to be localized to neuronal processes?

      This is an interesting point. To address the comment, we will add a comparison of the degree of enrichment of ASD-related genes in neurite vs. glial protrusions in the revised manuscript.


      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1


      1. The use of blue/green or blue/green/magenta is difficult to resolve in some places. Swapping blue for cyan would greatly aid in visualizing their data.

      This comment is much appreciated. We have swapped blue for cyan in Figures 4 and S4. We have also changed Figure S1 to increase contrast and visibility as per reviewer’s comment.

      Make the colouring/formatting of the tables more consistent; it's distracting when it's constantly changing (also there is no need for a blue background, just use a basic white table).

      This comment is much appreciated. We have applied a consistent colour palette to the Tables without background colourings and made the formatting uniform.


      Reviewer #2


      Introduction: 'Asymmetric mRNA localization is likely to be as important in glia, as it is in neurons,...'. Remove commas

      Thank you for pointing this mistake out. We have made the corresponding edits.


      Reviewer #3

      RNA localization in oligodendrocytes has been well studied and characterized. The authors should cite and discuss those papers (PMID: 18442491; PMID: 9281585).

      We thank the reviewer for this useful suggestion. We have added these references to the paper.


      Reviewer #4


      In Figure 5D, the authors should include a label to indicate that these images are from an unstimulated condition.

      We thank the reviewer for pointing this out. We have added the label as requested.

      The authors are missing a number of key citations for studies that have explored the functional significance of mRNA trafficking in glia, and those that have validated activity-dependent translation:

      - https://pubmed.ncbi.nlm.nih.gov/18490510/

      - https://pubmed.ncbi.nlm.nih.gov/7691830/

      - https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001053

      - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7450274/

      - https://pubmed.ncbi.nlm.nih.gov/36261025/

      We thank the reviewer for the comment. We have added these references to the text.


      4. Description of analyses that authors prefer not to carry out

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We thank the Reviewers for their helpful and constructive comments. In response to these suggestions we have performed new experiments and amended the manuscript, as we describe in our detailed response below.

      2. Point-by-point description of the revisions

      Reviewer #1

      1. The Reviewer notes that while our analysis of centrosome size was comprehensive, we provided no analysis of centrosomal MTs, pointing out that while centrosome size declines as the embryos enter mitosis, the ability of centrosomes to organise MTs might not. This is a good point, and we now provide an analysis of centrosomal-MT behaviour (Figure 2). We find that there is a dramatic decline in centrosomal MT fluorescence at NEB, although the pattern of centrosomal MT recruitment prior to NEB is surprisingly complex.

      The Reviewer questions how PCM client proteins can be recruited in different ways by the same Cdk/Cyclin oscillator. We apologise for not explaining this properly. It is widely accepted that Cdk/Cyclins drive cell cycle progression, in part, by phosphorylating different substrates at different activity thresholds (e.g. Coudreuse and Nurse, Nature, 2010; Swaffer et al., Cell, 2016). Moreover, it is also clear that Cdk/Cyclins can phosphorylate the same protein at different sites at different activity thresholds (e.g. Koivomagi et al., Nature, 2011; Asafa et al., Curr. Biol., 2022; Ord et al., Nat. Struct. Mol. Biol., 2019). Thus, we hypothesise that rising Cdk/Cyclin cell cycle oscillator (CCO) activity phosphorylates multiple proteins at different times and/or at different sites to generate the complicated kinetics of centrosome growth. We now explain this point more clearly throughout the manuscript.

      The Reviewer is puzzled as to how we conclude that Cdk/Cyclins phosphorylate Spd-2 and Cnn at all the potential Cdk/Cyclin phosphorylation sites we mutate in our study. The Reviewer is right that we cannot draw this conclusion, and we did not intend to make this claim. As we now clarify (p11, para.1), although it is unclear if Cdk/Cyclins phosphorylate Spd-2 or Cnn on all, some, or none of these sites, if either protein can be phosphorylated by Cdk/Cyclins, then these mutants should not be able to be phosphorylated in this way, allowing us to address the potential significance of any such phosphorylation. We now also note that several of these sites have been shown to be phosphorylated in embryos in mass spectrometry screens (Figure S6).

      The Reviewer highlights differences in how Spd-2 and Cnn help recruit γ-tubulin to centrosomes (Figure 6). They ask for a more detailed description, and are puzzled as to how this is compatible with direct regulation by a single oscillator. We now explain our thinking on this important point in much more detail. It appears that Spd-2 helps recruit γ-tubulin throughout S-phase, while Cnn has a more prominent role in late S-phase (Figure 6). This is consistent with our overall hypothesis of CCO regulation, as we postulate that low-level CCO activity promotes the Spd-2/γ-tubulin interaction in early S-phase, while higher CCO activity promotes the Cnn/γ-tubulin interaction in late-S-phase, potentially explaining the increase in the rate of γ-tubulin (but not γ-TuRC) recruitment we observe at this point (see minor comment #1, below, for an explanation of the various γ-tubulin complexes in flies). This is consistent with recent literature showing that CCO activity promotes γ-tubulin (but not γ-TuRC) recruitment by Cnn/SPD-5 in worms and flies (Ohta et al., 2021; Tovey et al., 2021).

      The Reviewer was not convinced by our model (Figure 8, now Figure 9), raising two major concerns. First, they were unsure how a single oscillator could generate different patterns of protein recruitment. We addressed this in point #2 and #4, above, where we explain how different thresholds of CCO activity trigger different events, so there is no expectation that we should observe steady changes in recruitment over time as CCO activity rises. Second, they questioned how modest levels of Cdk/Cyclin activity can promote recruitment, while high levels of activity can inhibit recruitment. In point #1, above, we cite several examples where such positive and negative regulation by different Cdk/Cyclin activity levels have been described. We also now explain throughout the manuscript why this hypothesis provides a plausible explanation for our results: with moderate CCO activity promoting Spd-2-dependent PCM-client recruitment in early S-phase; higher CCO activity promoting a decrease in Spd-2 recruitment in mid-late-S-phase (so centrosomal Spd-2 levels decline); and even higher levels of CCO activity leading to a decrease in the interactions between the client proteins and the Spd-2/Cnn scaffold as the embryos enter mitosis (so the client proteins are rapidly released from the centrosome).

      The Reviewer also raised the important point here that our model does not explain why the mutant forms of Spd-2 and Cnn accumulate to higher levels at the start of S-phase, and not just at the end of S-phase/entry into mitosis. We apologise for not explaining this properly. The accumulation of the mutant proteins (particularly Spd-2, Figure 5C) in early-S-phase occurs because the excess mutant protein that accumulates at centrosomes in late-S-phase/mitosis is not removed properly from centrosomes during mitosis (presumably because there is insufficient time). Thus, centrosomes still have too much mutant Spd-2 at the start of the next S-phase. We show this in Reviewer Figure 1 (attached to this letter), which tracks Spd-2 behaviour further into mitosis, and now explain this in more detail in the text (p12, para.1).

      The Reviewer questions how the CCO can both induce centrosome growth and also switch it off, as it is unclear how an oscillator that only phosphorylates sites to decrease centrosome binding could also promote growth. They ask if we can identify and mutate any Cdk/Cyclin sites in centrosome proteins that promote centrosome recruitment. As we now clarify, we did not intend to claim that the CCO only phosphorylates sites that decrease the centrosome binding of proteins, although we do hypothesise that such phosphorylation is important for switching off centrosome growth in mitosis. In addition, we hypothesise that moderate levels of CCO initially promote centrosome growth, and our data suggests that the CCO does this, at least in part, by promoting Polo recruitment (Figure 8). We speculate that the CCO phosphorylates specific Polo-box-binding sites in Ana1 and Spd-2, the main proteins that recruit Polo to centrioles. We agree that identifying these sites is an important next step, but it is complicated as our studies indicate that multiple sites contribute in a complex manner. Importantly, it is well established that the CCO triggers centrosome growth as cells prepare to enter mitosis, so our hypothesis that moderate levels of CCO activity initiate centrosome growth is not new or controversial.

      Minor Comments

      1. The reviewer asks how we explain the different incorporation profiles we observe for the different subunits of the γ-tubulin ring complex. We apologise for not discussing this point. In flies there is a “core” γ-tubulin-small complex (γ-TuSC) and a larger γ-tubulin-ring complex (γ-TuRC) that contains the Grip71, Grip75 and Grip128 subunits we analyse here (Oegema et al., JCB, 1999). The γ-TuSC functions independently of the γ-TuRC so γ-tubulin and γ-TuRC components can behave differently.

      The Reviewer questions why we claim an “inverse-linear” relationship between S-phase length and the centrosome growth rate when the relationship is not linear (Figure 3, now Figure S3). I was originally confused by this as well but, mathematically, a linear relationship means y is proportional to x, whereas an inverse-linear relationship means y is proportional to 1/x. Thus, an inverse-linear relationship between x and y does not plot as a straight line, but rather as the curves we show on the graphs. We now explain this in text (p9, para.2).
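      A small numerical sketch may make the distinction concrete. It assumes an idealised relation, rate = k / (S-phase length), with a hypothetical constant `k` and hypothetical S-phase lengths; none of the numbers come from the manuscript.

```python
def growth_rate(s_phase_length, k):
    """Idealised inverse-linear relation: rate is proportional to 1 / length."""
    return k / s_phase_length


k = 120.0                          # hypothetical proportionality constant
lengths = [4.0, 6.0, 8.0, 12.0]    # hypothetical S-phase lengths (minutes)
rates = [growth_rate(length, k) for length in lengths]

# Plotted against length, the points fall on a curve: equal steps in
# length do not produce equal steps in rate.
steps = [(rates[i + 1] - rates[i]) / (lengths[i + 1] - lengths[i])
         for i in range(len(lengths) - 1)]

# Plotted against 1 / length, the same points fall on a straight line
# through the origin with slope k, so rate * length is constant.
products = [rate * length for rate, length in zip(rates, lengths)]

print(rates)
print(steps)
print(products)
```

      In this idealised case the local slope of rate against length changes from point to point, while the product of rate and length stays fixed at `k`, which is the signature of an inverse-linear relationship.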

      Reviewer #2

      This Reviewer found the manuscript hard to follow, so we are very grateful that they took the time to try to understand it. We agree that the subject matter is complicated, and that our presentation was not always helpful. The Reviewer’s comments have been very useful in helping us to identify (and hopefully improve) areas of particular difficulty.

      Major points:

      1. The Reviewer highlights that the two experimental approaches underpinning our main conclusions are problematic: (1) Experiments with mutants of Spd-2 and Cnn that theoretically cannot be phosphorylated by Cdk/Cyclins are hard to interpret as these mutations may have other effects; (2) It is unclear whether reducing Cyclin B levels reduces peak CDK activity or simply slows the time it takes to reach peak levels. They suggest a more direct test of our model would be to analyse PCM recruitment in embryos arrested in S-phase or mitosis. (1) We agree that the mutations designed to prevent Cdk/Cyclin phosphorylation could perturb function in other ways, but this is true for any such mutation, and there are many papers that infer a function for Cdk/Cyclin phosphorylation from such experiments. Importantly, the centrosomal accumulation of the phospho-null mutants actually slightly increases compared to WT (Figure 5C and I), and we now show that the centrosomal accumulation of a phosphomimicking Spd-2-Cdk20E mutant slightly decreases (Figure S8). We now acknowledge the potential caveat of a non-specific perturbation of protein function, but feel that the reciprocal behaviour of the phospho-null and phospho-mimicking mutants somewhat mitigates this concern (p12, para.2). (2) Fortunately, and as we now clarify, it has recently been shown that reducing Cyclin levels does not reduce peak Cdk activity, but rather slows the time it takes to reach peak activity (Figure 2A, Hayden et al., Curr. Biol., 2022). Thus, the cyclin half-dose experiments provide an excellent alternative test of our hypothesis as they show that the WT proteins can exhibit similar behaviour to the mutants if the rate of Cdk/Cyclin activation is slowed. We feel the evidence supporting our hypothesis is strong enough that it warrants serious consideration. 
      The suggestion to look at PCM recruitment in embryos arrested in either S-phase or M-phase is a good one, but these experiments produce complicated data. In M-phase arrested embryos, for example, Cnn levels continue to rise (see Figure 1G, Conduit et al., Dev. Cell, 2014), but the other PCM proteins do not (unpublished); in S-phase arrested embryos (arrested by mitotic cyclin depletion) centrosomes continue to duplicate, but now do so asynchronously, greatly complicating the analysis (McCleland and O’Farrell, Curr. Biol., 2008; Aydogan et al., Cell, 2020). The centrosomes that don’t duplicate, however, reach a constant steady-state size (where the rate of centrosome protein addition is balanced by the rate of loss). These observations are consistent with our recent mathematical modelling of mitotic PCM assembly (Wong et al., 2022) if we additionally account for cell cycle regulation (which was not considered in our original model). We believe such analyses are beyond the scope of the current paper and we plan to publish a second paper incorporating our new hypothesis into our mathematical modelling.
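      The steady-state statement above (addition balanced by loss) can be illustrated with a minimal sketch. This is not the model of Wong et al.; it assumes, purely for illustration, a constant addition rate and a first-order loss term, and the names `addition_rate` and `loss_rate_constant` are hypothetical.

```python
# Illustrative sketch only: assumes constant addition and first-order loss,
# i.e. dS/dt = addition_rate - loss_rate_constant * S (an assumption, not the
# published model). At steady state dS/dt = 0, so S* = addition_rate / loss_rate_constant.

def steady_state_size(addition_rate, loss_rate_constant):
    """Size at which constant addition is balanced by first-order loss."""
    return addition_rate / loss_rate_constant


def simulate_size(addition_rate, loss_rate_constant, initial_size=0.0,
                  dt=0.01, steps=5000):
    """Forward-Euler integration of dS/dt = addition_rate - loss_rate_constant * S."""
    size = initial_size
    for _ in range(steps):
        size += dt * (addition_rate - loss_rate_constant * size)
    return size


if __name__ == "__main__":
    # Hypothetical parameter values in arbitrary units.
    print(steady_state_size(2.0, 0.5))
    print(simulate_size(2.0, 0.5))
```

      Under these assumptions the simulated size approaches the analytical steady state regardless of the starting value, which is the sense in which a non-duplicating centrosome would settle at a constant size.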

      The Reviewer questions whether our methods accurately measure centrosomal protein accumulation, pointing out that γ-tubulin and Grip128 occupy different centrosomal areas, which should not be possible if they are part of the same complex. They suspect that our use of different transgenes with different promoters could explain these differences. As we should have described (see point #1 in our response to the minor comments of Reviewer #1), γ-tubulin exists in two complexes in flies, only one of which contains Grip128, so γ-tubulin and Grip128 exhibit different localisations. Moreover, as we now show (Figure S2), using different promoters does not seem to make a difference to overall recruitment kinetics. Thus, we are confident that our methods measure centrosome protein recruitment dynamics accurately.

      The Reviewer is concerned that our measurements of centrosome size based on fluorescence intensity (Figure 1) and centrosomal area (Figure S1) do not always match. They suggest a potential reason for this is that proteins are not uniformly distributed within centrosomes, and this may impact our ability to measure protein accumulation based on 2D projections (noting, for example, that Polo and Spd-2 are concentrated at centrioles and in the PCM, potentially explaining the different shape of their growth curves compared to the client proteins). When the centrosome-fluorescence-intensity and centrosome-area recruitment profiles of a protein do not match, the average “centrosome-density” of that protein must be changing over time. In some cases, we understand why density changes. Cnn, for example, stops flaring outwards on the centrosomal MTs during mitosis so its centrosomal area decreases even as its fluorescence intensity increases (leading to an increase in its centrosomal-density). We agree (and now discuss—p19, para.3) that the prominent accumulation of Spd-2 and Polo at centrioles could help to explain why Spd-2 and Polo accumulation dynamics differ from the client proteins.

      Other points:

      The Reviewer suggests it would be good to know how much Polo at the centrosome is active. We agree, but although commercial antibodies against PLK1 phosphorylated in its activation loop work in cultured fly cells, we cannot get them to work in embryos. Moreover, the recruitment of Polo/PLK1 to its site of action by its Polo-Box Domain is sufficient to partially activate the kinase independently of phosphorylation (Xu et al., NSMB, 2013). Thus, it seems likely that all the Polo/PLK1 recruited to centrosomes will be at least partially activated, even if it is not necessarily phosphorylated on its activation loop.

      The Reviewer asks if it is clear that less Spd-2 and Cnn are recruited to centrosomes in the half gene-dosage embryos. We apologise for not mentioning that this is indeed the case. We showed this previously for Cnn (Conduit et al., Curr. Biol., 2010) and we now state that this is also the case for Spd-2. We do not show the Spd-2 data as we plan to publish a comprehensive dose-response curve of Spd-2 (and Cnn) recruitment in our next modelling paper.

      Would it not be relevant to examine Polo ½ dosage embryos? We do have this data (Reviewer Figure 2), attached to this letter, but it is quite complicated to interpret (as we explain in the legend). We feel it would be more appropriate to include this in our next modelling paper where we can properly explain the behaviours we observe. Publishing this data here would distract from our main message without changing any of our conclusions.

      The Reviewer asks why the non-phosphorylatable Spd-2 protein is also present at higher levels on centrosomes at the start of S-phase (not just the end of S-phase). This was also raised by Reviewer #1 (point #5), so please see the second paragraph of our response there.

      Minor/Discussion Points:

      We thank the Reviewer for highlighting that absolute and relative centrosome size control are different things and we have amended the manuscript accordingly.

      The Reviewer questions whether it is accurate to describe Spd-2 and Polo as scaffold proteins, noting that only Cnn has been shown to have scaffolding properties. There is strong evidence that Spd-2 has Cnn-independent scaffolding properties in flies (e.g. Conduit et al., eLife, 2014), but this is a fair point for Polo. We think it is justified to separate Polo from other client proteins as Polo is essential for scaffold assembly, whereas other client proteins are not. We now define our scaffold/client terminology to avoid confusion (p4, para.3).

      The Reviewer highlights several points related to differences in recruitment kinetics (also touched on in points #2 and #3, above), noting we don’t discuss properly the idea of two different modes of PCM recruitment. These are all good points, largely addressed in our response to points #2 and #3, above. We now discuss much more prominently the two different modes of client protein recruitment throughout the manuscript.

      As we now clarify, in all our experiments we use centrosome separation and nuclear envelope breakdown (NEB) to define the start and end of S-phase, respectively.

      The Reviewer quotes the landmark Woodruff paper (Cell, 2017) as showing that the ability to concentrate client proteins (including ZYG-9, the worm homologue of Msps) is an intrinsic property of the PCM scaffold, so how do we explain that Msps departs prior to NEB while Cnn continues to accumulate? It is indeed a striking observation of our study that all PCM client proteins (not just Msps) start to leave the centrosome prior to NEB, even as Cnn levels continue to accumulate. Our hypothesis is that this ‘leaving’ event is triggered by a threshold level of Cdk/Cyclin activity—explaining why these client proteins all start to leave the PCM at the same time (just prior to NEB) irrespective of nuclear cycle length. This is not incompatible with the Woodruff paper, which did not attempt to reconstitute any potential regulation by Cdk/Cyclins in their in vitro studies.

      The Reviewer questions why Spd-2 that cannot be phosphorylated by Cdk/Cyclins (Spd-2-Cdk20A) accumulates abnormally at centrosomes in late S-phase, yet γ-tubulin (which is recruited by Spd-2) seems to leave centrosomes more slowly in the presence of the mutant protein. As we now explain more clearly, there is no contradiction here. Spd-2-Cdk20A accumulates to abnormally high levels in late-S-phase/early mitosis (Figure 5C), and this reduces the γ-tubulin dissociation rate, as we would predict (Figure 7B, right most graph). It does not “prevent” dissociation, however, (as the Reviewer seems to suggest it should?), but this is probably because these experiments have to be performed in the presence of large amounts of the WT Spd-2 (Figure 5A).

      The referencing error has been corrected.

      The Reviewer asks why in Figure 1 not all of the centrosome proteins could be followed for the full time period (as we mention in the legend, but do not explain). There are different reasons for different proteins: (1) Polo cannot be followed in mitosis as it binds to the kinetochores, making it impossible to accurately track centrosomes (so the data for mitosis is missing for Polo); (2) Cnn exhibits extensive flaring at the end of mitosis/early S-phase (Megraw et al., JCS, 1999), so we cannot track individual separating centrosomes labelled with NG-Cnn in early S-phase until they have moved sufficiently far-apart (so the early S-phase time-points are missing for Cnn); (3) In addition, several of the client proteins bind to the mitotic spindle, so although we can still track and measure the centrosomes in late mitosis in the graphs, we don’t show pictures of these late mitosis centrosomes in the montage in Figure 1A as the images look a bit odd. We now explain these reasons in the Materials and Methods.

      We now indicate that nuclear cycle 12 (NC12) is being analysed in Figures 4-8.

      The reviewer questions why we don’t show the decrease rate for γ-tubulin in Figure 6 (the Spd-2 and Cnn half-dose experiments), when we do show it in Figure 7 (the Spd-2 and Cnn Cdk-mutant experiments), suspecting that it is slowed in both cases. The reviewer is correct and we now show this data for both sets of experiments.

      We have corrected the labelling error in Figure S1.

      The Reviewer suggests moving some of the data from the main Figures, and the entirety of Figures 2 and 3, to the Supplemental Information. We understand this point, and agree that the amount of data presented in Figures 1-3 is somewhat overwhelming. We have played around with the Figures a lot (in particular trying to show a few examples of the data and moving the rest to Supplementary), but it is hard to pick a “typical” example, and the power of comparing the behaviour of so many different centrosome proteins is somewhat lost. We have tidied up several Figures and, as a compromise, we keep Figure 2 (now Figure 3) in the main text, but have moved Figure 3 to Supplementary (now Figure S5).

      The Reviewer suggests that we should repeat the analysis of Spd-2, Polo and Cnn dynamics that we show here, as we already presented this data in a previous publication (Wong et al., EMBO J., 2022). We understand this point, but feel this would be a less accurate comparison, as essentially all of the data shown in Figure 1 was obtained several years ago during a contiguous ~6-month period. Since then, the lasers and software on our microscope system have been updated, so it would probably be a less fair comparison to obtain new data for a subset of these proteins (and it seems overkill to perform the entire analysis again). We clearly state that this data has been presented previously, so we hope the Reviewer will agree that it is acceptable to present it again here so readers can more easily compare the data.

      Reviewer #3

      This Reviewer is broadly supportive of the manuscript, but to publish in a prestigious journal they think additional experimental evidence will be required to support our hypothesis.

      The Reviewer notes that our only evidence that Cdk/Cyclins directly phosphorylate Spd-2 comes from our analysis of the Spd-2-Cdk20A mutant, as the effect of reducing Cyclin B dosage on WT Spd-2 behaviour is very modest. They request that we analyse the behaviour of a Spd-2-Cdk20E phospho-mimicking mutant. The effect of halving the dose of Cyclin B on Spd-2 behaviour is modest, but this is what we would predict as all we are doing in this experiment is slowing S-phase by ~15%, so Spd-2 should accumulate at centrosomes for a slightly longer time and to a slightly higher level (as we observe, Figure 5E). A great advantage of the early fly embryo system is that we can compare the behaviour of many hundreds of centrosomes, so even subtle differences like this are usually meaningful. To illustrate this point, we have now repeated the Spd-2 analysis in WT and CycB1/2 embryos (but now using a CRISPR/Cas9 Spd-2-NG knock-in line) and we see the same subtle differences (Figure S9). In addition, as requested, we have now analysed the behaviour of a Spd-2-Cdk20E mutant protein using an mRNA injection assay (as it would have taken too long to generate and test new transgenic lines). In this assay we injected embryos with mRNA encoding either WT Spd-2-GFP, Spd-2-Cdk20A-GFP or Spd-2-Cdk20E-GFP. The mRNA is quickly translated, and we computationally measured the fluorescence intensity of the centrosomes in mid-S-phase (i.e. at the Spd-2 peak) (Figure S8). This analysis confirms that Cdk20A accumulates to slightly higher levels, and reveals that Cdk20E accumulates to slightly lower levels, than the WT protein. Together, these new experiments strongly support our original conclusions.

      The Reviewer notes that we propose that the CCO initially promotes centrosome growth by stimulating Polo recruitment to centrosomes, but states that we only provide indirect evidence for this by showing that centrosomal Polo levels are strongly reduced in Cyclin B half-dose embryos. They suggest we determine Spd-2 levels in Polo half-dose embryos, and/or the centrosome levels of mutant forms of Spd-2 that cannot be phosphorylated by Polo. We believe the Cyclin B half-dose experiments provide direct support for our hypothesis that Cdk/Cyclin activity influences Polo recruitment (Figure 8), although, clearly, we have not identified the mechanism. We do, however, suggest a plausible mechanism: Ana1 and Spd-2 are largely responsible for recruiting Polo to centrosomes, and we have previously shown that several of the potential phosphorylation sites in these proteins that help recruit Polo to centrosomes are Cdk/Cyclin or Polo phosphorylation sites (Alvarez-Rodrigo et al., eLife, 2020 and JCS, 2021; Wong et al., EMBO J., 2022). We are currently testing this hypothesis, but progress is slow as it is clear that multiple sites in both proteins can influence this process.

      As the Reviewer requests, we have now also examined how Spd-2 and Cnn behave in Polo half-dose embryos (Reviewer Figure 2, attached to this letter). As we describe in the Figure legend, this data is informative, but is complicated. With relatively minor, but mechanistically important, tweaks to our previous mathematical modelling we can explain these behaviours, but introducing such a significant mathematical modelling element would be beyond the scope of this paper. As described above, these findings will form the basis of a follow-up paper that is more mathematically oriented.

      It is a great idea to look at mutant forms of Spd-2 that cannot be phosphorylated by Polo, but the consensus Polo phosphorylation site (N/D/E-X-S, with the N/D/E at -2 and the S at 0 being preferences, rather than a strict rule) is less well-defined than the consensus Cdk/Cyclin phosphorylation site (where the Pro at +1 is essentially invariant). Thus, we cannot accurately predict which sites would need to be mutated to generate such a mutant.
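      To make the motif description above concrete, here is a minimal sketch of scanning a sequence for the preferred N/D/E-X-S pattern. It is an illustration only, not an analysis the authors report: the toy sequence and the function names are hypothetical, and because the motif is only a preference, serines that fail the pattern cannot be excluded as Polo sites.

```python
import re

# Preferred Polo motif as described in the text: N/D/E at -2, any residue at -1,
# Ser at position 0. A lookahead is used so that overlapping matches are found.
POLO_PREFERRED = re.compile(r"(?=[NDE].S)")


def preferred_polo_serines(sequence):
    """Return 1-based positions of Ser residues that fit the N/D/E-X-S preference."""
    return [match.start() + 3 for match in POLO_PREFERRED.finditer(sequence)]


def all_serines(sequence):
    """Return 1-based positions of every Ser residue in the sequence."""
    return [index + 1 for index, residue in enumerate(sequence) if residue == "S"]


if __name__ == "__main__":
    toy_sequence = "MDASKLESPNRSAAS"  # hypothetical sequence, not Spd-2
    print(preferred_polo_serines(toy_sequence))
    print(all_serines(toy_sequence))
```

      In this toy case only some of the serines fit the preferred pattern, yet the remainder could still be phosphorylated if the preference is not strict, which is why a motif scan alone would not define a complete set of sites to mutate.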

      The Reviewer requests that we analyse the behaviour of TACC in embryos expressing the Spd-2-Cdk20A and Cnn-Cdk6A (as we do in Figure 7 for γ-tubulin). This is a reasonable request, but we prefer not to show this data as we have recently identified an interesting interaction between TACC, Spd-2 and Aurora A that will be the subject of another paper we hope to submit shortly. This data is hard to interpret without explaining these interactions properly, which is beyond the scope of the current manuscript.

      We hope the Reviewers will agree that these changes have improved the manuscript substantially, and that it is now suitable for publication. We would like to thank them again for taking the time to read this rather complicated paper so thoroughly.

      We look forward to hearing from you.

      Yours sincerely,

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their appreciation of the interest, novelty and quality of our study, and their useful feedback to improve its presentation.

      We have revised the manuscript addressing all the points they made, as detailed below, section by section, following the organization in the reviews. The corresponding changes are highlighted in yellow (new text) or crossed out (deleted text) in our revised manuscript.

      In case it is useful for the editor to check how each individual point was addressed, we also have extracted from the reviews each individual reviewer’s comment and our direct response, listed as bullet points at the end of this text.

      2. Point-by-point description of the revisions

      I - General criticisms

      Reviewer #1: My main criticism is unfortunately inherent to the approach: comparative studies are absolutely critical, but they can only provide a very sparse sampling of diversity. Fortunately, thanks to high-throughput sequencing, bioinformatic analyses can now be performed on a large number of species, but experimental validation is typically restricted to two or three species. The consequence of this for the present manuscript is that while the functional conservation of the Gwl site is convincingly shown, the exact mechanisms responsible for the reduced effect of PKA phosphorylation remain relatively vaguely defined. Indeed, in their Discussion the authors list a number of experimental approaches to address this - but I understand that these would all involve substantial efforts to address. In particular, testing chimeric constructs around the consensus PKA site and from multiple species could be very informative.

      We completely agree with the reviewer that comparative approaches are critical to understanding biological mechanisms, and are excited by the increasing possibilities to perform not only sequence and descriptive comparisons but functional studies across a range of emerging model organisms. We hope that more and more researchers in cell and molecular biology will profit from the experimental tools and techniques now available in such species, and will pioneer new ones. Of course, as he/she rightly points out, conclusions are currently limited by the number of species studied, but comparisons between two judiciously chosen species can already be very informative. Thus, in our study, the use of Xenopus and Clytia allowed us to make significant progress towards our main objective of understanding the cAMP-PKA paradox in the control of oocyte maturation; specifically by showing both that PKA phosphorylation of Clytia ARPP19 is lower in efficiency and that the phosphorylated protein has a lower effect on oocyte maturation than the Xenopus protein. As the reviewer points out, unravelling the exact mechanisms underlying these differences will require a large amount of additional work and is beyond the scope of the current study. Actually, we have embarked on several series of experiments to this end using some of the approaches listed in the Discussion. Specifically, we are testing the biochemical and functional properties of chimeric constructs containing the consensus PKA site from various species. This is a substantial undertaking which will require one to two years to complete, but is already giving some very interesting findings.

      Reviewer #1: The figures and text could be slightly condensed down to about 6 figures.

      We have reduced the number of figure panels but we prefer to maintain the number of figures, because the experimental data presented in them is essential to the interpretation of our results and the overall conclusions of the article. If the journal editor would like us to reduce the number of figures, we could do this by displacing Figure 4 and some panels of other figures (to then fuse some of them) to supplementary material, but this would be a pity.

      II - Abstract

      As recommended by Reviewer #2, we have reworked the Abstract to make it more accessible to new readers, attempting to bring out more clearly and simply the main results and conclusions of the study. We correspondingly simplified and shortened the title of the article. Changes: Page 2.

      III - Introduction points

      Reviewer #2: I believe that it would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. How long is prophase arrest in Xenopus compared to Clytia or other organisms? How can this affect the prophase arrest mechanisms? It seems that the prophase arrest in Xenopus oocytes is found to be significantly more prolonged compared to Clytia and various other organisms, and also meiotic maturation proceeds much more rapidly in Clytia than in Xenopus. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study.

      Differences in timing of oocyte prophase arrest and in maturation kinetics across animals are indeed highly relevant in relation to the underlying biochemical mechanisms. Unfortunately, not enough information is currently available concerning the duration of the successive phases of oocyte prophase arrest across species to make any meaningful correlations with PKA regulation of maturation initiation. We have nevertheless expanded the Introduction to cover this issue as follows:

      • We start the introduction by mentioning how the length of the prophase arrest varies across species. Changes: Page 3, lines 5-11.
      • We have added examples of species which likely have similar durations of prophase arrest but show cAMP-stimulated vs cAMP-inhibited release. Changes: Page 4, lines 28-35.
      • We have specified the temporal differences in meiotic maturation in Xenopus (3-7 hrs) and Clytia (10-15 min). Changes: Page 5, lines 32-33.

      Reviewer #2: why, and not others, were these species [Xenopus, Clytia] chosen for this study. A brief justification is included in lines 1-page 5 "..a laboratory model hydrozoan species well suited to oogenesis studies", but it does not explain why this and not other hydrozoan species like Hydra, that has also been used for meiosis studies.

      As requested by Reviewer #2, fuller details are now included about the advantages of Clytia compared to other hydrozoan species, citing several articles and recent reviews here and also in the Discussion. Changes: Page 5, lines 21-32 & 37-39.

      Hydra is a classic cnidarian experimental species and has proved an extremely useful model for regeneration and body patterning, but is not suitable for experimental studies on oocyte maturation because spawning is hard to control and fully-grown oocytes cannot easily be obtained, manipulated or observed. In contrast many hydromedusae (including Clytia, Cytaeis, and Cladonema) have daily dark/light induced spawning and accessible gonads, so provide great material for studying oogenesis and maturation. Of these, Clytia has currently by far the most advanced molecular and experimental tools.

      Reviewer #2: The protein MAPK is not introduced properly, as it is first mentioned in the results section in line 12. Given the importance of the results provided with it, it should be presented in the introduction prior to the results section.

      As requested by Reviewer #2, the involvement of MAPK activation during Xenopus oocyte meiotic maturation is now introduced, explaining how its phosphorylation serves as a marker of Cdk1 activation. Changes: Page 5, lines 1-5.

      Reviewer #2: These sentences need a more elaborate explanation: Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways?

      We now give the example of the well-characterized Gβγ-PI3K pathway for oocyte maturation initiation in the starfish. Changes: Page 4, lines 1-15.

      Reviewer #2: Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory. If the general pathway has been identified but the signaling cascade is still not well described, please indicate that in a clearer way.

      We apologise that the wording we used was not clear and implied that the mechanisms of PP2A inhibition by Gwl-phosphorylated ARPP19 were poorly understood. On the contrary, they are very well studied. The part that remains mysterious concerns the upstream mechanisms. We have reworded the paragraph to make this point unambiguous. Changes: Page 5, lines 1-8.

      IV - Results

      Reviewer #2: The text of the results is generally well described; however, all the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible. For example, in page 5 lines 12-25, or page 7 lines 30-37, are all introduction information.

      As requested by Reviewer #2, we have shortened or removed the introductory passages of the Results section paragraphs, which were redundant with the information given in the introduction. We did not restrict to the two examples cited by the reviewer, but have shortened all the Results passages that repeat information already provided in the Introduction. Changes: Page 7, lines 3-4 & 14-16 & 36-37 - Page 8, lines 12-15 - Page 8, lines 37-40 & Page 9, lines 1-6.

      Reviewer #2: Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described.

      We have followed the reviewer's recommendation. The explanation of the experiments and the results are more detailed and the paragraph ends with a general conclusion which came too early in the previous version. Changes: Page 8, lines 22-24 & 32-34.

      Reviewer #2: Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion?

      Clytia oocytes are relatively fragile. Sensitivity of oocytes to injection varies between batches, while in general increasing injection volumes or protein concentrations increases the levels of lysis observed. We do not know exactly what causes this but lysis can happen either immediately following injection or during the natural exaggerated cortical contraction waves that accompany meiotic maturation, suggesting that it relates to mechanical trauma. We have expanded this paragraph and the legend of Fig. 3C to explain these injection experiments more fully in the text and to clarify these issues. Changes: Page 9, lines 16-29 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      Same paragraph: Lines 25-27 of page 8. Text reads, "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD.". Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion.

      As explained above, we could not increase the concentrations of ARPP19 protein beyond 4 mg/ml. It is important to note that at the same concentration, both Clytia and Xenopus proteins induce activation of Cdk1 and GVBD in the Xenopus oocyte.

      Concerning OA, it is well documented in many systems, including Xenopus, starfish and mouse oocytes as well as mammalian cell cultures, that high concentrations lead to cell lysis/apoptosis as a result of a massive deregulation of protein phosphorylation (Goris et al, 1989; Rime & Ozon, 1990; Alexandre et al, 1991; Boe et al, 1991; Gehringer, 2004; Maton et al, 2005; Kleppe et al, 2015). Specific tests in Xenopus oocytes have shown that injecting 50 nl of 1 or 2 mM OA specifically inhibits PP2A, while injecting 5 mM also targets PP1 and higher OA concentrations inhibit all phosphatases. For these reasons, we did not increase OA concentrations over 2 mM. When injected into Xenopus oocytes at 1 or 2 mM, OA induces Cdk1 activation and GVBD, but the cell then dies because PP2A has multiple substrates essential for cell life. When injected at 2 mM into Clytia oocytes, OA induces neither Cdk1 activation nor GVBD but promotes cell lysis. This supports the conclusion that 2 mM OA is sufficient to inhibit PP2A (and possibly other phosphatases) but that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia.
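      For orientation, the injected doses quoted above can be converted into absolute amounts with simple unit arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it says nothing about the final intracellular concentration, which would depend on the oocyte volume (not given here).

```python
def injected_amount_pmol(volume_nl, concentration_mM):
    """Amount of compound delivered by an injection, in picomoles.

    1 nl x 1 mM = 1e-9 L x 1e-3 mol/L = 1e-12 mol = 1 pmol,
    so the product of the two numbers is already in pmol.
    """
    return volume_nl * concentration_mM


if __name__ == "__main__":
    # 50 nl injections at the OA concentrations mentioned in the text.
    for concentration_mM in (1, 2, 5):
        print(concentration_mM, "mM ->", injected_amount_pmol(50, concentration_mM), "pmol")
```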

      We have reworded the relevant text to make these points clearer. The previous statement that “we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD” has been removed because it was unnecessarily cautious in the context of the literature cited above, as now fully explained. Changes: Page 9, lines 31-35 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      References: Alexandre et al, 1991, doi: 10.1242/dev.112.4.971; Boe et al, 1991, doi: 10.1016/0014-4827(91)90523-w; Gehringer, 2004, doi: 10.1016/s0014-5793(03)01447-9; Goris et al, 1989, doi: 10.1016/0014-5793(89)80198-x; Kleppe et al, 2015, doi: 10.3390/md13106505; Maton et al, 2005, doi: 10.1242/jcs.02370; Rime & Ozon, 1990, doi: 10.1016/0012-1606(90)90106-s

      Reviewer #2: Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19".

      We realise that we had not explained clearly enough how the thiophosphorylation assay works. In this assay, γ-S-ATP will be incorporated into any amino acid of ClyARPP19 phosphorylatable by PKA. The observed thiophosphorylation of the wild-type protein demonstrates that one or more residues are phosphorylated by PKA. This thiophosphorylation was completely prevented by mutation of a single residue, S81. This experiment thus shows that S81 is entirely responsible for phosphorylation by PKA in this assay. We have rewritten this section more clearly. Changes: Page 10, lines 18-28.

      V - Figures and text related to the figures

      Figure 1A

      Reviewer #2: Why is mouse not included in Figure 1A? Although it might be very similar to human, given that mouse is the species that is most commonly used as a mammalian model, I believe it could be included. However, this is optional upon decision by the authors.

      We have replaced the human sequence in Figure 1A with the mouse sequence as suggested. The sequences of each of the mouse and human ENSA/ARPP19 proteins are indeed virtually identical across mammals. Changes: Fig. 1A.

      Figure 1C

      Reviewer #2: There should be a better explanation in the text of the results sections for the image included in Fig. 1C. Note that Clytia is not a commonly used species, therefore images should be properly explained for general readers. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic, plus any additional information needed to understand the image. In addition, the detection of ARPP19 in the nerve rings is intriguing. This is mentioned in the discussion section, any idea of its function there? Please include some additional information or additional references, if they exist.

      We have expanded the explanations of Fig. 1C in the text and in the figure legend. We have also added cartoons to the figure to help readers understand the organisation of the Clytia jellyfish and gonad. As now explained, ClyARPP19 mRNA is detected in oocytes at all stages, but the signal is much stronger in pre-vitellogenic oocytes because all cytoplasmic components, including mRNAs, are significantly diluted by the large quantity of yolk proteins that accumulates as the oocytes grow to full size. Changes: page 7, line 40 & page 8, lines 1-9 - Fig. 1C - Legend page 31, lines 19-31.

      Nothing is known about the function of ARPP19 in the Clytia nervous system. The only data linking ARPP19 and the nervous system concerns mammalian ARPP16, an alternatively spliced variant of ARPP19. ARPP16 is highly expressed in medium spiny neurons of the striatum and likely mediates effects of the neurotransmitter dopamine acting on these cells (Andrade et al, 2017; Musante et al, 2017). This point is included in the Discussion in relation to the hypothesis that PKA phosphorylation of ARPP19 proteins in animals first arose in the nervous system and only later was coopted into oocyte maturation initiation. Changes: page 16, lines 12-13 & 17-20 - page 19, lines 6-9.

      Figure 2A

      Reviewer #1: Fig. 2A (and similar plots in subsequent figures): is it really necessary to cut the x axis? Would it be possible to indicate the number of oocytes for each experiment (maybe in the legend in brackets)?

      As requested by reviewer #1, the x-axis is no longer cut. The number of oocytes for each experiment is now provided in the legend of Fig. 2A and in similar plots of Fig. 5A and 5D. Changes: Fig. 2A - Legends page 31, lines 37-38 (Fig. 2A), page 33, line 25 (Fig. 5A) - page 33, line 34 (Fig. 5D).

      Figure 2D-E (as well as Figure 6C-D and Figure 8B-C)

      Reviewer #1: *Fig. 2D (and all similar plots below): I am lacking the discrete data points that were measured. Without these it is impossible to evaluate the fits. The half-times shown in 2E are somewhat redundant, and the information could be combined on a single plot. *

      We added all the data points to the concerned plots: 2D, 6C and 8B. As recommended by reviewer #1, we combined on a single plot the phosphorylation levels and the half-times. 2D-E => 2D, 6C-D => 6C and 8B-C => 8B. Changes: Figs 2D, 6C and 8B - Legends page 32, lines 9-14 (Fig. 2D), page 34, lines 24-30 (Fig. 6C) - page 35, lines 13-18 (Fig. 8B).

      Figure 3A and 3B

      Reviewer #1: Fig. 3: why is the blot for PKA substrates cut into 3 pieces? It would be clearer to show the entire membrane.

      In western blot experiments using Clytia oocytes, the amount of material was limited so the membranes were cut into three parts. The central part was incubated sequentially in distinct antibodies. We finally incubated all three parts of the membrane with the anti-phospho-PKA substrate antibody to reveal the full spectrum of proteins recognized by this antibody. The 3 pieces in Fig. 3A therefore together make up the same original membrane. We had separated them on the figure to make it clear that the membrane had been cut. In the new presentation, the 3 pieces are shown next to each other, making it clear that all the membrane is present, with dotted lines indicating the cut zone as explained in the legend. Changes: Fig. 3A and 3B - Legend page 32, lines 22-25 (Fig. 3A), lines 30-33 (Fig. 3B) - Page 24, lines 3-6 (Methods).

      Figure 3C

      Reviewer #2: Fig. 3C needs a better explanation in the text. The way these graphs are presented is somehow confusing. The meaning of the dots is not self-explanatory in the graph, and it seems that each experiment was done independently but then the complete set of results is presented. Legend says that "each dot represents one experiment" but this is difficult to read as in every analysis the figure also indicates the average and the total number of oocytes. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. These results are very robust, but a comparison between the number of oocytes that go through spontaneous GVBD or lysis in the different conditions will benefit their understanding.

      This figure is intended to provide an overview of all the Clytia oocyte injection experiments that we performed, for which full details are given in Supplementary Table 1. Since these experiments were not equivalent in terms of exact timing, types of observation (or films) made, and oocyte sensitivity to injection (as ascertained by buffer injections), it is not justified to make statistical comparisons between groups. We apologise that the presentation was misleading in this respect and hope that the new version is easier to understand. To avoid any misunderstanding of the nature of the data, we removed from the figure the average percentage of maturation for each condition across experiments, and instead show the values of each experiment independently. We also now explain the data included in the figure fully in the text and figure legend. Changes: Page 9, lines 16-39 - Fig. 3C and Supplementary Table 1 - Legend page 32, lines 34-41 & page 33, lines 1-11.

      Reviewer #2: Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions (Fig 3C). Given that in the control experiments with buffer this effect is also observed in some oocytes, please explain if this is caused by a mechanical disruption of the oocyte during the injection. In contrast, okadaic acid induces the lysis in all the 14/14 oocytes analyzed, is this due also to the mechanical approach? Or is there other reason more related to the PP2A inhibition? Please explain.

      These points are treated above in the response to this reviewer concerning the Results section.

      Figure 5

      Reviewer #2: In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed?

      The binding partners/effectors of XeARPP19-S109D that are involved in maintaining the prophase arrest have not yet been identified. The most probable explanation of the delay in meiotic maturation induced by ClyARPP19-S109D is that the Clytia protein recognizes these unknown ARPP19 effectors, which mediate the prophase arrest, less efficiently. As a result, maturation would be delayed, but not blocked. This explanation was provided in the Discussion (page 17, lines 14-17) and is now mentioned in the Results section. Changes: page 11, lines 16-19.

      VI - Discussion

      Reviewer #2: Although it presents highly interesting suggestions, discussion may border on being overly speculative, especially from line 37 of page 15 till the end.

      We agree and have reduced the speculation in this part of the discussion, in particular regrouping and reformulating ideas about evolutionary scenarios in a single paragraph. Changes: page 17, lines 37-41 - page 18, lines 1-41 - page 19, lines 1-18.

      SUMMARY - Point-by-point responses to individual reviewers’ comments in their order of appearance.

      Reviewer 1

      • The figures and text could be slightly condensed down to about 6 figures. We have reduced the number of figure panels but we prefer to maintain the number of figures, because the experimental data presented in them is essential to the interpretation of our results and the overall conclusions of the article. If the journal editor would like us to reduce the number of figures, we could do this by displacing Figure 4 and some panels of other figures (to then fuse some of them) to supplementary material, but this would be a pity.

      • The exact mechanisms responsible for the reduced effect of PKA phosphorylation remain relatively vaguely defined. Indeed, in their Discussion the authors list a number of experimental approaches to address this - but I understand that these would all involve substantial efforts to address. In particular, testing chimeric constructs around the consensus PKA site and from multiple species could be very informative. As the reviewer points out, unravelling these exact mechanisms will require a large amount of additional work and is beyond the scope of the current study.

      • 2A (and similar plots in subsequent figures): is it really necessary to cut the x axis? Would it be possible to indicate the number of oocytes for each experiment (maybe in the legend in brackets)? Fig. 2A has been changed in line with the reviewer's request (as well as similar plots in Fig. 5A and 5D). Changes: Fig. 2A - Legends page 31, lines 37-38 (Fig. 2A), page 33, line 25 (Fig. 5A) - page 33, line 34 (Fig. 5D).

      • 2D (and all similar plots below): I am lacking the discrete data points that were measured. Without these it is impossible to evaluate the fits. The half-times shown in 2E are somewhat redundant, and the information could be combined on a single plot. Fig. 2D has been changed in line with the reviewer's request (as well as similar plots in Figs 6C-D and 8B-C). Changes: Fig. 2D, 6C and 8B - Legends page 32, lines 9-14 (Fig. 2D), page 34, lines 24-30 (Fig. 6C) - page 35, lines 13-18 (Fig. 8B).

      • 3: why is the blot for PKA substrates cut into 3 pieces? It would be clearer to show the entire membrane. In western blot experiments using Clytia oocytes, the amount of material was limited so the membranes were cut into three parts. The central part was incubated sequentially in distinct antibodies. We finally incubated all three parts of the membrane with the anti-phospho-PKA substrate antibody to reveal the full spectrum of proteins recognized by this antibody. The 3 pieces in Fig. 3A therefore together make up the same original membrane. In the new presentation, the 3 pieces are shown next to each other, making it clear that all the membrane is present, with dotted lines indicating the cut zone as explained in the legend. Changes: Fig. 3A and 3B - Legend page 32, lines 22-25 (Fig. 3A), lines 30-33 (Fig. 3B) - Page 24, lines 3-6 (Methods).

      Reviewer 2

      • Abstract needs to be simplified if wants to reach a broader range of readers. We have reworked the Abstract to make it more accessible to new readers. Changes: Page 2.

      • It would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study. We have expanded the Introduction to cover the issue of time-references. Fuller details are now included about the advantages of Clytia compared to other hydrozoan species. Changes: Page 3, lines 5-11, page 4, lines 28-35, page 5, lines 32-33, page 5, lines 21-32 & 37-39.

      • The proteins MAPK is not introduced properly, as it is first mentioned in the results section. The involvement of MAPK activation during Xenopus oocyte meiotic maturation is now introduced. Changes: Page 5, lines 1-5.

      • Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways? We now give the example of the well-characterized Gβγ-PI3K pathway for oocyte maturation in starfish, and also mention that in many species the pathways are still unknown. Changes: Page 4, lines 1-15.

      • Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory. The mechanisms of PP2A inhibition by Gwl-phosphorylated ARPP19 are very well studied. The part that remains mysterious concerns the upstream mechanisms. We have reworded the paragraph to make this point unambiguous. Changes: Page 5, lines 1-8.

      • Why is mouse not included in Figure 1A? We have replaced the human sequence in Figure 1A with the mouse sequence. Changes: Fig. 1A.

      • 1C: There should be a better explanation in the text of the results sections for the image included in Fig. 1C. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic. We have expanded the explanations of Fig. 1C in the text. We have also added cartoons to the figure to help readers understand the organisation of the Clytia jellyfish and gonad. As now explained, ClyARPP19 mRNA is detected in oocytes at all stages, but the signal is much stronger in pre-vitellogenic oocytes because all cytoplasmic components are significantly diluted by the large quantity of yolk proteins. Changes: page 7, line 40 & page 8, lines 1-9 - Fig. 1C - Legend page 31, lines 19-31.

      • In addition, the detection of ARPP19 in the nerve rings is intriguing. Any idea of its function there? The only data linking ARPP19 and the nervous system concerns a mammalian variant of ARPP19 that is highly expressed in the striatum. This point is included in the Discussion. Changes: page 16, lines 12-13 & 17-20 - page 19, lines 6-9.

      • Figure 3C. The way these graphs are presented is somehow confusing. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions. This figure is intended to provide an overview of all the Clytia oocyte injection experiments, for which full details are given in Supplementary Table 1. We have modified the figure and now clarified this fully in the text and figure legend. Clytia oocytes are relatively fragile. Sensitivity of oocytes to injection varies between batches, while in general increasing injection volumes or protein concentrations increases the levels of lysis observed. We do not know exactly what causes this but it probably relates to mechanical trauma. We now explain these injection experiments more fully in the text. Changes: Page 9, lines 16-39 - Fig. 3C and Supplementary Table 1 - Legend page 32, lines 34-41 & page 33, lines 1-11.

      • In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed? The most probable explanation is that the Clytia protein recognizes less efficiently the unknown ARPP19 effectors that mediate the prophase arrest in Xenopus. This explanation is provided in the Results section. Changes: page 11, lines 16-19.

      • All the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible. As requested, we have shortened or removed the introductory passages of the Results section paragraphs, which were redundant with the information given in the introduction. Changes: Page 7, lines 3-4 & 14-16 & 36-37 - Page 8, lines 12-15 - Page 8, lines 37-40 & Page 9, lines 1-6.

      • Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described. The explanation of the experiments and the results are now more detailed and the paragraph ends with a general conclusion which came too early in the previous version. Changes: Page 8, lines 22-24 & 32-34.

      • Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion? As explained above, increasing injection volumes or protein concentrations increases the levels of lysis observed, probably due to mechanical trauma. It is important to note, however, that at the same concentration, both Clytia and Xenopus proteins induce activation of Cdk1 and GVBD in the Xenopus oocyte. Changes: Page 9, lines 16-29 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      • Lines 25-27 of page 8. "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD." Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion. High OA concentrations lead to cell lysis/apoptosis as a result of a massive deregulation of protein phosphorylation. For these reasons, we cannot increase OA concentrations over 2 µM. When injected in Xenopus oocyte at 1 or 2 µM, OA induces Cdk1 activation, but then the cell dies because PP2A has multiple substrates essential for cell life. When injected at 2 µM in Clytia oocytes, OA does not induce Cdk1 activation but promotes cell lysis. This supports the conclusion that 2 µM OA is sufficient to inhibit PP2A but that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia. We have reworded the relevant text. Changes: Page 9, lines 31-35 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      • Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19". The observed thiophosphorylation of the wild-type protein demonstrates that one or more residues are phosphorylated by PKA. This thiophosphorylation was completely prevented by mutation of a single residue, S81. This experiment thus shows that S81 is entirely responsible for phosphorylation by PKA in this assay. We have rewritten this section more clearly. Changes: Page 10, lines 18-28.

      • Some parts of the discussion are a bit speculative. We have reduced the speculation in this part of the discussion, in particular regrouping and reformulating ideas about evolutionary scenarios into a single paragraph. Changes: page 17, lines 37-41 - page 18, lines 1-41 - page 19, lines 1-18.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)): *

      *REVIEW COMMENT *

      *The article titled "The tRNA thiolation-mediated translational control is essential for plant immunity" by Zheng et al. highlights the critical role of tRNA thiolation in Arabidopsis plant immunity through comprehensive analysis, including genetics, transcriptional, translational, and proteomic approaches. Through their investigation, the authors identified a cgb mutant, resulting in the knockout of ROL5, and discovered that ROL5 and CTU2 form a complex responsible for catalyzing the mcm5s2U modification, which plays a pivotal role in immune regulation. The findings from this study unveil a novel regulatory mechanism for plant defense. Undoubtedly, this discovery is innovative and holds significant potential impact. However, before considering publication, it is necessary for the authors to address the various questions raised in the manuscript concerning the experiments and analysis to ensure the reliability of the study's conclusions. *

      Response: Thank you very much for your support and suggestions!

      *Here are the comments: *

      *Line 64-65: *

      *The author mentioned that 'While NPR1 is a positive regulator of SA signaling, NPR3 and NPR4 are negative regulators.' However, several recent discoveries are suggesting that it may not be a definitive fact that NPR3 and NPR4 are negative regulators. Therefore, I recommend the authors to review this section in light of the findings from recent papers and make necessary edits to reflect the most current understanding. *

      Response: Thank you for your feedback. Since we mainly focused on NPR1 in this study, we removed this sentence to avoid confusion. We provided additional information about NPR1 in the Introduction section to emphasize the importance of NPR1 (Line 64-68).

      *Line 182- & Figure 4: *

      *The author conducted RNA-seq, Ribo-seq, and proteome analysis. Describing the analysis as "transcriptional and translational" using RNA-seq and proteome data seems not entirely accurate. Proteome data compared with RNA-seq not only reflects translational changes but may also encompass post-translational regulations that contribute to the observed differences. To maintain precision, the title of this section should either be modified to "transcriptional and protein analysis" or, alternatively, compare RNA-seq and Ribo-seq data to demonstrate the transcriptional and translational changes more explicitly. *

      Response: Thank you for your suggestions. We agree with you and have changed the description accordingly throughout the manuscript.

      *Line 229-235 and Figure 5C: *

      *The interpretation of Figure 5C's polysome profiling results is inconclusive. There does not seem to be a noticeable difference in polysomal fractions between the cgb mutant and COM. The observed differences in the overlay of multiple polysome fractions between cgb and COM could be primarily influenced by baseline variations rather than a significant decrease in the polysomal fractions in cgb. Therefore, it is necessary to carefully review other relevant papers that discuss polysome fraction data and their analysis. By doing so, the authors can make the appropriate corrections to ensure accurate interpretations. *

      Response: Thank you for your comments. We agree that the difference between cgb and COM was not visually dramatic. This is a common feature of polysome profiling assays (e.g. Extended Data Fig. 1f in Nature 545: 487–490; Fig. 1c in Nature Plants, 9: 289–301). In our case, the difference between polysome fractions is unlikely to be due to baseline variation, for two reasons. First, baseline variation affects monosome and polysome fractions in the same way. However, our results showed that the monosome fraction of cgb is higher than that of COM, whereas the polysome fraction of cgb is lower than that of COM. Second, this result was repeatedly detected. For better visualization, we adjusted the scale of the Y axis in the revised manuscript (Figure 5D).

      *Line 482 Ion Leakage assay: *

      I could not find the ion leakage assay in this manuscript, so I wonder why it is mentioned.

      Response: We are sorry for the mistake. The ion leakage data were included in previous versions of the manuscript. We removed the data but forgot to remove the corresponding method in the present version.

      *Materials and Methods: *

      *To enhance the reproducibility of the study, the authors should provide a more detailed description of the materials and methods, especially for critical experiments like the Yeast-two-hybrid assays. Clear documentation of specific reagents, strains, and protocols used, along with information on controls, will bolster the validity of the results and facilitate future research in this area. *

      Response: Thank you for your suggestions. We provided more details in the methods. For yeast two-hybrid assays, the vector information was included in the “Vector constructions” section.

      *Minor Point: *

      Line 61: There is a space between ')' and '.', which needs to be edited.

      Response: The space was deleted.

      *Reviewer #1 (Significance (Required)): *

      *This study holds significant importance within the field of plant immunity research. The authors have made valuable contributions through their comprehensive analysis, encompassing genetics, transcriptional, translational, and proteomic approaches, to elucidate the critical role of tRNA thiolation in plant immunity. One of the major strengths of this study lies in its ability to shed light on a previously unknown regulatory mechanism for plant defense. By identifying the cgb mutant and investigating the role of ROL5 and CTU2 in catalyzing the mcm5s2U modification, the authors have unveiled a novel aspect of plant immune regulation. This innovative discovery provides a deeper understanding of the intricate molecular processes governing immunity in plants. *

      *Moreover, the study's findings are not limited to the immediate field of plant immunity but also have broader implications for the scientific community. By employing diverse methodologies, the authors have demonstrated how tRNA thiolation exerts control over both transcriptional and translational reprogramming, revealing intricate links between these processes. This integrative approach sets a precedent for future research in the field of plant molecular biology and opens up new avenues for investigating other aspects of immune regulation. *

      In terms of its relevance, the study's findings have the potential to captivate researchers across various disciplines, such as plant biology, molecular genetics, and translational research. The insights gained from this study may inspire researchers to explore further the role of tRNA in other regulation.

      Response: Thank you very much for your positive comments and support!

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors presented an intriguing and previously unknown mechanism that the tRNA mcm5s2U modification regulates plant immunity through the SA signaling pathway, specifically by controlling NPR1 translation. The manuscript was well-written and logically structured, allowing for a clear understanding of the research. The authors provided strong and persuasive data to support their key claims. However, further improvement is required to strengthen the conclusion that mcm5s2U regulates plant immunity by controlling NPR1 translation.

      Response: Thank you very much for your positive comments and support!

      Major comments:

      • NPR1 translation should be examined to verify the Mass Spec (Figure 5B) and polysome profiling data (Figure 5D) by checking the NPR1 protein and mRNA level using antibodies and qPCR, respectively, in the cgb mutant background to establish a concrete confirmation of CGB regulation in NPR1 translation. Response: This is a very constructive suggestion. We performed these experiments and found that the transcript levels of NPR1 were similar between COM and cgb both before and after Psm ES4326 infection (Figure S2), which is consistent with the RNA-Seq data. Consistent with the Mass Spec and polysome profiling data, the NPR1 protein level was much higher in COM than in cgb (Figure 5C) after Psm ES4326 infection. Together, these data further support our conclusion that translation of NPR1 is impaired in cgb.

      • Analyzing the genetic epistasis of CGB and NPR1 to check if CGB regulates plant immunity through the NPR1-dependent SA signal pathway. If the authors' claim is valid, I would expect no additive effect on bacterial growth in the cgb/npr1 double mutant compared to the single mutants. Due to the broad impact of CGB on plant signaling (Figures 4E and 4F), the SA protection assay, which concentrates on the SA signal pathway, needs to be tested in WT, cgb and npr1 plants as an alternative assay to the genetic epistasis analysis. I expect that the SA-mediated protection is also compromised in cgb mutant background.

      Response: Thank you for your suggestions. We did examine the growth of Psm ES4326 in the cgb npr1 double mutant and found that cgb npr1 was significantly more susceptible than npr1 and cgb (Figure below). Although additive effects were observed, this result does not contradict our conclusion, for the following reasons. First, the translation of NPR1 was reduced rather than completely blocked in cgb. In other words, NPR1 still has some function in cgb. In the cgb npr1 double mutant, however, the function of NPR1 is completely abolished, which explains why cgb npr1 was more susceptible than cgb. Second, in addition to NPR1, some other immune regulators (such as PAD4, EDS5, and SAG101) were also compromised in cgb (Figure 5B), which explains why cgb npr1 was more susceptible than npr1. Since the result of the genetic analysis was not intuitive, we decided not to include it in the manuscript.

      SA signaling is known to regulate both basal resistance and systemic acquired resistance (SA-mediated protection). We have shown that cgb is defective in basal resistance, which is sufficient to support our conclusion that tRNA thiolation is essential for plant immunity. We agree that SA-mediated protection is also expected to be compromised in cgb. We will test this in a future study.

      • Could the authors comment on why using COM instead of WT as a control to perform the majority of the experiments? Response: Thank you for your comments. In addition to ROL5, the cgb mutant may carry other mutations compared with WT. COM is a complementation line in the cgb background. Therefore, the genetic backgrounds of COM and cgb are likely to be more similar than those of WT and cgb.

      • In Figure 5E, why does ACTIN2 have an enhanced translation while NPR1 shows a compromised one in cgb mutant? How does the mcm5s2U distinguish NPR1 and ACTIN2 codons? Does mcm5s2U modification have both positive and negative roles in regulating protein translation? Response: Thank you for raising this question. As previously reported, loss of the mcm5s2U modification causes ribosome pausing at AAA and CAA codons. Therefore, the translation of mRNAs with more GAA/CAA/AAA codons (called s2 codons) is likely to be affected more dramatically in cgb. We have analyzed the percentage of s2 codons at the whole-genome level (Figure below). The average percentage is 8.5%, while NPR1 contains 10.1% s2 codons and actin contains only 4.5%. When fewer ribosomes are used for translation of the mRNAs with a high s2 codon percentage, more ribosomes are available for translation of the mRNAs with a low s2 codon percentage, which may account for the enhanced translation efficiency. To focus on NPR1 and to avoid confusion, we removed the ACTIN data in the revised manuscript.

      • Specify the protein amount used for the in vitro pull-down assay and agrobacteria concentration used for the tobacco Co-IP assay in the protocol section.

      Response: We added this information in Method section in the revised manuscript.

      • Delete the SA quantification and Ion leakage assay in the protocol, which are not used in the study.

      Response: We are sorry for the mistake. The SA quantification and ion leakage data were included in previous versions of the manuscript. We removed the data but forgot to remove the corresponding methods in the present version. We deleted them in the revised manuscript.

      • The strain Pst DC3000 avrRPT2 was not used in this study. Please remove it.

      Response: We are sorry for the mistake. The strain Pst DC3000 avrRPT2 was used for the ion leakage assay in previous versions of the manuscript. We deleted it in the revised manuscript.

      • In Figure 5F, did the 59 genes tested overlap with the 366 attenuated proteins in the cgb mutant? Were the 59 genes translationally regulated?

      Response: Thank you for your suggestion. Venn diagram analysis revealed that 12 genes (about 20%) are also attenuated proteins, suggesting that the mcm5s2U modification regulates the translation of some SA-responsive genes.
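
      As an illustration only, and not the authors' actual analysis pipeline, the s2-codon percentage discussed in the response on ACTIN2 and NPR1 above could be computed from a coding sequence roughly as follows. The function name `s2_codon_percentage`, the choice to count every in-frame codon of the supplied sequence (including the start and stop codons), and the toy sequence are all assumptions made for this sketch.

```python
# Codons read by mcm5s2U-modified tRNAs, as listed in the response (GAA/CAA/AAA).
S2_CODONS = {"AAA", "CAA", "GAA"}


def s2_codon_percentage(cds: str) -> float:
    """Return the percentage of in-frame codons that are s2 codons.

    Hypothetical helper: assumes `cds` is an in-frame coding sequence
    (DNA or RNA letters) whose length is a multiple of three.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 nt")
    # Split the sequence into consecutive, non-overlapping triplets.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    hits = sum(codon in S2_CODONS for codon in codons)
    return 100.0 * hits / len(codons)


if __name__ == "__main__":
    # Toy sequence (not a real gene): ATG AAA CAA GAA TTT TAA
    print(s2_codon_percentage("ATGAAACAAGAATTTTAA"))
```

      Applied to every annotated coding sequence, a function of this kind would give a genome-wide distribution against which individual genes such as NPR1 could be compared; whether stop codons or alternative transcripts were included in the reported percentages is not stated in the response.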

      Reviewer #2 (Significance (Required)):

      The authors' study is significant as it establishes the first connection between tRNA mcm5s2U modification and plant immunity, specifically by regulating NPR1 protein translation. This research expands our understanding of the biological role of tRNA mcm5s2U modification and highlights the importance of translational control in plant immunity. It is likely to captivate scientists working in this field.

      Response: Thank you very much for your positive comments and support!

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors identified a cgb mutant that carries a mutation in the ROL5 gene. Both the cgb mutant and the newly created rol5-c mutant are susceptible to the bacterial pathogen Psm. The authors showed that ROL5 interacts with CTU2, the Arabidopsis homologous protein of the yeast tRNA thiolation enzyme NCS2. A ctu2-1 mutant is also susceptible to Psm, suggesting that tRNA thiolation may play a role in plant immunity. Indeed, tRNA mcm5S2U levels are undetectable in rol5-c and ctu2-1 mutants. The authors found that the cgb mutation significantly attenuated basal and Psm-induced transcriptome and proteome changes. Furthermore, it was found that the translation efficiency of a group of SA signaling-related proteins including NPR1 is compromised.

      The manuscript provides solid evidence for the involvement of ROL5 and CTU2 in plant immunity using the rol5 and ctu2 mutants. The authors may consider the following suggestions and comments to improve the manuscript.

      Response: Thank you very much for your support and suggestions!

      • The function of the Elongator complex in tRNA modification/thiolation has been extensively studied. In Arabidopsis Elongator mutants, mcm5S2U levels are very low, similar to the levels in the rol5 and ctu2 mutants (Mehlgarten et al., 2010, Mol Microbiology, 76, 1082-1094; Leitner et al., 2015 Cell Rep). In elp mutants, the PIN protein levels are reduced without reduced mRNA levels (Leitner et al., 2015), indicating that Elongator-mediated tRNA modification is involved in translation regulation. The Elongator complex plays an important role in plant immunity, though the reduced mcm5S2U levels in elp mutants were not proposed as the exclusive cause of the immune phenotypes. In fact, it would be difficult to establish a cause-effect relationship between tRNA modification and immunity. These results should be discussed in the manuscript.

      Response: Thank you very much for your insightful comment on the role of the ELP complex in tRNA modification and plant immunity. We added a paragraph discussing the ELP complex in the revised manuscript (Lines 280-295).

      In addition to tRNA modification, the ELP complex has several other distinct activities including histone acetylation, α-tubulin acetylation, and DNA demethylation. Therefore, it is difficult to dissect which activity of the ELP complex contributes to plant immunity. However, the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation. Considering that the elp, rol5, and ctu2 mutants are all defective in tRNA thiolation, it is likely that the tRNA modification activity of the ELP complex underlies its function in plant immunity.

      • The interaction between CTU2 and ROL5 in Y2H has previously been reported (Philipp et al., 2014). The same report also showed reduced tRNA thiolation in the ctu2-2 mutant using polyacrylamide gel. These results should be mentioned/discussed in the manuscript.

      Response: Thank you for pointing them out. We added this information in the revised version (Lines 146-147).

      • tRNA modification is unlikely to play a unique role in plant immunity. It can be inferred that mutations affecting tRNA modification (rol5, ctu2, elp, etc.) would delay both internal and external stimulus-induced signaling including immune signaling.

      Response: We agree with you that tRNA modification has other roles in addition to plant immunity. In the Discussion section, we have mentioned that “it was found that tRNA thiolation is required for heat stress tolerance (Xu et al., 2020). … It will also be interesting to test whether tRNA thiolation is required for responses to other stresses such as drought, salinity, and cold.” (Lines 276-279).

      • It would be interesting to conduct statistical analyses on the genetic codons used in the CDSs whose translation was attenuated as described in the manuscript. Do these genes including NPR1 use more than average levels of AAA, CAA, and GAA codons? If not, why is their translation impaired?

      Response: Thank you for your suggestion. We call GAA/CAA/AAA codons s2 codons. We have analyzed the percentage of s2 codons at the whole-genome level (Figure below). NPR1 does contain more s2 codons (10.1%) than the average level (8.5%). We are preparing another manuscript, which will report the relationship between s2 codons and translation.
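
      The s2 codon percentage discussed above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis pipeline: the function name `s2_codon_percentage` is hypothetical, the input is assumed to be an in-frame coding sequence, and every codon passed in (including any stop codon) is counted in the denominator.

```python
# Codons read by mcm5s2U-modified tRNAs, as named in the response (GAA/CAA/AAA).
S2_CODONS = {"AAA", "CAA", "GAA"}


def s2_codon_percentage(cds: str) -> float:
    """Return the percentage of codons in an in-frame CDS that are s2 codons.

    Assumption: `cds` starts at the first base of a codon and its length is a
    multiple of three. RNA input (U) is converted to DNA letters (T).
    """
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 nt long")
    # Split the sequence into consecutive, non-overlapping triplets.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    hits = sum(codon in S2_CODONS for codon in codons)
    return 100.0 * hits / len(codons)


if __name__ == "__main__":
    # Toy sequence: ATG AAA CAA GAA TTT, so 3 of 5 codons are s2 codons.
    print(s2_codon_percentage("ATGAAACAAGAATTT"))
```

      Applying such a function to every annotated CDS and averaging the results would give a genome-wide baseline of the kind the response compares NPR1 against.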

      **Referees cross-commenting**

      It is important to put current research in the context of available knowledge in the field. The diagram in Figure 3C shows that the Elongator complex functions upstream of ROL5 & CTU2 in modifying tRNA. The function of Elongator in plant immunity has been well established. The similarities and differences should be discussed. Additionally, it may not be a good idea to claim that the results are novel.

      Response: Thank you for your comments. We added a paragraph discussing the ELP complex in the revised manuscript (Lines 280-295). The ELP complex catalyzes the cm5U modification, which is the precursor of mcm5s2U catalyzed by ROL5 and CTU2. In addition to tRNA modification, the ELP complex has several other distinct activities including histone acetylation, α-tubulin acetylation, and DNA demethylation. Therefore, it is difficult to dissect which activity of the ELP complex contributes to plant immunity. However, the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation. Considering that the elp, rol5, and ctu2 mutants are all defective in tRNA thiolation, it is likely that the tRNA modification activity of the ELP complex underlies its function in plant immunity. Therefore, our study improves our understanding of the ELP complex in plant immunity. We have deleted the words “new” and “novel” throughout the manuscript.

      Reviewer #3 (Significance (Required)):

      The manuscript provides solid evidence for the involvement of ROL5 and CTU2 in plant immunity. However, the authors did not acknowledge the existing results about the Elongator complex that functions in the same pathway in modifying tRNA. The involvement of Elongator in plant immunity has been well established. The cause-effect relationship between tRNA modification and plant immunity is difficult to demonstrate.

      Response: We think that the cause-effect relationship between the activities of the ELP complex and plant immunity is difficult to demonstrate because the ELP complex has several distinct activities other than tRNA modification. However, since the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation, the cause-effect relationship between tRNA thiolation and plant immunity is clear, which indicates that the tRNA modification activity of the ELP complex contributes to plant immunity.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Dear Editor and Reviewers,

      We thank you for the thorough and detailed examination of our preprint and for providing the very valuable comments that helped us to present and interpret our data even better.

      Thank you in particular for appreciating the extensive set of microscopic techniques that we have combined to study in a unique manner the characteristics and functionalities of FIT nuclear bodies in living plant cells.

      We prepared a revised preprint in which we address all reviewer comments. Our revision includes a NEW experiment (in four repetitions) that addresses one comment made by the reviewers with regard to the effects of the environmental FIT NB-inducing situation:

      • NEW Supplemental Figures S6 and S7: Analysis of previously reported intron retention splicing variants of Fe deficiency genes FIT, BHLH039, IRT1, FRO2 in new gene expression experiments (four independent repetitions of the experiments with three biological replicates of each sample: white/blue light treatment, sufficient and deficient iron supply).

      In the following, please find our detailed response to all reviewer comments.

      With these changes, we hope that our peer-reviewed preprint can receive a positive vote.

      We are looking forward to your response.

      Sincerely,

      Petra Bauer and Ksenia Trofimov on behalf of all authors

      Comments to the reviews:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this paper entitled "FER-LIKE IRON DEFICIENCY-INDUCED TRANSCRIPTION FACTOR (FIT) accumulates in homo- and heterodimeric complexes in dynamic and inducible nuclear condensates associated with speckle components", Trofimov and colleagues describe for the first time the function of FIT in nuclear bodies. Using an impressive set of microscopy techniques, they assess FIT localization in nuclear bodies and its dynamics. Finally, they reveal their importance in controlling the iron deficiency pathway. The manuscript is well written and fully understandable. Nonetheless, as it stands the manuscript presents some weaknesses: the lack of quantification for co-localization and the absence of controls make the authors' claims hard to follow. Moreover, to substantially improve the manuscript the authors need to provide more proof of concept in A. thaliana, as all the nice molecular and cellular mechanism is only provided in N. benthamiana. Finally, some key conclusions in the paper are not fully supported by the data. Please see below:

      Main comments:

      1) For colocalization analysis, the authors should provide semi-quantitative data, counting by eye the number of times they observed no, partial or full co-localization, and indicate how many nuclei they used.

      Authors:

      We have added the information in the Materials and Methods section, lines 731-734:

      In total, 3-4 differently aged leaves of 2 plants were infiltrated and used for imaging. One infiltrated leaf with homogenous presence of one or two fluorescence proteins was selected, depending on the aim of the experiment, and ca. 30 cells were observed. Images are taken from 3-4 cells, one representative image is shown.

      In all analyzed cases, except in the case of colocalization of FIT and PIF4 fusion proteins, the ca. 30 cells had the same localization and/or colocalization patterns. This information has also been added in the figure legends. Each experiment was repeated at least 2-3 times, or as indicated in the figure legend.

      2) Do semi-quantitative co-localization analysis by eye on FIT NBs with known NB markers in the A. thaliana root. For now, all the nicely described molecular mechanism is shown in N. benthamiana, which makes this story a bit weak since all the iron transcriptional machinery is localized in the root to activate IRT1.

      Authors:

      The described approach was well suited, and we were able to screen co-localizing marker proteins in FIT NBs in N. benthamiana to better identify the nature of FIT NBs. This has been successful as we were able to associate FIT NBs with speckles. The N. benthamiana system allowed optimal microscopic observation of fluorescence proteins and quantification of FIT NB characteristics, in contrast to the root hair zone of Arabidopsis where Fe uptake takes place. FIT is expressed at a low level in roots and also in leaves, whereby fluorescence protein expression levels are insufficient for the microscopic studies presented here. The tobacco infiltration system is also well established to study FIT-bHLH039 protein interaction and nuclear body markers. We discuss this point in the discussion, see lines 489-500.

      3) The authors need to provide data clearly showing that blue light induces NBs in A. thaliana and N. benthamiana.

      Authors:

      For tobacco, see Figure 1B (t = 0, 5 min) and Supplemental Movies S1. For Arabidopsis, please see Figure 1A (t = 0, 90 and 120 min) and Supplemental Figure S1A. We provide an additional image of pFIT:cFIT-GFP Arabidopsis control plants, showing that NB formation is not detected in plants that were grown in white light and not exposed to blue light before inspection (Supplemental Figure S1B). We state that upon blue light exposure, plants had FIT NBs in at least 3-10 nuclei of 20 examined nuclei in the root epidermis in the root hair zone (in three independent experiments with three independent plants). White-light-treated plants showed no NB formation unless an additional exposure to blue light was provided (in three independent experiments, three independent plants per experiment and with 15 examined nuclei per plant).

      4) Direct conclusion in the manuscript:

      • Line 170: At this point of the paper the authors cannot claim that the formation of FIT condensates in the nucleus is due to the light, as it might be indirectly linked to cell death induced by photodamaging the cell using a 488 nm laser for several minutes. This is true especially with the ELYRA PS, which has strong lasers made for super resolution, and given that cell death is now linked to iron homeostasis. The same experiment might be done using a spinning disc, or if the authors present the data of the blue light experiment mentioned above this assumption might be discarded. Alternatively, the authors can use PI staining to assess cell viability after several minutes under the 488 nm laser.

      Authors:

      As stated in our response to comment 3, we have now included a white light control to show that FIT NB formation does not occur under normal white light conditions. Since the formation of FIT NBs is a dynamic and reversible process (Figure 1A), the cells are evidently still viable, and cell death is not the reason for FIT NB formation.

      • Line 273: I don't agree with the first part of the authors' conclusion, saying that "wild-type FIT had better capacities to localize to NBs than mutant FITmSS271AA, presumably due its IDRSer271/272 at the C-terminus". This is not supported by the data. In order to make such a claim the authors need to compare the FA of FIT WT with FITmSS271AA by statistical analysis. Nonetheless, the values seem to be identical on the graphs. The main differences that I observed here are: 1) The NP value for FITmSS271AA seems to be lower compared to FIT-WT, suggesting that the serine might be important to regulate protein homodimerization partitioning between the NP and the NB. 2) To me, something very interesting that the authors did not mention is the way the FA of FITmSS271AA in the NB and NP is behaving with high variability. The FA of those is widely spread, ranging from 0.30 to 0.13, compared to FIT-WT. To me it seems, according to the results, that Serine 271/272 are required to stabilize FIT homodimerization. This would not only explain the delay to form the condensate but also the decreased number and size observed for FITmSS271AA compared to FIT-WT. As the homodimerization occurs with high variability in FITmSS271AA, there is less chance that the proteins will meet, therefore increasing the time to homodimerize and form/aggregate NBs.

      Authors:

      We fully agree. We meant to describe this result in a similar way and thank you for your help in formulating this point even better. Rephrasing might make it clearer that the IDRSer271/272 is important for proper NB localization, lines 272-278:

      “Also, the FA values did not differ between NBs and NP for the mutant protein and did not show a clear separation in homodimerizing/non-dimerizing regions (Figure 3D) as seen for FIT-GFP (Figure 3C). Both NB and NP regions showed that homodimers occurred very variably in FITmSS271AA-GFP.

      In summary, wild-type FIT could be partitioned properly between NBs and NP compared to FITmSS271AA mutant and rather form homodimers, presumably due its IDRSer271/272 at the C-terminus.”

      • Line 301: According to my previous comment (line 273), here it seems that Serine 271/272 are required only for proper partitioning of the heterodimer FIT/BHLH039 between the NP and NB but not for the stability of the heterodimer formation. However, it would be great if the authors counted the number of BHLH039 condensates in both versions, FITmSS271AA and FIT-WT. In my opinion, they would observe fewer BHLH039 condensates because the homodimer of FITmSS271AA is less likely to occur because of instability.

      Authors:

      bHLH039 alone localizes primarily to the cytoplasm and not the nucleus, and the presence of FIT is crucial for bHLH039 nuclear localization (Trofimov et al., 2019). Moreover, bHLH039 interaction with FIT depends on SS271AA (Gratz et al., 2019). We therefore did not consider this experiment for the manuscript and did not acquire such data, as we did not expect to achieve major new information.

      5) To wrap up the story about the requirements of NB in mediating iron acquisition under different light regimes, provide data for IRT1/FRO2 expression levels in fit background complemented with FITmSS271AA plants. I know that this experiment is particularly lengthy, but it would provide much more to this nice story.

      Authors:

      Data for expression of IRT1 and FRO2 in FITmSS271AA/fit-3 transgenic Arabidopsis plants are provided in Gratz et al. (2019). To address the comment, we performed a NEW experiment. We provide gene expression data on FIT, BHLH039, IRT1 and FRO2 splicing variants (previously reported intron retention) to explore the possibility of differential splicing alterations under blue light (NEW Supplemental Figures S6 and S7, lines 454-466). Very interestingly, this experiment confirms that blue light affects gene expression differently from white light in the short-term NB-inducing condition and that blue light can enhance the expression of Fe deficiency genes despite the short 1.5 to 2 h treatment. Another interesting aspect was that the published intron retention was also detected. A significant difference in intron retention depending on iron supply versus deficiency and blue/white light was not observed, as the expression pattern of transcripts with the respective intron retention sites was the same as that of total, mostly spliced transcripts.

      Minor comments

      In general, I would suggest that the authors avoid abbreviations; it gets really confusing, especially with short abbreviations such as NB, NP, PB, FA.

      Authors:

      We would like to keep the abbreviations as they are used very often in our work and, in our eyes, facilitate understanding.

      Line 106: What does IDR mean?

      Authors:

      Explanation of the abbreviation was added to the text, lines 105-108:

      “Intrinsically disordered regions (IDRs) are flexible protein regions that allow conformational changes, and thus various interactions, leading to the required multivalency of a protein for condensate formation (Tarczewska and Greb-Markiewicz, 2019; Emenecker et al., 2020).”

      Line 163-164: provide data or cite a figure properly for blue light induction.

      Authors:

      We have removed this statement from the description, as we provide a white light control now, lines 157-158:

      “When whole seedlings were exposed to 488 nm laser light for several minutes, FIT became re-localized at the subnuclear level.”

      Line 188: Provide Figure ref.

      Authors:

      Figure reference was added to the text, lines 184-185:

      “As in Arabidopsis, FIT-GFP localized initially in uniform manner to the entire nucleus (t=0) of N. benthamiana leaf epidermis cells (Figure 1B).”

      Line 194: The conclusion is too strong. The authors conclude that the condensates they observed are NBs based on the fact that the same procedure to induce NBs has been used in other studies, which is not convincing. Co-localization analysis with NB markers needs to be done to support such a claim. At this step of the study, the authors may want to talk about condensates in the nucleus which might correspond to NBs. Please do so in the following paragraphs of the manuscript until colocalization analysis has been provided. Alternatively, provide the co-localization analysis at this step in the paper.

      Authors:

      We agree. We changed the text in two positions.

      Lines 176-178: “Since we had previously established a reliable plant cell assay for studying FIT functionality, we adapted it to study the characteristics of the prospective FIT NBs (Gratz et al., 2019, 2020; Trofimov et al., 2019).”

      Lines 192-193: “We deduced that the spots of FIT-GFP signal were indeed very likely NBs (for this reason hereafter termed FIT NBs).”

      Line 214: In order to assess the photobleaching due to the FRAP experiment, the quantification of the "recovery" needs to be provided in an unbleached area. This might explain why FIT recovers up to 80% in the condensate. Moreover, the authors conclude that the recovery is high; however, it is tricky to assess since no comparison is made with a negative/positive control.

      Authors:

      In the FRAP analysis, an unbleached area is taken into account and used for normalization.

      We reformulated the description of Figure 1F, lines 212-214:

      “According to relative fluorescence intensity the fluorescence signal recovered rapidly within FIT NBs (Figure 1F), and the calculated mobile fraction of the NB protein was on average 80% (Figure 1G).”
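
      To make the normalization and the mobile fraction concrete, the calculation can be sketched as follows. This is an illustrative sketch of a commonly used FRAP analysis and not necessarily the exact procedure used in the manuscript: the function names, the choice of averaging the last three frames as the plateau, and the toy intensity values are all assumptions.

```python
from statistics import mean


def normalize_frap(roi, ref, n_pre):
    """Normalize a bleached-ROI trace to its pre-bleach mean and correct
    for acquisition bleaching using an unbleached reference region.

    roi, ref: background-subtracted intensities per frame (same length).
    n_pre:    number of frames recorded before the bleach pulse.
    """
    roi_pre = mean(roi[:n_pre])
    ref_pre = mean(ref[:n_pre])
    return [(r / roi_pre) * (ref_pre / f) for r, f in zip(roi, ref)]


def mobile_fraction(norm, n_pre, n_plateau=3):
    """Mobile fraction = (plateau - post-bleach) / (pre-bleach - post-bleach)."""
    pre = mean(norm[:n_pre])
    post = norm[n_pre]                  # first frame after the bleach pulse
    plateau = mean(norm[-n_plateau:])   # assumed plateau: last frames
    if pre <= post:
        raise ValueError("no bleaching detected in the trace")
    return (plateau - post) / (pre - post)


if __name__ == "__main__":
    # Toy data: two pre-bleach frames, bleach to 20%, recovery to 84%.
    roi = [100, 100, 20, 50, 80, 84, 84, 84]
    ref = [50] * len(roi)               # stable unbleached reference
    norm = normalize_frap(roi, ref, n_pre=2)
    print(round(mobile_fraction(norm, n_pre=2), 3))
```

      With these toy values the trace recovers from 0.2 to 0.84 of the pre-bleach level, which corresponds to a mobile fraction of 0.8, the same order as the average reported for FIT NBs.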

      Line 220-227: The conclusion is too strong. As I mentioned previously, the authors cannot claim that the condensates are NBs at this step of the study. They observed nuclear condensates that behave like NBs in terms of how they are induced, their shape, and the recovery. And please include a control.

      Authors:

      Please see the reformulated sentences and our response above.

      Lines 176-178: “Since we had previously established a reliable plant cell assay for studying FIT functionality, we adapted it to study the characteristics of the prospective FIT NBs (Gratz et al., 2019, 2020; Trofimov et al., 2019).”

      Lines 192-193: “We deduced that the spots of FIT-GFP signal were indeed very likely NBs (for this reason hereafter termed FIT NBs).”

      Line 239: It's inappropriate to give the conclusion before the evidence.

      Authors:

      Thank you. We removed the conclusion.

      Line 240: Figure 2A, provide images of FIT-G at 15min in order to compare. And the quantification needs to be provided at 5 minutes and 15 minutes for both FIT-G WT and FIT-mSS271AA-G counting the number of condensates in the nucleus. Especially because the rest of the study is depending on these time points.

      Authors:

      This information is provided in the Supplemental Movie S1C.

      Line 241: The authors say that the formation of condensates starts after 5 minutes (line 190); here (line 241) the authors claim that it starts after 1 minute. Please clarify.

      Authors:

      In line 190 we described that FIT NB formation occurs after the excitation and is fully visible after 5 min. In line 241 we stated that the formation starts in the first minutes after excitation, which describes the same time frame. We rephrased the respective sentences.

      Lines 185-188: “A short duration of 1 min 488 nm laser light excitation induced the formation of FIT-GFP signals in discrete spots inside the nucleus, which became fully visible after only five minutes (t=5; Figure 1B and Supplemental Movie S1A).”

      Lines 239-242: “While FIT-GFP NB formation started in the first minutes after excitation and was fully present after 5 min (Supplemental Movie S1A), FITmSS271AA-GFP NB formation occurred earliest 10 min after excitation and was fully visible after 15 min (Supplemental Movie S1C).”

      Line 254: Not sure what the authors claim "not only for interaction but also for FIT NB formation ". To me, the IDR is predicted to be perturbed by modeling when the serines are mutated therefore the IDR might be important to form condensates in the nucleus. Please clarify.

      Authors:

      The formation of nuclear bodies is slow for FITmSS271AA, as seen in Figure 2. Previously, we showed that FITmSS271AA homodimerizes less (Gratz et al., 2019). Therefore, the said IDR is important for both processes, NB formation and homodimerization. We have added this information to make the point clear, lines 253-255:

      “This underlined the significance of the Ser271/272 site, not only for interaction (Gratz et al., 2019) but also for FIT NB formation (Figure 2).”

      Line 255: It's not clear why the author test if the FIT homodimerization is preferentially associated with condensate in the nucleus.

      Authors:

      We test this because both homo- and heterodimerization of bHLH TFs are generally important for the activity of TFs, and we unraveled the connection between protein interaction and NB formation. We state this in lines 228-232.

      Line 269-272: It's not clear to what the authors are referring to.

      Authors:

      We are describing the homodimeric behavior of FIT and FITmSS271AA assessed by homo-FRET measurements that are introduced in the previous paragraph, lines 256-268.

      Line 309: This colocalization part should be presented before line 194.

      Authors:

      We find it convincing to first examine and characterize the process underlying FIT NB formation, and then to study a possible function of NBs. The colocalization analysis is part of a functional analysis of NBs. We thank the reviewer for the hint that colocalization also confirms that the nuclear FIT spots are indeed NBs. We took up this point and discuss it, lines 516-522:

      “Additionally, the partial and full colocalization of FIT NBs with various previously reported NB markers confirm that FIT indeed accumulates in and forms NBs. Since several of NB body markers are also behaving in a dynamic manner, this corroborates the formation of dynamic FIT NBs affected by environmental signals.”

      “In conclusion, the properties of liquid condensation and colocalization with NB markers, along with the findings that it occurred irrespective of the fluorescence protein tag preferentially with wild-type FIT, allowed us to coin the term of ‘FIT NBs’.”

      Line 328: add the ref to figure, please.

      Authors:

      Figure reference was added to the text, lines 330-332:

      “The second type (type II) of NB markers were partially colocalized with FIT-GFP. This included the speckle components ARGININE/SERINE-RICH45-mRFP (SR45) and the serine/arginine-rich matrix protein SRm102-mRFP (Figure 5).”

      Line 334: It seems that the SR45 structures have an abnormally large diameter, between 4 and 6 µm. In general, a speckle measures about 2-3 µm in diameter. Can the authors make sure that this structure is not due to overexpression in N. benthamiana, and make sure not to oversaturate the image?

      Authors:

      Thank you for this hint. Indeed, there are reports that SR45 is a dynamic component inside cells. It can redistribute depending on environmental conditions and associate into larger speckles depending on the nuclear activity status (Ali et al., 2003). We include this reference and refer to it in the discussion, lines 557-564:

      “Interestingly, typical FIT NB formation did not occur in the presence of PB markers, indicating that they must have had a strong effect on recruiting FIT. This is interesting because the partially colocalizing SR45, PIF3 and PIF4 are also dynamic NB components. Active transcription processes and environmental stimuli affect the sizes and numbers of SR45 speckles and PB (Ali et al., 2003; Legris et al., 2016; Meyer, 2020). This may indicate that, similarly, environmental signals might have affected the colocalization with FIT and resulting NB structures in our experiments. Another factor of interference might also be the level of expression.”

      Line 335: It seems that the colocalization is only partial after induction of NBs. The FIT NBs colocalize around SR45. But it's hard to tell because the images are saturated, therefore creating some false overlapping regions.

      Authors:

      The localization of FIT with SR45 is partial and occurs only after FIT has undergone condensation, see lines 335-338.

      Line 344-345: It's inappropriate to give the conclusion before the evidence.

      Authors:

      We explain at an earlier paragraph that we will show three different types of colocalization and introduce the respective colocalization types within separate paragraphs accordingly, see lines 314-321.

      Line 353: increase the contrast in the image of t=5 for UAP56H2 since it's hard to assess the colocalization.

      Authors:

      This is done as noted in the figure legend of Figure 6.

      Line 381-382: "In general" does not sound scientific; avoid this kind of wording and describe your findings precisely.

      Authors:

      We rephrased the sentence, lines 387-388:

      Localization of single expressed PIF3-mCherry remained unchanged at t=0 and t=15 (Supplemental Figure S5A).

      Line 384-385: Provide the data and the reference to the figure.

      Authors:

      We apologize for the misunderstanding and rephrased the sentence, line 389-391:

      After 488 nm excitation, FIT-GFP accumulated and finally colocalized with the large PIF3-mCherry PB at t=15, while the typical FIT NBs did not appear (Figure 7A)

      Line 386: The structure in which FIT-G is present in Figure 7A at t=15 is unlike the ones already observed throughout the paper. This could be explained by overexpression in N. benthamiana. Please explain.

      Authors:

      Thank you for the hint. We discuss this in the discussion part, see lines 555-568.

      Line 393: Explain and provide data why the morphology of PIF4/FIT NB do not correspond to the normal morphology.

      Authors:

      Thank you for the valuable hints. Several reasons may account for this and we provide explanations in the discussion, see lines 555-568.

      Line 396-398: It also seems from the data that co-expression of PIF4 or PIF3 will affect the partitioning of FIT between the NP and the NB.

      Authors:

      We can assume that residual nucleoplasm is depleted from protein during NB formation. This is likely true for all assessed colocalization experiments. We discuss this in lines 492-494.

      The discussion is particularly lengthy; it would be great to reduce its size and focus on the main findings.

      Authors:

      We shortened the discussion.

      **Referees cross-commenting**

      All good for me, I think that the comments/suggestions from Reviewer #2 are valid and fair. If they are addressed they will improve considerably the manuscript.

      Reviewer #1 (Significance (Required)):

      This manuscript describes an unprecedented, very precise cellular and molecular mechanism in nutrition using a large set of microscopy techniques. Formation of nuclear bodies and their role are still largely unexplored in this context. Therefore, this study sheds light on the functional role of this membrane-less compartment and will be appreciated by a large audience. However, the fine characterization is only made using transient expression in N. benthamiana, and only a few proofs of concept are provided in A. thaliana stable lines.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Trofimov et al. shows that FIT undergoes light-induced, reversible condensation and localizes to nuclear bodies (NBs), likely via liquid-liquid phase separation, and that light conditions play an important role in FIT activity. Overall, the manuscript is well written, and the authors have done a great job performing many detailed and in-depth experiments to support their findings and conclusions.

      However, I have a number of questions/comments regarding the data presented and there are still some issues that authors should take into account.

      Major points/comments:

      1) Authors only focused on blue light conditions. Is there any specific reason for selecting only blue light and not others (red light or far red)?

      Authors:

      There are two main reasons: First, in a preliminary study (not shown) blue light resulted in the formation of the highest numbers of NBs. Second, iron reductase activity assays and gene expression analysis under different light conditions showed a promoting effect under blue light, but not red light or dark red light (Figure 9). This indicated to us, that blue light might activate FIT, and that active FIT may be related to FIT NBs.

      2) Fig. 3C and D: as GFP and GFP-GFP constructs are used as a reference, why not take measurements for them at two different time points, for example t=0 and t=5 or t=15?

      Authors:

      Free GFP and GFP-GFP dimers are standard controls for homo-FRET that serve to delimit the range for the measurements.

      3) Line 27-271: According to Figure 3D, the fluorescence anisotropy measured for NBs appears to be lower. Please explain.

      Authors:

      FA in NBs with FITmSS271AA is variable and the value is lower than that of whole nucleus but not significantly different compared with that in nucleoplasm. We describe the results of Figure 3D in lines 272-275.

      4) Figure 4: For the negative controls, data are shown only at t=0; data should also be shown at t=5 to prove that there is no decrease in fluorescence in these negative controls when they are expressed alone without bHLH039, as there is no acceptor in this case.

      Authors:

      For neither the FIT/bHLH039 nor the FITmSS271AA/bHLH039 pair is there a significant decrease in the fluorescence lifetime values between t=0 and t=5/15. FIT-G is a control to delimit the range. The interesting experiment is to compare the protein pairs of interest between the different nuclear locations at t=5/15.

      5) Line 300-301: In Figure 4D and 4E, the fluorescence lifetime of G at t=0 seems very similar for FIT-G and FITmSS, but the value at t=0 for FIT-G+bHLH039 is greater than 2.5, whereas for FITmSS271AA-G+bHLH039 it is lower, which suggests that more heterodimeric complexes are formed with FITmSS271AA-G+bHLH039. A similar pattern is observed for NBs and NPs, according to Figure 4D and E.

      Therefore, heterodimeric complexes accumulated more in case of FITmSS271AA-G+bhlh039 as compared to FIT-G+bhlh039 (if we compare measurement values of Fluorescence lifetime of G of FITmSS271AA-G+bhlh039 with FIT-G+bhlh039).

      Please comment and elaborate about this further.

      Authors:

      These conclusions are not valid as the experiments cannot be conducted in parallel. Since the experiments had to be performed on different days due to the duration of measurements including new calibrations of the system, we cannot compare the absolute fluorescence lifetimes between the two sets.

      6) Figure 4: For the negative controls, data are shown only at t=0; data should also be shown at t=5 to prove that there is no decrease in fluorescence in these negative controls when they are expressed alone without bHLH039, as there is no acceptor in this case.

      Authors:

      Please see our response to your comment 4).

      7) Line 439-400: Iron uptake genes (FRO2 and IRT1) are more strongly induced in WT under blue light, and FRO2 is less induced under red light. So what happens to the Fe content of WT grown under blue or red light compared to WT grown under white light? Perls/Perls-DAB staining of WT roots under different light conditions would add more information.

      Authors:

      We focused on the relatively short-term effects of blue light on signaling of nuclear events that could be related to FIT activity directly, particularly gene expression and iron reductase activity as a consequence of FRO2 expression. These are both rapid changes that occur in the roots and can be measured. We suspect that iron re-localization and Fe uptake also occur; however, in our experience, differences in metal contents will not be directly significant when applying standard methods like ICP-MS or Perls staining.

      Minor comments:

      Line 75-76: Rephrase the sentence

      Authors:

      We rephrased the sentence, lines 73-74:

      “As sessile organisms, plants adjust to an ever-changing environment and acclimate rapidly. They also control the amount of micronutrients they take up.”

      Line 119: Rephrase the sentence

      Authors:

      We rephrased the sentence, line 118-119:

      “Various NBs are found. Plants and animals share several of them, e.g. the nucleolus, Cajal bodies, and speckles.”

      Line 235-236: rephrase the sentence

      Authors:

      We rephrased the sentence, line 232-234:

      “In the work of Gratz et al. (2019), the phospho-mimicking FITmS272E protein did not show significant changes in its behavior compared to wild-type FIT.”

      Line 444: Correct the sentence “Fe deficiency versus sufficiency”

      Authors:

      We corrected that, line 449-451:

      “In both, the far-red light and darkness situations, FIT was induced under iron deficiency versus sufficiency, while on the other side, BHLH039, FRO2 and IRT1 were not induced at all in these light conditions (Figure 9I-P).”

      **Referees cross-commenting**

      I agree with R1's suggestions/comments, and I think the manuscript quality will be much better if the authors carry out the experiments suggested by R1. I believe this will also strengthen their conclusions.

      Reviewer #2 (Significance (Required)):

      Overall, the manuscript is well written, and the authors have done a nice job performing several key experiments to support their findings and conclusions. However, the results and manuscript can be improved further by addressing some of the questions raised here. This study, which unravels the crosstalk of light signaling in nutrient signaling pathways, is interesting for basic scientists.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      In this manuscript, Hoskins et al describe analyses of the effects of sequence variation on RNA levels, protein levels, and ribosome loading for the COMT gene. They use multiple experimental approaches to assay these levels and report on how sequence differences affect expression. Overall, the paper is interesting in that it presents a very deep dive into the effects of sequence variation on gene expression, including in coding sequences. However, there are some issues with the polysome loading assay technique and there are substantial issues with the figure presentation, which is often confusing.

      Response: Thanks for the positive assessment of our manuscript and the constructive feedback regarding the issues with the figure presentation. We have addressed all of these below and they have significantly improved the clarity.

      Major comments:

      1) Figures:

      --Fig 1C needs a cartoon description to show where the UTRs are. Y-axis should say "Ribo-seq CPM"

      Response: Fig 1C now includes a schematic and the y-axis is updated. Locations of the uORFs are also now included in Fig 1A.

      --Sup Fig 1A confusing, what is "start" what is the point of this panel?

      Response: We apologize for the confusing labeling of the panels in Sup Fig 1. “Start” refers to the MB-COMT start codon. We removed this annotation as it is irrelevant to the figure. We included Supplementary Figure 1A to show RNA probing data for the entire transcript. Figure 1A and B only show the regions that encompass the variants assayed in our study.

      --Sup Fig 1B what is PCBP del?

      Response: “PCBP del” refers to deletion of PCBP1/PCBP2 RNA binding protein motifs. The legend now specifies this.

      --Sup Fig 1C what is "uORF B restore"? The description in the figure legend is not interpretable. Draw diagrams of the mutations that tell the reader what was assayed and why it was assayed. Why are there multiplication factors listed (e.g. 1.33X)? The data are depicted on a log scale, which makes it difficult to appreciate the fold-effects of the mutations (e.g. does uORFA mutation increase expression 1.5-fold?). Please calculate median expression values and report them on a bar graph or something like that so readers can interpret the results.

      Response: “uORF B restore” refers to restoration of the endogenous uORF B frame with a silent variant in the Flag tag of the transgene. The multiplication factors listed were the fold change in median fluorescence between each mutant and the template (wild-type) transgene. We retained the figures as they show the raw distribution of fluorescence in each cell line, but in response to the reviewer’s suggestion we included a new figure displaying the effects as a bar graph (Supplementary Figure 1E).

      --Fig 2A. It's hard to understand the cartoon diagram of the expression reporter construct. Why is +Dox shown here? Does that induce transcription?

      Response: The reviewer is correct. “+Dox” indicated addition of Doxycycline to induce transcription before the data collection step. We agree that there may have been too much detail in this diagram and have now removed this for simplicity and indicated this in the Methods section.

      --Fig 2B. What's on the x-axis? is it Log2(RNA/gDNA) from sequencing? is it Log2 or Log10 or Ln?

      Response: Variant effects in each figure were derived from ALDEx2 analysis, which reports effect size as the median standardized difference between groups. The effect size is not directly interpretable as a log fold change; it takes into account the difference between groups as well as the dispersion. This analysis strategy has been previously demonstrated for analysis of SELEX experiments (Fernandes et al. 2014), which are used to select small populations of cells with specific phenotypes.

      ALDEx2 is a robust and principled choice for the analysis of count-compositional datasets, particularly after selection (e.g. sorted cell populations or low-input RNA fractions arising from polysome profiling). While we understand that this choice leads to less easily interpretable effect sizes, the mathematical advantages make ALDEx2 a more appropriate choice for this type of data. In the past, we had used other methods to analyze log frequencies (limma, a frequency based normalization-dependent analysis, as previously employed in Hoskins et al. 2023. Genome Biology) that directly reported fold changes. In our experience, the ALDEx2-derived effect sizes are well-correlated with those estimates (Pearson correlation 0.93 for variants significant at a FDR

      --Fig 2C. What's on the y-axis (same question). I think it's LogX(mutant/wt)RNA level?

      Response: For consistency with other figures, we replaced Figure 2C to report the effect size statistic as described above.

      --Fig 2D. What's on the y-axis now? Fold-difference (not log transformed)?

      Response: Please see our response above.

      --Fig 2E. The scale bar is flipped vs. normal convention. This is also log transformed, but it's not labeled. Please label as log(whatever) and put the negative values on the left side of the bar (red on the left, blue on the right).

      Response: Thanks for the suggestion, we have now updated the scale bar.

      --Fig 2F y-axis should say Ribo-seq CPM.

      Response: Done

      --Fig 3A - please separate the graphs more. Did you sort cells from ROI2 into populations, or just cells from ROI1?

      Response: Thanks for the suggestion, we now separate the graphs further. Cells were sorted for both ROI 1 and ROI 2 libraries.

      --Fig3C-F What does "effect size" mean on these graphs?

      Response: Please see the response above regarding the effect size estimate from ALDEx2.

      --Fig3D It looks like the colors have switched for positive / negative "effects" on the heat map compared to Figure 2E. Please define what "median effect" means and be consistent with comparison to figure 2E.

      Response: We intentionally inverted colors for Figure 3. The rationale is that a variant causing low protein abundance corresponds to enrichment in P3 compared to gDNA, as opposed to depletion in P3. On the other hand, for effects on RNA abundance and ribosome load, a variant leading to low abundance for these measures is depleted.

      --Figure 4 what does effect size mean, what's the log-transformed scale (log2, 10, etc) same issues from earlier figures.

      Response: Please see response above.

      --Figure 5 "effect size"

      Response: The same definition of effect size was used with the exception that effect sizes are multiplied by -1 so that color schemes are consistent for deleterious effects.

      2) "Codon stability" should always be "Codon Stability Coefficient", maybe use "CSC". Otherwise it's confusing.

      Response: Thanks for the suggestion. This has been updated throughout the manuscript.

      3) Flow cytometry section talks about "RNA fluorescence", which is confusing. You need to explain that it's IRES-driven mCherry as a proxy for the level of RNA first. It would also help to state explicitly that you sorted the cells into four populations, and define them all first before describing the results.

      Response: We apologize for the use of imprecise language with respect to this reporter. We revised the text to emphasize that mCherry is a proxy for RNA abundance and described the populations first as suggested.

      4) What are DeMask scores? How are they related to conservation or amino acid properties? If you define these, you can help the reader interpret the result.

      Response: Thanks for the suggestion. We now include a conceptual interpretation of the DeMask score in the relevant section. We also include a comparison to a recent large language model for variant effect prediction (ESM1b, Brandes et al. 2023) which is now reported in Supplementary Figure 5C.

      5) There are several issues with the Polysome gradient fractionation. The gradients did not separate 40S, 60S, and monosomal fractions, so it's hard to tell how many ribosomes correspond to each peak on the gradient graph in Figure S5. This is probably because the authors used a 20-50% gradient instead of a lower percentage on top. More significantly, variations in the coding region of COMT are likely affecting the polysome association in ways the authors didn't consider. Nonsense codons will simply make the orf a lot shorter, hence fewer ribosomes. This may have nothing to do with NMD. Silent and missense variants may have unpredictable effects because they may make translation faster (fewer ribosomes) or slower (more ribosomes) on the reporter. This could lead to more ribosomes with less protein or fewer ribosomes with more protein. The reporter RNA also has an IRES loading mCherry on it, which probably helps blunt or dampen the effects of the COMT sequence variants on polysome location distribution. Overall, the design of the polysome assay is probably very limited in power to detect changes in ribosome loading (four fractions, limited separation by 20-50 gradient, IRES loading, etc). This is partially addressed in the limitations section, but these issues could be discussed in more detail.

      Response: Given high polysomal association of endogenous COMT and our COMT transgene (Supplementary Figure 2B, Supplementary Figure 5B-C), we chose a 20-50% sucrose gradient to better resolve changes in ribosome load among heavy polysomes.

      We thank the reviewer for offering another valid explanation regarding the depletion of nonsense variants. We have now included a sentence in the discussion to indicate that lower ribosome load for nonsense variants may be due to a shorter ORF as opposed to NMD. We further include the potential limitation of the assay due to the presence of the IRES-mCherry.

      We agree that variants may have unpredictable effects due to effects on the dynamics of translation elongation. To address this potential limitation, we attempted to devise a selective ribosome profiling strategy by immunoprecipitating N-terminal Flag tagged peptides to enrich ribosomes translating COMT. However, we were unable to achieve significant enrichment, limiting our ability to measure variant effects on elongation in a high-throughput manner.

      Significance

      The study is novel in that it assays both 5' UTR and a wide range of protein coding sequence variants for effects on RNA and protein levels from a clinically important gene, COMT. The manuscript reports that most protein coding variants have modest effects on RNA levels, and that the minority of variants that do affect RNA levels are not predictable due to their effect on codon usage. The work also determines the distribution of effects of variants on protein levels, finding a variety of effects on expression. Interestingly, the authors found SNPs that affect ribosome loading generally affect RNA structure of the COMT coding region, rather than affecting codon usage.

      This should appeal to many different communities of biologists - gene expression experts, geneticists, and clinical neurobiologists who focus on COMT. So there is a potential for fairly broad interest. The main limitations to the work are in a lack of clarity in the figures and perhaps in the underdeveloped nature of the discussion section. The discussion section reports new results (SNP associations that affect expression). These would make more sense in the results section, such that the discussion could do a better job relating the impact of sequence variants on expression levels to prior work to highlight the novelty.

      Response: We thank reviewer #1 for their positive assessment of the broad significance of our study. We also thank them for constructive suggestions that led to increased clarity in the presentation. We have moved the analysis of gnomAD variants to the Results section and expanded the discussion.

      Reviewer #2

      Evidence, reproducibility and clarity

      Summary:

      Hoskins and colleagues expressed a reporter containing all silent, missense, and nonsense codons at 58 amino acid positions in the human COMT gene in HEK293T cells and measured levels of DNA, bulk RNA, and pooled polysomal mRNA. They included a C-terminal translational GFP fusion and a downstream transcriptional mCherry fusion in the reporter in order to also bin variants by their relative protein and mRNA levels by flow cytometry. They hypothesized that RNA structure, in-part by mediating uORF translation, influences COMT gene expression. The authors conclude by identifying previously-uncharacterized COMT variants that, in this reporter system, affect RNA abundance and ribosome load. We generally found the results of this paper convincing and clear. We do not have major comments, but have many minor comments that we hope the authors can address. These comments mostly deal with clarification on analysis metrics and giving recommendations on data presentation.

      Response: Thanks for highlighting the strengths of our study and the constructive suggestions to improve the presentation.

      Minor comments:

      In Figure 2C, the vertical axis reads "Median between-group difference". How was this metric calculated and normalized? We also agree that nonsense mutations having consistently-detrimental effects on RNA abundance is reassuring, but recommend more explanation as to why the difference in the effects of silent and missense mutations between regions may be biologically relevant.

      Response: Variant effects in each figure derive from ALDEx2 analysis, which reports effect size as the median standardized difference between groups. In particular, to avoid any distributional assumptions for standardization, ALDEx2 uses a permutation-based non-parametric estimate of dispersion. The effect size is not directly interpretable as a log fold change; it takes into account the difference between groups as well as the max dispersion of the groups. We have now provided explicit references to the specific R functions that were used to calculate the effect size.

      ALDEx2 is robust for analysis of count-compositional datasets, particularly after selection and bottlenecking (e.g. sorted cell populations or low-input RNA fractions arising from polysome profiling). While we have used other methods to analyze log frequencies (limma, a frequency based normalization-dependent analysis, as previously employed in Hoskins et al. 2023. Genome Biology), we opted for the less-interpretable but more robust ALDEx2 analysis to report variant effects between varying nucleic acid inputs.

      We currently lack a mechanistic interpretation for the difference in RNA abundance effects between ROI 1 and 2. However, we observed consistent results using a different analysis framework, which makes use of variant frequencies (as in Hoskins et al. 2023 Genome Biology) instead of the centered log ratios used in ALDEx2 analysis, further supporting a biological difference between the two.
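
      The effect size described in this response can be illustrated with a minimal Python sketch. It is not ALDEx2's implementation: ALDEx2 operates on Monte Carlo instances of the data, whereas this simplified version takes a single set of centered log-ratio (CLR) values per replicate. The function names, the pseudocount, and the pairwise-difference estimate of dispersion are assumptions made for illustration.

```python
import math
from statistics import median


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's variant counts.

    The pseudocount (an assumed value) avoids taking the log of zero.
    """
    logs = [math.log2(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def pairwise_abs_diffs(values):
    """All absolute differences between pairs of replicates in one group."""
    return [abs(x - y) for i, x in enumerate(values) for y in values[i + 1:]]


def effect_size(group_a, group_b):
    """Median between-group difference scaled by the larger within-group dispersion.

    group_a and group_b hold the CLR values of one variant across replicates
    (at least two replicates per group).
    """
    between = [a - b for a in group_a for b in group_b]
    dispersion = max(median(pairwise_abs_diffs(group_a)),
                     median(pairwise_abs_diffs(group_b)))
    if dispersion == 0:
        raise ValueError("within-group dispersion is zero; effect size undefined")
    return median(between) / dispersion
```

      Because the difference is divided by a dispersion term, the result is unitless and cannot be read as a log fold change, which is the trade-off the response describes.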

      In Figure 3, we believe that the authors are claiming that lower RNA abundance causes lower protein abundance in some variants. However, this data only reports on protein abundance relative to transcript abundance, not absolute protein abundance. We think the claim should be revised to (1) clarify that the authors are measuring protein per mRNA, and (2) express that lower mRNA amounts are more likely to co-occur with lower protein amounts, but that this data does not support any causative model.

      Response: Thanks for the suggestion. We have now included an explicit description of the experimental design in the results section and noted that we are unable to assign protein abundance effects to underlying RNA abundance effects. In the current setup, we did not sort cells based on the ratio of moxGFP/mCherry fluorescence (protein per mRNA), but rather we defined gates based on the 2D plot of moxGFP versus mCherry. This is explicitly marked in Figure 3A.

      On page 9, the authors claim that their data supports a model that rs4633 increases RNA abundance, leading to higher COMT expression. Can the authors rule out a model whereby rs4633 facilitates translation initiation, as suggested by Tsao et al. 2011, leading to both an increase in mRNA and protein abundance?

      Response: Thanks for this question and opportunity to clarify. We have now added a sentence to the Discussion and included the following paragraph in the Supplementary Note:

      “Importantly, our study does not rule out a model where rs4633 facilitates translation initiation. Nevertheless, our data suggest a potential concurrent mechanism where rs4633 leads to higher protein abundance in human cell lines and in an in vitro translation assay (Tsao et al. 2011) by increasing RNA abundance. We note that Tsao et al did not directly measure RNA abundance in their study. In Supplementary Figure 3A of Nackley et al 2006, the APS haplotype containing rs4633 C>T showed slightly higher total RNA abundance compared to the LPS haplotype (in our study, the wild-type template). However, this was not statistically significant and was only observed for the S-COMT isoform. It is possible that our observations are compatible with the conclusions in Tsao et al. 2011. For example, increased translation of rs4633 C>T may lead to stabilization of the RNA.”

      The paper references "effect size" at multiple points (e.g. "polysome effect size") but we could not find this term explicitly defined (for example: for the polysome effect size, were RNA counts for each polysome fraction divided by the relative abundance of that RNA in total RNA?)

      Response: We apologize for this confusion. Please see our response above. We have also stated the definition of effect size explicitly in the revised manuscript.

      Could you elaborate on how you define "protein abundance" and "effect size" in Figure 5G? How is enrichment in P3 or P1 calculated?

      Response: Effect size is defined as described above. Enrichment in P3 or P1 is calculated with respect to the abundance in gDNA (unsorted cells).

      Were 3396 variants considered for all readouts in this paper? How many of these variants were present in each ROI? It may be worth clarifying sample sizes.

      Response: Thanks for the suggestion. The reviewer is correct: 3396 variants were present in all biological replicates and all readouts (after excluding polysome metafractions 1 and 2 and flow cytometry population 4). The Methods were updated to include all readouts that were dropped. The number of variants in each ROI is now included in this section of the main text.

      How did Twist generate these mutagenized sequences? We assumed that they used error-prone PCR due to the mention of multiple nucleotide polymorphisms, but couldn't find an explicit answer.

      Response: Twist generates these mutagenized inserts using degenerate primers. This allows all alternate codons to be assayed (all silent, missense changes). This is now noted in the Methods.

      https://www.twistbioscience.com/resources/technical-note/solid-phase-dna-synthesis-allows-tight-control-combinatorial-library

      In the methods, it may be worth elaborating on the composition of the HsCD00617865 plasmid. For example: this COMT reporter is under the control of a constitutively-expressed T7 promoter, correct?

      Response: The HsCD00617865 plasmid was only used as a template for PCR amplification and generation of the transgene. The transgene is cloned into a vector containing attB sites for recombination into the landing pad cell line (Matreyek et al 2020). Transcription is induced by Doxycycline from the landing pad locus. Plasmid maps used for transfection into the landing pad line are now included in the GitHub repository.

      In Supplementary Figures 4 and 5, it would be helpful to explicitly say that you are reporting Pearson correlations between biological replicates.

      Response: Thanks for the suggestion. The legends have been updated accordingly.

      "After summarizing biological replicates (N=4) for each readout...": how did the authors summarize biological replicates? Were counts averaged?

      Response: Biological replicates were summarized using the median. This is now clarified in the Methods.
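
      As a small illustration of median summarization across replicates, the Python sketch below collapses per-replicate values to one value per variant. The variant names and numbers are hypothetical and are not taken from the study; the function name is likewise an assumption.

```python
from statistics import median


def summarize_replicates(values_by_variant):
    """Collapse per-replicate values to one value per variant using the median."""
    return {variant: median(values) for variant, values in values_by_variant.items()}


# Hypothetical values for two variants across four biological replicates.
example = {
    "variant_1": [0.9, 1.0, 1.1, 1.2],
    "variant_2": [0.9, 1.0, 1.1, 5.0],  # one outlying replicate
}
summary = summarize_replicates(example)
```

      With an even number of replicates the median is the average of the two middle values, so a single outlying replicate (as in `variant_2`) shifts the summary far less than it would shift the mean.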

      The authors used pairwise correlations between flow cytometry fractions, polysome fractions, and total RNA/gDNA as indications of data quality. Do the authors expect for these counts to be strongly correlated? We would not necessarily expect to see a strong correlation between ribosome load and RNA/gDNA.

      Response: We used replicate correlation as an indicator of data quality. Our readouts of ribosome load reflect the abundance of a variant in a particular polysome fraction. Given that variants that are highly abundant in the RNA pool will on average be more highly represented in polysome fractions, we would expect a correlation between the abundance of a variant in total RNA and in polysome fractions.
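
      The Pearson correlation used as a quality indicator can be sketched in a few lines of Python. This is a generic textbook implementation, not the authors' code; the function name and the input layout (two equal-length lists of per-variant values) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of per-variant values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)
```

      Applied to two replicates of the same readout, a value close to 1 indicates that variants are ranked and scaled consistently between the replicates.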

      The authors may need to check that their standard deviations on fold changes are properly reported.

      Response: In the Figures and the main text, we specified the confidence intervals as calculated by the ALDEx2 method instead of reporting standard deviations on fold changes. Specifically, the confidence intervals were determined by Monte Carlo methods that produce a posterior probability distribution of the observed data given repeated sampling. Variants in which the confidence intervals do not cross 0 are considered true discoveries (section 5.4.1 of the ALDEx2 vignette on Bioconductor).

      https://www.bioconductor.org/packages/devel/bioc/vignettes/ALDEx2/inst/doc/ALDEx2_vignette.html#541_The_effect_confidence_interval
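
      The decision rule in this response (an interval that does not cross 0 counts as a discovery) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input is a list of Monte Carlo effect-size samples for one variant, and the 2.5% and 97.5% bounds and the simple index-based percentile are choices made here, not details confirmed by the source.

```python
def percentile_interval(samples, lower=0.025, upper=0.975):
    """Approximate percentile interval from Monte Carlo samples (assumed bounds)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    last = len(ordered) - 1
    # Index-based percentiles; int() truncates toward the lower neighbor.
    return ordered[int(lower * last)], ordered[int(upper * last)]


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0
```

      A variant would be called only when `excludes_zero(percentile_interval(samples))` is true, that is, when the sampled effect is consistently positive or consistently negative.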

      We would expect standard deviation bounds to be symmetric for log fold changes, but not on unlogged fold changes - for example see page 8, for the sentence "our point estimate for nonsense variant effects on COMT RNA abundance was approximately a two-fold decrease relative to the gDNA frequency (fold change of 0.43 +/- 0.13; mean +/- standard deviation; Methods)."

      Response: Thanks for the suggestion. To avoid any confusion about the symmetry, we replaced the +/- notation, and explicitly noted the mean and standard deviation. To help the reader gain an intuition of the magnitude of variant effects, we conducted a frequency-based normalization-dependent analysis using limma (as previously employed in Hoskins et al. 2023. Genome Biology). We now report a fold change (unlogged) for RNA abundance compared to gDNA abundance. The point estimate is the mean and s.d. across all nonsense variants.
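
      The reviewer's point about symmetry can be made concrete with a short Python sketch: bounds that are symmetric around the mean on the log2 scale become asymmetric once converted back to unlogged fold changes. The function name and the input values in the usage line are illustrative assumptions, not data or code from the study.

```python
from statistics import mean, stdev


def fold_change_bounds(log2_fold_changes):
    """Back-transform mean and mean +/- one s.d. from the log2 scale.

    Returns (center, lower, upper) as unlogged fold changes.
    """
    m = mean(log2_fold_changes)
    s = stdev(log2_fold_changes)
    return 2 ** m, 2 ** (m - s), 2 ** (m + s)


# Illustrative log2 fold changes with mean -1 and sample s.d. 1.
center, lower, upper = fold_change_bounds([-2.0, -1.0, 0.0])
```

      Here the center is a fold change of 0.5, with a lower bound of 0.25 and an upper bound of 1.0, so the distance above the center is twice the distance below it. A single "+/-" value therefore cannot describe both sides on the unlogged scale.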

      On page 10, the authors say that their data suggests that hydrophobicity in the early coding region of COMT may be important for COMT folding. If this is the case, would we expect to see this effect in flow cytometry data (which is affected by protein degradation) and not polysome profiling (which is unaffected by post-translational protein degradation)?

      Response: We apologize as we are uncertain about the reviewer’s intended question. The section that refers to the importance of hydrophobicity indeed refers to the flow cytometry data. While there are specific instances in which the amino acid properties encoded by the mRNA influence translation dynamics, these are not universally true. Consequently, we did not expect these impacts to be observed at the level of polysome profiling.

      We believe that we would have some trouble replicating the analysis from this paper from the raw data, given that the bulk of the analysis on GitHub is presented as a single R Markdown file, with references to local files to which we do not have access. We recommend that the authors add additional documentation to their repository to facilitate re-analysis.

      Response: Thanks for the opportunity to address this issue of critical importance. To facilitate replication, we have now deposited all analysis files to Zenodo and refactored the code to enable replication by simply running a markdown file.

      In Figure 1B, indicating that more signal indicates less structure (in the legend or the figure itself) may assist readers who are unfamiliar with DMS-seq.

      Response: Thanks for the suggestion. This is now updated.

      Figure 1C does a great job presenting evidence for the translation of uORFs, but does not seem to flow with the overall argument of the paper, so may fit better in the supplement.

      Response: We considered this suggestion, and opted for keeping its placement as it gives evidence that our transgene is translated primarily as the MB-COMT isoform. This ensures that, for variants upstream of the S-COMT isoform, we can assay effects on ribosome load that are tied to mechanisms of translation elongation and codon stability.

      We believe there is a typo in the Figure 1 legend that should read "K562" instead of "H562".

      __Response: __Thank you, this was indeed a typo.

      You also gated to separate into P1-P4, correct? Can you also show the bounds of that gating strategy in Figure 3A?

      __Response: __This has been updated. We also added the gating strategy in response to comments from reviewer #1.

      We find Figure 3F very compelling. Do you have any theories as to why mutating I59-H66 to nonpolar, uncharged residues leads to increased COMT expression?

      __Response: __We do not have any theories for why this may be. However, we noted that with the exception of V63, residues I59-H66 are not evolutionarily constrained (based on DeMask entropy values). This suggests mutational tolerance for nonpolar, uncharged residues in this region (with the exception of V63 and H66; see Figure 3D).

      There appears to be a non-negligible proportion of di- and tri- nucleotide polymorphisms in Supplementary Figure 4. Were these excluded in downstream analyses?

      __Response: __These variants are expected from the Twist mutagenesis strategy and were included in the analysis. We believe they are at lower frequency compared to SNPs due to less favorable annealing of the degenerate primers.

      A minor typo in the discussion reads "fluoresce".

      __Response: __Done

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This work investigated the regulatory effects of thousands of coding variants in the COMT gene, focusing on two regions with clinical significance, by using high-throughput reporter assays. The results from this will be useful for clinical scientists interested in understanding the impacts of COMT mutations and be a useful framework for other systems/computational biologists to understand the impacts of coding mutations across different levels of regulatory function. Mutations in protein regions, if having a function, are classically known to interfere with protein function. There are fewer large-scale efforts to understand the impacts of coding mutations affecting expression through potentially changing of RNA structure or codon optimization - this work has contributed towards that frontier.

      Place the work in the context of the existing literature (provide references, where appropriate).

      This is (as far as I am aware) the first paper that has integrated high-throughput screens massively parallel reporter assays from RNA degradation, ribosomal load, and flow cytometry. Previous papers have tended to measure on expression regulation on only one dimension (i.e. Greisemer et al. 2023 on RNA degradation, Sample et al. 2019 on ribosomal load, and de Boer et al. 2020 on protein expression).

      __Response: __Thanks for highlighting the novelty of our approach compared to existing strategies in the literature.

      State what audience might be interested in and influenced by the reported findings.

      Clinicians/researchers interested in COMT, computational biologists, geneticists and potentially structural biologists interested in understanding the consequences of amino acid mutations on RNA/protein expression

      __Response: __Thanks for noting the broad significance of our study.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Genomics, Massively parallel reporter assays, High-throughput regulatory screens.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *This manuscript reports on transcript sequence variants that affect expression of the gene COMT. Targeted analysis of SNPs identifies 5' UTR variants that affect COMT, leading to the identification of translated uORFs. Common coding sequence SNPs do not affect COMT expression, however. Massively parallel analyses of mRNA abundance, protein abundance, and translation are combined to look more broadly at coding sequence variants. These analyses focus on regions of predicted structure in the COMT transcript. Both silent and missense mutations that increase mRNA abundance are identified. Protein abundance is then measured and many missense mutations are found to change protein levels. To address translation directly, analysis of polysome loading is performed and significant differences are identified, although technical challenges limit data quality in these experiments. These different experiments are then analyzed jointly to classify mutation effects and identify a class of silent mutations with expression effects, leading to a proposal that these act through structure. *

      *The joint, integrative analysis of COMT variants through a range of methods allows clearer insights into interconnected post-transcriptional effects. The massively parallel experiments generate high-quality data, although targeted validation of key results would strengthen the work. The findings advance our understanding of silent variant effects, which remains an open question, and technical innovations could find broader applications. *

      __Response: __Thanks for the positive assessment of the quality of the data generated and the potential for the broader application of the technical innovations.

      *I do have concerns with the present version of this work. *

      • There is no validation presented for high-throughput experimental data. I would say that validating the effects of M152T and V63V variants from Figure 2B would substantially strengthen the work and support key conclusions. *

      __Response: __Our experiments collectively enabled nearly 10,000 measurements of variant effect (summed over three layers of gene expression). The goal of our study was to identify broad mechanisms of variant effect. While we are excited about the specific variants uncovered, targeted experimental methods for validating changes to RNA abundance, such as RT-qPCR, are unlikely to be sufficiently sensitive. For example, RNA abundance effects in our study had a median effect size of 1.47 for variants up in RNA, and 0.4 for variants down in RNA. This likely corresponds to less than one Ct difference between the variant and the reference allele. Indeed, previous studies such as Findlay et al., 2018 Nature that reported similar effect sizes (FGF7 and FOS, respectively (Figure 4B).

      Thus, for time and cost concerns, we respectfully suggest that targeted experiments involving V63V and M152T are beyond the scope of our study. Nevertheless, to further strengthen our conclusions, we have computationally confirmed our findings using a different analysis framework. We found 75/76 of the variants significant by ALDEx2 analysis were also significant by limma analysis (a frequency based normalization-dependent analysis, as previously employed in Hoskins et al. 2023. Genome Biology) using the same FDR (0.1).

      • In the fluorescent reporter scheme, it seems that variants reducing mRNA abundance should be enriched in the "P2" gate region relative to "P1", as they would have lower mRNA abundance and correspondingly lower protein abundance. However, this analysis is not performed, and instead P1 and P3 are compared (Figure 3G), which would seem to focus on protein-level effects. *

      __Response: __Our initial hesitation in comparing P2 to P1 is that the P2 population may be enriched for cells that underwent inefficient induction of transcription with Doxycycline. Hence technical factors as opposed to the effect of the variants may dominate this comparison. In response to the reviewer’s comments, we carried out the suggested analysis (new Supplementary Figure 5B). We found that variants that are down in RNA are enriched in P2 relative to P1 as expected. This is now noted in the Results section.

      • In general the work classifies variants in several different ways and it would help to be a little clearer in naming these classes. For instance, in describing the FACS-based analysis of variant expression it is written, "protein fluorescence conditioned on RNA fluorescence" which is confusing at best-it's a fluorescence-based measurement that is used indirectly to measure COMT reporter abundance. *

      __Response: __Thanks for the suggestion. We agree that our initial word-choice was imprecise. We rewrote this section to indicate mCherry fluorescence is an indirect proxy for RNA abundance.

      • Likewise, the populations with shifted GFP/mCherry ratio in this assay are described as "uncorrelated" populations, which is opaque and somewhat inaccurate-there seems to be a correlation in this group but at a different ratio. *

      __Response: __We have revised the language in the manuscript. We opted for “low or high RNA/protein abundance” to indicate the relationship between GFP and mCherry fluorescence in populations P3 and P4.

      • In the same way, "deleterious variants" is used to describe protein abundance changes, but this term implies a fitness effect and is not very specific. *

      __Response: __We apologize for the confusing word choice. We did away with this term in favor of “variants with low protein abundance”.

      • In discussing the effects of missense COMT variants on protein levels, there is an implicit assumption that degradation of mis-folded protein (or perhaps properly-folded protein with excess hydrophobic exposure?) explains these effects. This is plausible, but it would help to lay out this reasoning more clearly. *

      __Response: __Thanks for the suggestion. We have added a sentence at the end of the section that specifies this assumption and cites a recent study reporting that rare missense variants in COMT may be misfolded and degraded by the proteasome (Larsen et al. 2023).

      • It is written that,"In line with codon stability as a predictor of translational efficiency (Presnyak et al., 2015), variants with low codon optimality were depleted from polysomes compared to variants with optimal codons". However, this mis-states the conclusions of the cited study, which notes, "Importantly, under normal conditions the ribosome occupancy of the HIS3 opt and non-opt constructs was determined to be similar (Fig. 6B)". *

      __Response: __We apologize for mis-stating the conclusions of Presnyak et al. 2015. We have now revisited the relevant literature to place our conclusions more accurately in its context. While Presnyak et al. and several other studies (Bazzini et al., 2016; Mauger et al., 2019) have clearly linked codon choice to mRNA stability, we now reference Mauger et al. 2019, who used elegant experiments to demonstrate that mRNA secondary structure is a driver of increased protein production and synergizes with codon optimality (Figure 5B). Their results further support the role of codon optimality in RNA stability while providing evidence of an additive impact on translation efficiency.

      • It is written that, "One intriguing possibility is to develop multiplexed assays of variant effect on RNA folding, using mutational profiling RNA probing methods (Weng et al., 2020; Zubradt et al., 2017)." How would this differ from the "Mutate and Map" approach in doi://10.1038/nchem.1176 and subsequent work from the same group? *

      __Response: __Thanks for pointing out the more recent work following the initial papers in 2010-2011. We had missed the work from the Das lab that extended the Mutate and Map approach to utilize mutational profiling (Cheng and Kladwang et al., 2017). We updated our Discussion to indicate that the proposed assay has been pioneered and is a viable approach for high-throughput determination of variant effects on RNA folding.

      Because mutational profiling methods leverage reverse transcriptase readthrough and mismatch incorporation, they enable deeper and more uniform coverage of sequencing reads, particularly for longer transcripts. A key design principle of the proposed assay is to mutagenize only certain types of variants in the library such that they do not overlap RT mismatch signatures arising from the RNA probing reagent/RT enzyme. For example, readthrough of DMS base adducts largely generates A>N or C>N mismatches, so a variant library would be designed to only contain variants at G or T bases. This ensures variants in the library can be differentiated from signals of the RNA probing method.
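The design principle described above can be sketched in a few lines. This is a minimal illustration only, assuming a DNA reference written in A/C/G/T and single-nucleotide variants; the function names and the toy sequence are hypothetical and are not taken from the authors' pipeline.

```python
# Illustrative sketch; names and the toy sequence are hypothetical.

# Per the response, DMS readthrough mostly yields A>N or C>N mismatches,
# so library variants are restricted to reference G or T positions.
ALLOWED_REFERENCE_BASES = {"G", "T"}


def designable_positions(reference: str) -> list[int]:
    """Return 0-based positions whose reference base is G or T."""
    return [
        i for i, base in enumerate(reference.upper())
        if base in ALLOWED_REFERENCE_BASES
    ]


def design_variants(reference: str) -> list[tuple[int, str, str]]:
    """List every single-nucleotide variant (position, ref, alt) at G or T positions."""
    variants = []
    for pos in designable_positions(reference):
        ref = reference[pos].upper()
        for alt in "ACGT":
            if alt != ref:
                variants.append((pos, ref, alt))
    return variants


toy_reference = "ATGCGTAC"  # made-up sequence, not COMT
library = design_variants(toy_reference)
for pos, ref, alt in library:
    print(f"{ref}{pos + 1}{alt}")
```

Because every library variant has a G or T as its reference base, a mismatch observed at an A or C position can be attributed to the probing reagent rather than to the library.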

      ***Referees cross-commenting** *

      *I generally agree with the other reviewers and found that many small points on the figures were confusing, and in some cases the values being computed and displayed were under-specified. *

      *I agree with Reviewer 1 that the polysome fractionation probably has limited power due to experimental design, and that the interpretation of changed ribosome loading is subtle. *

      __Response: __In response to these helpful comments, we have clarified the points highlighted by the reviewers and expanded the limitations section related to the ribosome loading assay. Thanks for these constructive suggestions to strengthen our study.

      *Reviewer #3 (Significance (Required)): *

      *The joint, integrative analysis of COMT variants through a range of methods allows clearer insights into interconnected post-transcriptional effects. The massively parallel experiments generate high-quality data, although targeted validation of key results would strengthen the work. The findings advance our understanding of silent variant effects, which remains an open question, and technical innovations could find broader applications. *

      __Response: __Thanks for pointing out the high-quality of the generated data and the broad significance of our study. The goal of our study was to identify broad mechanisms of variant effect instead of focusing on differential expression for any specific variants.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Hoskins and colleagues expressed a reporter containing all silent, missense, and nonsense codons at 58 amino acid positions in the human COMT gene in HEK293T cells and measured levels of DNA, bulk RNA, and pooled polysomal mRNA. They included a C-terminal translational GFP fusion and a downstream transcriptional mCherry fusion in the reporter in order to also bin variants by their relative protein and mRNA levels by flow cytometry. They hypothesized that RNA structure, in part by mediating uORF translation, influences COMT gene expression. The authors conclude by identifying previously uncharacterized COMT variants that, in this reporter system, affect RNA abundance and ribosome load.

      We generally found the results of this paper convincing and clear. We do not have major comments, but have many minor comments that we hope the authors can address. These comments mostly deal with clarification on analysis metrics and giving recommendations on data presentation.

      Minor comments:

      In Figure 2C, the vertical axis reads "Median between-group difference". How was this metric calculated and normalized? We also agree that nonsense mutations having consistently detrimental effects on RNA abundance is reassuring, but recommend more explanation as to why the difference in the effects of silent and missense mutations between regions may be biologically relevant.

      In Figure 3, we believe that the authors are claiming that lower RNA abundance causes lower protein abundance in some variants. However, this data only reports on protein abundance relative to transcript abundance, not absolute protein abundance. We think the claim should be revised to (1) clarify that the authors are measuring protein per mRNA, and (2) express that lower mRNA amounts are more likely to co-occur with lower protein amounts, but that this data does not support any causative model.

      On page 9, the authors claim that their data supports a model that rs4633 increases RNA abundance, leading to higher COMT expression. Can the authors rule out a model whereby rs4633 facilitates translation initiation, as suggested by Tsao et al. 2011, leading to both an increase in mRNA and protein abundance?

      The paper references "effect size" at multiple points (e.g. "polysome effect size") but we could not find this term explicitly defined (for example: for the polysome effect size, were RNA counts for each polysome fraction divided by the relative abundance of that RNA in total RNA?)

      Could you elaborate on how you define "protein abundance" and "effect size" in Figure 5G? How is enrichment in P3 or P1 calculated?

      Were 3396 variants considered for all readouts in this paper? How many of these variants were present in each ROI? It may be worth clarifying sample sizes.

      How did Twist generate these mutagenized sequences? We assumed that they used error-prone PCR due to the mention of multiple nucleotide polymorphisms, but couldn't find an explicit answer.

      In the methods, it may be worth elaborating on the composition of the HsCD00617865 plasmid. For example: this COMT reporter is under the control of a constitutively-expressed T7 promoter, correct?

      In Supplementary Figures 4 and 5, it would be helpful to explicitly say that you are reporting Pearson correlations between biological replicates.

      "After summarizing biological replicates (N=4) for each readout...": how did the authors summarize biological replicates? Were counts averaged?

      The authors used pairwise correlations between flow cytometry fractions, polysome fractions, and total RNA/gDNA as indications of data quality. Do the authors expect for these counts to be strongly correlated? We would not necessarily expect to see a strong correlation between ribosome load and RNA/gDNA.

      The authors may need to check that their standard deviations on fold changes are properly reported. We would expect standard deviation bounds to be symmetric for log fold changes, but not on unlogged fold changes - for example see page 8, for the sentence "our point estimate for nonsense variant effects on COMT RNA abundance was approximately a two-fold decrease relative to the gDNA frequency (fold change of 0.43 +/- 0.13; mean +/- standard deviation; Methods)."
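The point about symmetry can be illustrated with a short sketch. The replicate values below are made up for illustration and are not the study's data; the sketch assumes the summary is computed on the log2 scale and then back-transformed.

```python
import math
import statistics

# Made-up per-replicate fold changes, for illustration only.
fold_changes = [0.30, 0.40, 0.45, 0.60]

# Summarize on the log2 scale, where +/- one standard deviation is symmetric.
log2_fc = [math.log2(fc) for fc in fold_changes]
mean_log = statistics.mean(log2_fc)
sd_log = statistics.stdev(log2_fc)
lower_log, upper_log = mean_log - sd_log, mean_log + sd_log

# Back-transform to the fold-change scale, where the bounds become asymmetric.
center = 2 ** mean_log
lower, upper = 2 ** lower_log, 2 ** upper_log

print(f"log2 scale: {mean_log:.2f} +/- {sd_log:.2f}")
print(f"fold-change scale: {center:.2f} (-{center - lower:.2f} / +{upper - center:.2f})")
```

On the unlogged scale the upper distance always exceeds the lower distance, so a single "+/-" value can only describe one of the two scales correctly.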

      On page 10, the authors say that their data suggests that hydrophobicity in the early coding region of COMT may be important for COMT folding. If this is the case, would we expect to see this effect in flow cytometry data (which is affected by protein degradation) and not polysome profiling (which is unaffected by post-translational protein degradation)?

      We believe that we would have some trouble replicating the analysis from this paper from the raw data, given that the bulk of the analysis on GitHub is presented as a single R Markdown file, with references to local files to which we do not have access. We recommend that the authors add additional documentation to their repository to facilitate re-analysis.

      In Figure 1B, indicating that more signal indicates less structure (in the legend or the figure itself) may assist readers who are unfamiliar with DMS-seq.

      Figure 1C does a great job presenting evidence for the translation of uORFs, but does not seem to flow with the overall argument of the paper, so may fit better in the supplement.

      We believe there is a typo in the Figure 1 legend that should read "K562" instead of "H562".

      You also gated to separate into P1-P4, correct? Can you also show the bounds of that gating strategy in Figure 3A?

      We find Figure 3F very compelling. Do you have any theories as to why mutating I59-H66 to nonpolar, uncharged residues leads to increased COMT expression?

      There appears to be a non-negligible proportion of di- and tri- nucleotide polymorphisms in Supplementary Figure 4. Were these excluded in downstream analyses?

      A minor typo in the discussion reads "fluoresce".

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This work investigated the regulatory effects of thousands of coding variants in the COMT gene, focusing on two regions with clinical significance, by using high-throughput reporter assays. The results from this will be useful for clinical scientists interested in understanding the impacts of COMT mutations and be a useful framework for other systems/computational biologists to understand the impacts of coding mutations across different levels of regulatory function. Mutations in protein regions, if having a function, are classically known to interfere with protein function. There are fewer large-scale efforts to understand the impacts of coding mutations affecting expression through potentially changing of RNA structure or codon optimization - this work has contributed towards that frontier.

      Place the work in the context of the existing literature (provide references, where appropriate).

      This is (as far as I am aware) the first paper that has integrated high-throughput screens massively parallel reporter assays from RNA degradation, ribosomal load, and flow cytometry. Previous papers have tended to measure on expression regulation on only one dimension (i.e. Greisemer et al. 2023 on RNA degradation, Sample et al. 2019 on ribosomal load, and de Boer et al. 2020 on protein expression).

      State what audience might be interested in and influenced by the reported findings.

      Clinicians/researchers interested in COMT, computational biologists, geneticists and potentially structural biologists interested in understanding the consequences of amino acid mutations on RNA/protein expression

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Genomics, Massively parallel reporter assays, High-throughput regulatory screens.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Through Review Commons, we received some highly favorable and constructive feedback from reviewers who are clearly knowledgeable about phylogenomics and/or the field of bacterial anti-phage immunity. We have responded to all suggestions made by the reviewers, which we feel have substantially improved and clarified the manuscript. We thank all three reviewers for their thoughtfulness and time.

      Reviewer #1

      Evidence, reproducibility and clarity

      Culbertson and Levin present an elegant computational analysis of the evolutionary history of several families of immune proteins conserved in bacteria and metazoan cells. The authors' work is impressive, revealing interesting insight into previously known connections and identifying exciting new connections that further link bacterial anti-phage defense and animal innate immunity. The results are overall well-presented and will have an important impact on multiple related fields. I have a few comments for the authors to help explain some of the new connections observed in their findings and clarify the results for a general audience.

      We thank the reviewer for their kind appraisal of our manuscript as well as their helpful comments. We found their comments to be very useful in strengthening our work and increasing the clarity of the writing.

      Comments: 1) The authors adeptly navigate difficult and changing nomenclature around cGAS-STING signaling but there may be room for clarifying terminology. Although historically the term "CD-NTase" has been used to describe both bacterial and animal enzymes (including by this reviewer's older work as well), the field has now settled on consistent use of the name "CD-NTase" to describe bacterial cGAS/DncV-like enzymes and the use of the names "cGAS" and "cGLR" to describe animal cGAS-like receptor proteins. Nearly all papers describing bacterial signaling use the term CD-NTase, and since 2021 most papers describing divergent cGAS-like enzymes in animal signaling now use the term "cGLR" (for recent examples see primary papers Holleufer et al 2021 PMID 34261128; Slavik et al 2021 PMID 34261127; Li et al 2023 PMID 37379839; Cai et al 2023 PMID 37659413 and review articles Cai et al 2022 PMID 35149240; Slavik et al 2023 PMID 37380187; Fan et al 2021 PMID 34697297; West et al 2021 PMID 34373639; Unterholzner Cell 2023 PMID 37478819). Kingdom-specific uses of CD-NTase and cGLR may help add clarity to the manuscript especially as each group of enzymes is quite divergent and many protein members synthesize signaling molecules that are distinct from cyclic GMP-AMP (i.e. not cGAS).

      Related to this point, the term "SMODS" is useful for describing the protein family domain originally identified in the elegant work of Burroughs and Aravind (Burroughs et al 2015 PMID 26590262), but this term is rarely used in papers focused on the biology of these systems. "eSMODS" is a good name, but the authors may want to consider a different description to better fit with current terminology.

      We appreciate the reviewer’s suggestion and have updated the text to be clearer (e.g., using cGLR as a more specific term whenever possible). However, as OAS is distinctly not a cGLR, strict kingdom-specific use of the terms CD-NTase and cGLR is not possible. We have renamed the Mab21 superfamily as the cGLR superfamily, as those seem to be synonymous based on recent literature. At this time we are choosing to stick with the eSMODS terminology as it remains to be shown that these eukaryotic proteins have a CD-NTase-like biochemical function.

      An example of how we have tried to navigate these naming issues is:

      “The cGLR superfamily passed all four of these HGT thresholds, as did another eukaryotic clade of CD-NTases that were all previously undescribed. We name this clade the eukaryotic SMODS (eSMODS) superfamily, because the top scoring domain from hmmscan for each sequence in this superfamily was the SMODS domain (PF18144), which is typically found only in bacterial CD-NTases (Supplementary Data).”

      2) The authors state that proteins were identified using an iterative HMM-based search until they "began finding proteins outside of the family of interest" (Line 86). Is it possible to please explain in more detail what this means? A key part of the analysis pipeline is knowing when to stop, especially as some proteins like CD-NTases and cGLRs share related-homology to other major enzyme groups like pol-beta NTases while other proteins like STING and viperin are more unique.

      We have updated the text to better explain how we determined that a given protein sequence was excluded:

      “After using this approach to create pan-eukaryotic HMMs for each protein family, we then added in bacterial homologs to generate universal HMMs (Fig. 1A and Supp. Fig. 1), continuing our iterative searches until we either failed to find any new protein sequences or began finding proteins outside of the family of interest (Supp. Fig. 1). To define the boundaries that separated our proteins of interest from neighboring gene families, we focused on including homologs that shared protein domains that defined that family (see Materials and Methods for domain designations) and were closer to in-group sequences than the outgroup sequences on a phylogenetic tree (outgroup sequences are noted in the Materials and Methods).”

      We also added a section to the Methods specifically defining our outgroups:

      “As outgroup sequences, we used Poly(A) RNA polymerase (PAP) sequences for the CD-NTases, and molybdenum cofactor biosynthetic enzyme (MoaA) for viperin. We did not have a suitable outgroup for STING domains, nor did any diverged outgroups come up in our searches.”
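The stopping rule quoted above (iterate until no new sequences are found or until out-of-family proteins appear) can be sketched generically. This is a hedged toy, not the authors' pipeline: `search` and `in_family` are hypothetical callables standing in for an HMM search and for the domain and phylogeny checks, and the sequence names are invented.

```python
from typing import Callable, Set


def iterative_search(
    seeds: Set[str],
    search: Callable[[Set[str]], Set[str]],
    in_family: Callable[[str], bool],
    max_rounds: int = 20,
) -> Set[str]:
    """Grow a set of family members until no new hits appear or hits leave the family."""
    members = set(seeds)
    for _ in range(max_rounds):
        new_hits = search(members) - members
        if not new_hits:
            break  # converged: nothing new was found
        accepted = {h for h in new_hits if in_family(h)}
        members |= accepted
        if accepted != new_hits:
            break  # the search has started returning out-of-family proteins
    return members


# Toy data: each sequence "finds" its neighbours; all names are invented.
neighbours = {"s1": {"s2"}, "s2": {"s3"}, "s3": {"pap1"}, "pap1": {"pap2"}}


def toy_search(members: Set[str]) -> Set[str]:
    """Return the union of the neighbours of every current member."""
    return set().union(*(neighbours.get(m, set()) for m in members))


result = iterative_search({"s1"}, toy_search, lambda s: s.startswith("s"))
print(sorted(result))
```

In the toy run the loop stops as soon as an outgroup-like hit (`pap1`) appears, keeping only the in-family sequences gathered so far.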

      3) The authors comment on several controls to guard against potential contaminating bacterial sequences present in metazoan genome sequencing datasets (Lines 174-182). It may be helpful to include this very important part of the analysis as part of the stepwise schematic in Figure 1a. Additionally, have the authors used other eukaryotic features like the presence of introns or kingdom specific translation elements (e.g. Shine-Dalgarno- vs. Kozak-like sequences) as part of the analysis?

      We agree that it will be very interesting to look for these eukaryotic gene features, both to rule out contamination and to discern how eukaryotes have acquired and domesticated bacteria-like immune proteins. However, one limitation when working with the data in EukProt is that many species are represented by de novo transcriptome datasets and therefore information about the local gene environment, introns, or promoters is unavailable.

      4) A particularly surprising result of the analysis is a proposed connection between oligoadenylate synthase-like (OAS-like) enzymes and bacterial Clade C CD-NTases. A concern with these results is that previous structural analysis has demonstrated that bacterial CD-NTase enzymes and animal cGLRs are more closely related to each other than they are to OAS (Slavik et al 2021 PMID 34261127). Can the authors provide further support for a connection between OAS and Clade C CD-NTases? The C-terminal alpha-helix bundle of OAS is known to be distinct (Lohöfener et al 2015 PMID 25892109) and perhaps AlphaFold2 modeling of bacterial Clade C CD-NTases and additional OAS sequences may provide further bioinformatic evidence to support the authors' conclusions.

      We were also surprised by this finding as it seems to be in opposition to structural comparisons in studies such as Whiteley et al. 2019 (PMID 30787435). As the reviewer suggests, we used AlphaFold to predict the structures of two CD-NTases, those of Bacteroides uniformis (Clade C016) and Escherichia coli (Clade C018), as well as a previously uncharacterized OAS-like protein (Tripos fusus P058904), and compared those structural predictions to those of cGAS (PDB: 6CTA), OAS1 (PDB: 4RWO), and DncV (PDB: 4TY0). We used the DALI server to make these all vs all comparisons.

      We have not included these analyses in the manuscript as the results were largely inconclusive. The average pairwise z-score between any of these structures was around 20, with a narrow range of scores between 16 (e.g. OAS vs. DncV) and 22 (e.g. DncV vs. the Clade C CD-NTases). For reference, the z-score of a given protein compared to itself was ~50 and a z-score of 20 is a general DALI benchmark used to determine if structures are homologous (z-scores between 8-20 are in a gray area, and 20+ are generally considered homologous).
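To make the thresholds above concrete, the following sketch bins the z-scores quoted in this response. The cutoffs are the ones stated in the text; treating exactly 20 as "homologous" and labeling scores under 8 simply as "below gray area" are assumptions made for illustration.

```python
def classify_dali_z(z: float) -> str:
    """Bin a DALI z-score using the thresholds quoted in the response."""
    if z >= 20:  # assumption: a score of exactly 20 counts as homologous
        return "generally considered homologous"
    if z >= 8:
        return "gray area"
    return "below gray area"


# Values quoted in the response (the self-comparison value is approximate).
examples = {
    "OAS vs. DncV": 16,
    "DncV vs. Clade C CD-NTases": 22,
    "self comparison": 50,
}
for pair, z in examples.items():
    print(f"{pair}: z = {z} -> {classify_dali_z(z)}")
```

With a reported range of 16 to 22, the comparisons straddle the cutoff, which is consistent with the response calling the results inconclusive.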
      

      In our view, these pairwise structural comparisons suffer from essentially the same problem that is evident in phylogenetic trees containing only animal and bacterial homologs. Namely, all structures/sequences under consideration are extremely different from each other, on very long branches that are difficult to place with confidence when few homologs are being considered. The benefit of our approach is that we have the ideal species diversity to break up the long branches (particularly with respect to the OAS superfamily), allowing us to place those sequences confidently on the phylogeny.

      That said, while we have strong support for the topology of OAS within the CD-NTase tree, the interpretation of the relationships relies partly on the inferred root of the tree. In our analyses, we opted not to include a distant outgroup such as pol-beta for rooting purposes, as these sequences aligned poorly with the CD-NTases, resulting in a substantial decrease in alignment and tree quality. Instead, in Fig. 2 we present a tree that is arbitrarily rooted within the bacterial CD-NTases, as this root allows for clade C to be phylogenetically coherent. Our data are also consistent with an alternative rooting, placing OAS as an outgroup. If so, this would yield a tree that implies that OAS-like sequences could have given rise to all other CD-NTases and that, within the non-OAS sequences, all bacterial CD-NTases emerged from within Clade C. We thought it slightly more likely that the root of CD-NTases was solidly within bacteria, hence the display we chose. However, we were not intending to rule out an OAS-outgroup model here. As this response to reviewers will be publicly available alongside the final manuscript, we hope this clarifies our claims about the placement of OAS.

      5) One of the most exciting results in the paper is identification of a family of putative CD-NTase enzymes conserved in metazoans. Although full description may be beyond the scope of this paper, if possible, some more analysis would be interesting here: a. Are these CD-NTase enzymes in a conserved gene neighborhood within the metazoan genomes (i.e. located next to a potential cyclic nucleotide receptor?) b. Do these metazoan genomes encode other known receptors for cyclic nucleotide signaling (PFAM searches for CARF or SAVED domains for instance). c. Similar to points 3 and 4, is it possible to add further evidence for support of these proteins as true metazoan sequences that have predicted structural homology to bacterial CD-NTase enzymes?

      Yes, agreed; we think point a is an exciting avenue of questioning to pursue. However, as mentioned above, the EukProt dataset often does not provide the relevant information for the analyses proposed. Therefore, we feel that answering questions about the genomic region of these proteins is beyond the scope of the current manuscript. In particular, all 6 of the eSMODS species are represented only by transcriptomes, making these analyses impossible.

      For point b, we searched EukProt with HMMs for SAVED domains (PF18145), finding 24 total SAVED-containing proteins in EukProt. (We did not find a CARF HMM in Pfam, Tigrfam or other databases, and so could not easily carry out these searches.) Five of the 24 SAVED-containing sequences came from species encoding an eSMODS gene. This represented 3 species out of the total 20 species where we detected a SAVED domain. While this is a potentially intriguing overlap, we cannot make a strong claim about whether these SAVED sequences derive from eukaryotes vs. bacterial contamination without undergoing the extensive searching and phylogenetic tree construction methods for SAVED domains that we have performed for our three families of interest. We expect this will be an interesting line of inquiry for a future study.

      For point c, we agree that additional evidence to support the finding that the eSMODS are eukaryotic rather than bacterial sequences would be helpful. To us, the strongest pieces of evidence would be: 1) presence of eukaryotic gene architecture, 2) adjacency to clearly eukaryotic genes in the contig, and/or 3) fluorescence in situ hybridization experiments in these species to localize where the genes are encoded. Unfortunately, the transcriptome data available does not provide this level of information. We hope that other groups will follow up on these genes and species to decide the matter more definitively. In the meantime, we feel that our filters for HGT vs. contamination have done as much as possible with the existing dataset. We have modified the text in this region to leave open potential scenarios that could be fooling us, such as the presence of unusual, long-term, eukaryote-associated symbionts in the taxa where we detect eSMODS:

      “For species represented only by transcriptomes, these criteria may still have difficulty distinguishing eukaryote-bacteria HGT from certain specific scenarios such as the long-term presence of dedicated, eukaryote-associated, bacterial symbionts. However, because these criteria allow us to focus on relatively old HGT events, they give us higher confidence these events are likely to be real. ”

      6) The authors state that obvious CD-NTase/cGLR enzymes are not present in organisms that encode the group of divergent eukaryotic "blSTINGs". Have the authors analyzed the protein-coding genes encoded immediately upstream and downstream of the blSTING proteins with AlphaFold2 and FoldSeek? It would be very exciting if putative cyclic nucleotide generating enzymes are predicted to be encoded within the nearby gene neighborhood.

      Similar to the eSMODS, the majority of the species with blSTINGs were represented by transcriptomes (22/26). We do agree that this type of analysis would be very interesting. However, we feel that this is beyond the scope of this manuscript.

      7) Line 144 appears to reference the incorrect supplementary figure. SI Figure 4 may be the correct reference?

      We agree and have made this change. We thank the reviewer for catching this error.

      I hope the authors will find my comments useful, thank you for the opportunity to read this exciting manuscript.

      Significance

      Culbertson and Levin present an elegant computational analysis of the evolutionary history of several families of immune proteins conserved in bacteria and metazoan cells. The authors' work is impressive, revealing interesting insight into previously known connections and identifying exciting new connections that further link bacterial anti-phage defense and animal innate immunity. The results are overall well-presented and will have an important impact on multiple related fields. I have a few comments for the authors to help explain some of the new connections observed in their findings and clarify the results for a general audience.

      Reviewer #2

      Evidence, reproducibility and clarity

      Describe your expertise? Molecular Evolution, Mechanisms of Protein evolution, Phylogenomics, Adaptation.

      Summary: This manuscript broadly aims to improve our understanding of the evolutionary relationships between eukaryote and bacterial protein families where members of those families have immune roles. The study focuses on three such families and samples deeply across the eukaryotic tree. The approaches taken include a nice application of the EukProt database and the use of homology detection approaches that are sensitive to the issues of assigning homology through deep time. The main findings show the heterogeneity in means by which these families have arisen, with some of the families originating at least as far back as the LCA of eukaryotes; in contrast, the widespread yet patchy distribution of other families is the result of repeated independent HGT events and/or convergent domain shuffling.

      We thank the reviewer for this excellent review and their helpful comments and suggestions. We firmly believe that these comments will strengthen and clarify our work.

      Major Comments: 1. Overall the level of detail provided throughout the manuscript is lacking, perhaps the authors were constrained by a word limit for initial submission, if so then this limit needs to be extended to include the detail necessary. In addition, there are some structural issues throughout, e.g. some of the very brief intro (see later comment) reads a little more like methods (paragraph 2) and abstract (paragraph 3). The results section is lacking detail of the supporting evidence from the clever analyses that were clearly performed and the statistics underpinning conclusions are not included.

      Good suggestion, we have updated the paper to include more details and statistics on the analyses that were performed. We have also expanded on some of the most interesting findings about these bacterial innate immune proteins in the introduction (see Comment 2 below for our changes), as well as shifting the methods-like paragraph mentioned (paragraph 2) to later on in the paper. For paragraph 3, we have slimmed this down to include fewer details, but leave the final paragraph of the Introduction as a brief synopsis to prime the reader for the rest of the paper.

      1. The intro and discussion both include statements about some recent discoveries that bacteria and mammals share mechanisms of innate immunity - but there is no further detail into what would appear to be important work leading to this study. This context needs to be provided in more detail therefore I would encourage the authors to expand on the intro to include specific detail on these significant prior studies. In addition, more background information on the gene families investigated in detail here would be useful e.g. how the proteins produced influence immunity etc should be a feature of the intro. A clear and concise rationale for why these 3 particular gene families (out of all the possible innate immune genes known) were selected for analysis.

      We have added in additional background about some of the most exciting discoveries made in the past few years. We also included specific rationale as to why we chose to look at cGAS, STING, and Viperin.

      Specifically, we have added the following to the introduction:

      “For example, bacterial cGAS-DncV-like nucleotidyltransferases (CD-NTases), which generate cyclic nucleotide messengers (similar to cGAS), are massively diverse with over 6,000 CD-NTase proteins discovered to date. Beyond the cyclic GMP-AMP signals produced by animal cGAS proteins, bacterial CD-NTases are capable of producing a wide array of nucleotide signals including cyclic dinucleotides, cyclic trinucleotides, and linear oligonucleotides[11,14]. Many of these bacterial CD-NTase products are critical for bacterial defense against viral infection[8]. Interestingly, these discoveries with the CD-NTases mirror what has been discovered with bacterial viperins. In mammals, viperin proteins restrict viral replication by generating 3’-deoxy-3’,4’-didehydro- (ddh) nucleotides[4,15–17], which block RNA synthesis and thereby inhibit viral replication[15,18]. Mammalian viperin generates ddhCTP molecules while bacterial viperins can generate ddhCTP, ddhUTP, and ddhGTP. In some cases, a single bacterial protein is capable of synthesizing two or three of these ddh derivatives[4]. These discoveries have been surprising and exciting, as they imply that some cellular defenses have deep commonalities spanning across the entire Tree of Life, with additional new mechanisms of immunity waiting to be discovered within diverse microbial lineages. But despite significant homology, these bacterial and animal immune proteins are often distinct in their molecular functions and operate within dramatically different signaling pathways (reviewed here[5]). How, then, have animals and other eukaryotes acquired these immune proteins?”

      With regard to why we chose to investigate CD-NTases, STING, and Viperin specifically, we have added the following to the third paragraph of the introduction:

      “We chose to focus on cGAS, STING, and Viperin for a number of reasons. First, in metazoans cGAS and STING are part of the same signaling pathway whereas bacterial CD-NTases often act independently of bacterial STINGs[21], raising interesting questions about how eukaryotic immune proteins have gained their signaling partners. Also, given the vast breadth of bacterial CD-NTase diversity, we were curious whether any eukaryotes had acquired CD-NTases distinct from cGAS. For similar reasons, we investigated Viperin, which also has a wide diversity in bacteria but a much narrower described function in eukaryotes.”

      1. Context: Genome quality is always a concern, and confirming the absence of an element/protein in a genome is challenging given the variation in quality of available genomes. Low BUSCO scores mean that the assessment of gene loss is difficult to evaluate (but we are not provided with said scores). Query: in the results section it states that the BUSCO completeness scores (which need to be provided) etc were insufficient to explain the pattern of gene loss. I would like to know how they reached this conclusion - what statistical analyses (ANOVA?? OTHER??) have been performed to support this statement and please include the associated P values etc. Similarly, throughout the paper, including in the discussion section, the point is brushed over. If, given a statistical test, you find that some of the disparity in gene presence is explained by BUSCO score, most of your findings are still valid. It would just be difficult to make conclusions about gene loss.

      We have rewritten this section to be more clear about what we feel we can and cannot say about gene loss and BUSCO scores. This section now reads:

      “However, outside of Metazoa, these homologs were sparsely distributed, such that for most species in our dataset (711/993), we did not recover proteins from any of the three immune families examined (white space, lack of colored bars, Fig. 1B). While some of these absences may be due to technical errors or dataset incompleteness (Supp. Fig. 2), we interpret this pattern as a reflection of ongoing, repeated gene losses across eukaryotes, as has been found for other innate immune proteins[27–29] and other types of gene families surveyed across eukaryotes[28,30–32]. Indeed, many of the species that lacked any of the immune homologs were represented by high-quality datasets (Ex: Metazoa, Chloroplastida, and Fungi). Thus, although it is always possible that our approach has missed some homologs, we believe the resulting data represents a fair assessment of the diversity across eukaryotes, at least for those species currently included within EukProt.”

      In addition, we direct readers to EukProt v3, where the BUSCO scores are publicly available.

      “BUSCO scores can also be viewed on EukProt v3 (https://evocellbio.com/SAGdb/images/EukProtv3.busco.output.txt).”

      1. In terms of the homolog search strategy - line 394 - can you please state what an "outgroup gene family" means in this context. It is unclear but very important to the downstream interpretation of results.

      We have updated the materials and methods to specifically name our outgroups:

      “As outgroup sequences, we used Poly(A) RNA polymerase (PAP) sequences for the CD-NTases, and molybdenum cofactor biosynthetic enzyme (MoaA) for viperin. We did not have a suitable outgroup for STING domains, nor did any diverged outgroups come up in our searches.”

      1. For reproducibility, the materials and methods section needs to provide more detail/sufficient detail to reproduce these results. E.g the section describing phase 1 of the euk searches the text here repeats what is in the results section for the crystal structure work but doesn't give me any information on how, what method was used to "align the crystal structures", what scoring scheme is used and how the scoring scheme identifies "the core"? What specific parameters are used throughout. Why is MAFFT the method of choice for some of the analyses? Whereas, in other cases both MAFFT and MUSCLE are employed. What are the specific settings used for the MAFFT alignments throughout - is it default (must state if that is the case) or is it MAFFT L-INS-I with default settings etc.

      We have updated the text to include the specific settings used each time a particular software package was deployed. We also have included information for STING as to how we aligned 3 published crystal structures to determine the boundaries of homology.

      Here is how we now discuss identifying the “core” STING domain:

      “For STING, where the Pfam profile includes regions of the protein outside of the STING domain, we generated a new HMM for the initial search. First, we aligned crystal structures of HsSTING (6NT5), Flavobacteriaceae sp. STING (6WT4) and Crassostrea gigas STING (6WT7) with the RCSB PDB “Pairwise Structure Alignment” tool with a jFATCAT (rigid) option[73,74]. We defined a core “STING” domain as the ungapped region of 6NT5 that aligned with 6WT7 and 6WT4 (residues G152-V329 of 6NT5). Then we aligned 15 eukaryotic sequences from PF15009 (all 15 of the “Reviewed” sequences on InterPro) with MAFFT (v7.4.71)[75] with default parameters and manually trimmed the sequences down to the boundaries defined by our crystal alignment (residues 145-353 of 6NT5). We then trimmed the alignment with trimAl (v1.2)[76] with options -gt 0.2. The trimmed MSA was then used to generate an HMM profile with hmmbuild from the hmmer (v3.2.1) package (hmmer.org) using default settings.”

      We employed three alignment programs at specific points in our analyses. MAFFT was our default aligner for most of the analysis. Hmmalign (part of the hmmer package) was used to make the alignments prior to hmmbuild. The overall goal of this work was to reconstruct the evolutionary history of these proteins via a phylogenetic tree. To ensure that the tree topology was as robust as possible, we also employed the more computationally intensive, but more accurate, aligner MUSCLE. We have updated the text in the methods section to be clearer about why we used each program.

      We have updated the methods section to read:

      “MUSCLE was deployed in parallel with MAFFT to generate these final alignments to ensure that the final tree topology would be as robust as possible. MUSCLE is a slightly more accurate but more computationally intensive alignment software[79].”

      1. The justification for the number of HMM searches needs to be included. The choice of starting points for the HMMs was cryptic - please provide details. It is likely that you ran the search until no more sequences were found or until sequences were added from a different gene family, and that these happened to be between 3 and 5 searches, but it reads like you wanted to run it 3 or 5 times and that corresponds to the above condition. Something like this would be clearer: "The profile was [...] until no more sequences were found or until sequences from other gene families were found which was between 3 and 5 times in all cases" - the same is true of figure 1.

      We agree that this could have been worded better. We have updated the text to make it more clear that we searched until saturation which happened to occur between 3-5 searches and not that we arbitrarily wanted to do 3-5 searches.

      We have updated the text, which now reads:

      “After using this approach to create pan-eukaryotic HMMs for each protein family, we then added in bacterial homologs to generate universal HMMs (Fig. 1A and Supp. Fig. 1), continuing our iterative searches until we either failed to find any new protein sequences or began finding proteins outside of the family of interest (Supp. Fig. 1). To define the boundaries that separated our proteins of interest from neighboring gene families, we focused on including homologs that shared protein domains that defined that family (see Materials and Methods for domain designations) and were closer to in-group sequences than the outgroup sequences on a phylogenetic tree (outgroup sequences are noted in the Materials and Methods). “

      We also updated the figure legend to Fig. 1. It now reads:

      “Each set of searches was repeated until few or no additional eukaryotic sequences were recovered which was between 3-5 times in all cases.”
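
      The stopping rule described above (search until saturation, or until hits from neighboring families begin to appear) can be sketched in a few lines. This is only an illustrative sketch: `search` and `in_family` are hypothetical placeholders for "build an HMM from the current set and search the database" and for the domain/outgroup checks, not functions from hmmer, and halting as soon as any out-of-family hit appears is one possible reading of the text.

```python
from typing import Callable, Set, Tuple


def iterate_until_saturation(
    seed: Set[str],
    search: Callable[[Set[str]], Set[str]],
    in_family: Callable[[str], bool],
    max_rounds: int = 10,
) -> Tuple[Set[str], int]:
    """Repeat a profile search until it saturates.

    Stops when a round adds no new in-family sequences, or when a round
    starts returning sequences from outside the family of interest.
    Returns the final member set and the number of rounds performed.
    """
    members = set(seed)
    for round_number in range(1, max_rounds + 1):
        new_hits = search(members) - members
        outside = {hit for hit in new_hits if not in_family(hit)}
        accepted = new_hits - outside
        members |= accepted
        if not accepted or outside:
            return members, round_number
    return members, max_rounds


# Toy database (hypothetical): each sequence "finds" its neighbors.
toy_db = {"a": {"b"}, "b": {"c"}, "c": {"x_outgroup"}, "x_outgroup": set()}


def toy_search(current: Set[str]) -> Set[str]:
    return set().union(*(toy_db[name] for name in current))


def toy_in_family(name: str) -> bool:
    return not name.endswith("_outgroup")


members, rounds = iterate_until_saturation({"a"}, toy_search, toy_in_family)
```

      In this toy case the search halts in the third round, when the only new hit belongs to the outgroup, which mirrors how a count such as "between 3-5 searches" can fall out of the stopping rule rather than being chosen in advance.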

      1. Why do you limit hits to 10 per species - might this lead to misleading findings about gene family diversity? Info and justification for approach is required (411-412).

      We limited the hits to 10 per species to limit the influence of any one species on our alignments and subsequent phylogenetic trees. This 10-per-species cap was never reached with any search for STING or Viperin, but was used to throttle the number of Metazoan hits when searching for CD-NTases. Because of this, we probably have missed some amount of the diversity of Metazoan Mab21-like/OAS-like sequences, although this was not a focus of our manuscript. We have updated the text to be more clear about why we have included this limit and when the limit was invoked.

      We have updated the text, which now reads:

      “HMM profiles were used to search EukProt via hmmsearch (also from hmmer v3.2.1) with a statistical cutoff value of 1e-3 and -hit parameter set to 10 (i.e. the contribution of a single species to the output list is capped at 10 sequences). It was necessary to cap the output list, as EukProt v3 includes de novo transcriptome assemblies with multiple splice isoforms of the same gene and we wanted to limit the overall influence a single species had on the overall tree. We never reached the 10 species cap for any search for STING or viperin homologs; only for the CD-NTases within Metazoa did this search cap limit hits.”
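
      The effect of the per-species cap can be illustrated with a short sketch. It is not the hmmsearch implementation: the tuple layout of a hit is hypothetical, and keeping the lowest E-value hits for each species is an assumption, since the text does not say which sequences are retained once the cap is reached.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (species, sequence_id, e_value)
Hit = Tuple[str, str, float]


def cap_hits_per_species(
    hits: Iterable[Hit], cap: int = 10, max_evalue: float = 1e-3
) -> List[Hit]:
    """Keep at most `cap` hits per species among those passing the E-value cutoff."""
    kept_per_species: Dict[str, int] = defaultdict(int)
    kept: List[Hit] = []
    # Visit the best (lowest E-value) hits first so they are the ones retained.
    for species, seq_id, e_value in sorted(hits, key=lambda hit: hit[2]):
        if e_value > max_evalue:
            continue  # fails the statistical cutoff
        if kept_per_species[species] >= cap:
            continue  # this species has already contributed `cap` sequences
        kept_per_species[species] += 1
        kept.append((species, seq_id, e_value))
    return kept


# Toy input: one species with 12 isoform-like hits, another with one good and one weak hit.
example_hits = [("species_A", f"A_{i}", 1e-10 * (i + 1)) for i in range(12)]
example_hits += [("species_B", "B_0", 1e-5), ("species_B", "B_1", 0.5)]
capped = cap_hits_per_species(example_hits)
```

      Here species_A is throttled to 10 sequences while species_B is unaffected by the cap and loses only its hit above the cutoff, which is the behavior described for Metazoan CD-NTases versus STING and viperin.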

      1. The information in Supplementary Figure 3 is quite difficult to assess visually, but I think that is what is expected from that figure. However, this is an important underpinning element of the work and should really be quantitatively assessed. A metric of comparison of trees, with defined thresholds etc there are many out there, even a simple Robinson-Foulds test perhaps? Essentially - comparing the panels in Supplementary Figure 3 by eye is unreliable and in this case not possible given there are no labels. It would also be important to provide these full set of phylogenies generated and associated RF/other scores as supplementary file.

      We agree that this Supplementary Figure is difficult to assess by eye, however we feel that it is vital to show this data. Visually, we do feel like this figure conveys the idea that while individual branches may move around, the major clades/areas of interest are stable across the different alignments and tree builders. To increase robustness, we have included the weighted Robinson-Foulds test results into a new panel of this figure (Supplementary Fig. 3B).

      We have added a section to the methods on how this weighted Robinson-Foulds test was conducted:

      “Weighted Robinson-Foulds distances for Supp. Fig. 3B were calculated with Visual TreeCmp (settings: -RFWeighted -Prune trees -include summary -zero weights allowed)[83].”

      We added the weighted Robinson-Foulds data to Supplemental Fig. 3 and have updated the figure legend to reflect this new data. The new legend for Supp. Fig. 3B reads:

      “(B) The average weighted Robinson-Foulds distances for all pairwise comparisons between the four tree types (MAFFT/MUSCLE alignment built with IQTREE/RAXML-ng). Although the distances were higher for the CD-NTase tree (as expected for this highly diverse gene family), all of the key nodes defining the cGLR, OAS, and eSMODS superfamilies, as well as their nearest bacterial relatives, were well supported (>70 ultrafast bootstrap value).”
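
      For readers unfamiliar with the metric, the idea behind a Robinson-Foulds comparison can be sketched on toy trees. This is a deliberate simplification and not what Visual TreeCmp computes: it is the plain, unweighted distance on rooted trees (the number of clades found in one tree but not the other), whereas the analysis above used the weighted variant, which also accounts for branch lengths. Trees are written as nested tuples of leaf names, a representation chosen here purely for illustration.

```python
from typing import FrozenSet, Set


def clades(tree) -> Set[FrozenSet[str]]:
    """Return the leaf set under every internal node of a nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])  # a leaf is just its own name
        leaf_set = frozenset().union(*(leaves(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    leaves(tree)
    return found


def rf_distance(tree_a, tree_b) -> int:
    """Unweighted rooted Robinson-Foulds distance: clades unique to either tree."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two toy topologies that differ only in which pair of leaves groups first.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("A", "C"), "B"), "D")
distance = rf_distance(tree_1, tree_2)
```

      The two toy trees share the clades {A, B, C} and {A, B, C, D} and differ only in {A, B} versus {A, C}, so a local rearrangement of individual branches contributes to the distance without disturbing the larger groupings.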

      1. Does domain shuffling mean that phylogenetic reconstruction is less valid? How was the alignment performed in these cases to account for this.

      Thank you for bringing this up, this is a point we have now clarified in the text. Our searches, alignments, and trees are all of single protein domains, as typically only conservation within domains is retained across the vast distances between bacteria and eukaryotes. As such, domain shuffling should have no impact on the validity of that phylogenetic reconstruction. We have updated the text to be more clear about the scope of the alignments and searches. We made changes to our wording throughout the manuscript. One specific example of this is:

      “Using maximum likelihood phylogenetic reconstruction on the STING domain alone, we identified STING-like sequences from 26 diverse microeukaryotes whose STING domains clustered in between bacterial and metazoan sequences, breaking up the long branch.”

      Minor Comments: 10. I am not sure about the use of the term "truly ancestral" or variants thereof, same issues with "significant homology" and "inherited since LECA and possibly longer" .. these are awkwardly phrased. E.g. I think perhaps "homologous across the whole length" might be clearer, and elsewhere "present in LECA and possibly earlier" may be more fitting.

             We have updated the text for these phrases throughout the manuscript and have replaced them with more specific language.
      
      1. Line 75 - "Detecting" rather than discovering?

      We appreciate the suggestion. However, because many of these gene families have never been described in the eukaryotic lineages considered here, we think ‘discovering’ is more appropriate. Indeed, the eSMODS lineage demonstrates that our search approach has the power to find not just new homologs but to discover totally new subfamilies of these eukaryotic proteins.

      1. 132-133 - more justification is needed for the choice of bacterial genes.

      We have clarified that our selection of bacterial CD-NTases included every known CD-NTase at the time of our analysis. The text now reads:

      “As representative bacterial CD-NTases, we used 6,132 bacterial sequences, representing a wide swath of CD-NTase diversity[43]. To our knowledge, this dataset included every known bacterial CD-NTase at the time of our analysis.”

      1. For the downsizing from 6000 to 500 what were the criteria and thresholds.

      We have updated the text to include the PDA software options for downsampling. The text now reads:

      “We downsampled the CD-NTase bacterial sequences from ~6000 down to 500 using PDA software (options -k 500) on a FastTree (default settings) tree built upon a MAFFT (default parameters) alignment, to facilitate more manageable computation times on alignments and tree construction.”

      1. How are you rooting your trees e.g. figure 2? Information is provided for Viperin but not others.

      We have updated the text to ensure that the root of every tree is specifically stated.

      1. In the results section on CD-NTases I think it would be best to place the second paragraph detailing the role of cGAS earlier in this section, perhaps after the first sentence.

      We have moved the second paragraph, which introduces cGAS, OAS, and the other CD-NTases, to the beginning of the CD-NTase section. The first paragraph of the CD-NTase section of the results now reads:

      “We next studied the evolution of the innate immune proteins, beginning with cGAS and its broader family of CD-NTase enzymes. Following infections or cellular damage, cGAS binds cytosolic DNA and generates cyclic GMP-AMP (cGAMP)[32–35], which then activates downstream immune responses via STING[34,36–38]. Another eukaryotic CD-NTase, 2’5’-Oligoadenylate Synthetase 1 (OAS1), synthesizes 2',5'-oligoadenylates which bind and activate Ribonuclease L (RNase L)[39]. Activated RNase L is a potent endoribonuclease that degrades both host and viral RNA species, reducing viral replication (reviewed here[40,41]). Some bacterial CD-NTases such as DncV behave similarly to animal cGAS; they are activated by phage infection and produce cGAMP[8,42,43]. These CD-NTases are commonly found within cyclic oligonucleotide-based anti-phage signaling systems (CBASS) across many bacterial phyla and archaea[8,27,43].”

      1. Is FASTtree really necessary to include as it will underperform in all instances? Removing that method and comparing the remaining two (i.e. IQTREE and RAXML) - what level of disagreement do you find between the 2 alignment and 2 tree building methods? The cases that disagree should also be detailed.

      We agree that FASTtree underperforms against IQTREE and RAXML and have eliminated those trees from the supplement. We initially had included FASTtree, as it still seems to be widely used in phylogenetic analyses within the recent papers on bacterial immune homologs, but we completely agree with the reviewer and have removed it. In addition, we have calculated and added in the average weighted Robinson-Foulds Distance to Supplemental Figure 3. Our manuscript focuses on features of the phylogenetic trees that were consistent across all the replicate methods. However, given the numerous sequences and high degree of divergence involved, there were many cases where individual branches shifted between the methods, e.g. if individual CD-NTases within bacterial clade G swapped positions with one another. The differences we observed between the trees were inconsequential to our overall conclusions.

      1. Again a structural point - the start to paragraph "To understand the evolutionary history of CD-NTases we used the Pfam domain PF03281 as a starting point", I don't know at this point why or how you have done this. The sentence seems a little premature. I would therefore suggest that you start that paragraph with your motivation, "In order to..." and then finish that paragraph with your sentence in quotes above which actually summarizes the paragraph.

      We have updated the text to clear up this paragraph (in addition to other structural changes in the CD-NTase section). The paragraph containing information about how we started the HMM searches for the CD-NTases now reads:

      “ To begin our sequence searches for eukaryotic CD-NTases, we used the Pfam domain PF03281, representing the main catalytic domain of cGAS, as a starting point. As representative bacterial CD-NTases, we used 6,132 bacterial sequences, representing a wide swath of CD-NTase diversity[21]. Following our iterative HMM searches, we recovered 313 sequences from 109 eukaryotes, of which 34 were metazoans (Supplemental Data and Fig. 1B). Within the phylogenetic trees, most eukaryotic sequences clustered into one of two distinct superfamilies: the cGLR superfamily (defined by clade and containing a Mab21 PFAM domain: PF03281) or the OAS superfamily (OAS1-C: PF10421) (Fig. 2A). Bacterial CD-NTases typically had sequences matching the HMM for the Second Messenger Oligonucleotide or Dinucleotide Synthetase domain (SMODS: PF18144).”

      1. Line 148 - "within" change to "before"?

      We have updated the text with this suggestion.

      1. Unclear from text as is whether you found any STING homologs in arthropods (~line 157). Please update the text for clarity. Would also suggest that "agreeing" should be replaced with "aligning".

      We found several STING homologs in arthropods and have updated the text to specifically note this. We also have updated the text as per the suggestion of using the term “aligning” instead of “agreeing”. The text now reads:

      “Almost half of these species (10/19) were arthropods, aligning with prior findings of STING sparseness among arthropods (Wu et al. 2014). We did find STING homologs in 8/19 arthropod species in EukProt v3, including the previously identified STINGs of Drosophila melanogaster, Apis mellifera and Tribolium castaneum (Wu et al. 2014; Margolis, Wilson, and Vance 2017).”

      1. Line 169 - If clade D is not a clade, maybe it should be called something different.

      Yes, unfortunate naming, isn’t it? Clade D is not a coherent clade in our results nor when it was first described, but we feel that for consistency with the rest of the field, it is best if we adhere to previously published nomenclature.

      1. Line 188-190 - In principle, max likelihood should be able to infer the right tree even with high divergence.

      Yes, we agree that maximum likelihood methods should be able to infer the correct tree. However, we are not sure what change the reviewer is suggesting here.

      1. Paragraph starting at 199 - eSMODS - always unknown function or mostly - could be important.

      To our knowledge, the two bacterial CD-NTases closest to the eSMODS group have no known function.

      1. For calling HGT you state that one of the criteria is that the euk and bac sequences branched near one another, what is "near" in this scenario?

      “Near” in this case refers to being adjacent on the phylogenetic tree. We have updated the text for clarity. The text now reads:

      “To minimize such false positive HGT calls, we took a conservative approach in our analyses, considering potential bacteria-eukaryote HGT events to be trustworthy only if: 1) eukaryotic and bacterial sequences branched adjacent to one another with strong support (bootstrap values >70); 2) the eukaryotic sequences formed a distinct subclade, represented by at least 2 species from the same eukaryotic supergroup; 3) the eukaryotic sequences were produced by at least 2 different studies; and 4) the position of the horizontally transferred sequences was robust across all alignment and phylogenetic reconstruction methods used (Supp. Fig. 3A).”
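
      The four criteria above act as a conjunction of filters, which can be made explicit with a small sketch. The record fields and function name are hypothetical and chosen only for illustration; in practice each field would have to be filled in by inspecting the trees and dataset metadata, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class CandidateClade:
    """Hypothetical summary of one eukaryotic clade that sits near bacterial sequences."""
    adjacent_to_bacteria: bool                  # criterion 1: branches next to bacterial sequences
    bootstrap: float                            # criterion 1: support for that placement
    forms_distinct_subclade: bool               # criterion 2: eukaryotic sequences group together
    species_by_supergroup: Dict[str, Set[str]]  # criterion 2: supergroup -> species present
    source_studies: Set[str]                    # criterion 3: studies that produced the data
    robust_across_methods: bool                 # criterion 4: stable across alignments and tree builders


def is_trustworthy_hgt(clade: CandidateClade, min_bootstrap: float = 70) -> bool:
    """Apply all four conservative criteria; every one must hold."""
    strong_adjacency = clade.adjacent_to_bacteria and clade.bootstrap > min_bootstrap
    enough_species = clade.forms_distinct_subclade and any(
        len(species) >= 2 for species in clade.species_by_supergroup.values()
    )
    independent_sources = len(clade.source_studies) >= 2
    return (
        strong_adjacency
        and enough_species
        and independent_sources
        and clade.robust_across_methods
    )


# Toy candidates (invented values, not results from the manuscript).
supported = CandidateClade(True, 85, True, {"GroupX": {"sp1", "sp2"}}, {"study1", "study2"}, True)
single_study = CandidateClade(True, 85, True, {"GroupX": {"sp1", "sp2"}}, {"study1"}, True)
```

      The second toy candidate fails only the independent-studies criterion, which is the kind of case the filter is designed to reject as possible contamination of a single dataset.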

      1. In legends be specific about what type of support value, e.g. bootstrap or jack-knife.. I think it is always bootstrap but would be good to have that precision.

      Our phylogenetic trees only use bootstrap values for support, so we have updated the figure legends and methods to provide this information. Apologies for this lack of clarity.

      1. Throughout the text if stating e.g. "clustered robustly and with high support" please provide the appropriate values.

      We have updated the text to provide bootstrap values when invoking statements about support. An example of this is:

      “There are two clades of Chloroplastida (a group within Archaeplastida) sequences that branch robustly (>80 ultrafast bootstrap value) within the bacteria clade.”

      1. It is unclear from the text how the animal origin of the TIR domain is supported (~line 274). Please provide necessary details to support your statements in the results section.

      In our phylogenetic tree of TIR domains (Supp. Fig. 7), the C. gigas TIR domain (of its STING protein) clusters with high support next to other metazoan TIR domains.

      We have updated the STING section to include these lines:

      “We also investigated the possibility that C. gigas acquired the TIR-domain of its TIR-STING protein via HGT from bacteria, however this analysis also suggested an animal origin for the TIR domain (Supp. Fig. 7), as the C. gigas TIR domain clustered with other metazoan TIR domains such as Homo sapiens TICAM1 and 2 (ultrafast bootstrap value of 75). Eukaryotic TIR-STINGs are also rare, further supporting the hypothesis that this protein resulted from recent convergence, where animals independently fused STING and TIR domains to make a protein resembling bacterial TIR-STINGs, consistent with previous reports[19].”

      1. Replace similar with -> similar "to"

      We have accepted the suggestion and replaced “with” with “to”.

      1. Line 266: It was previously shown .. or it is known but not "it was previously known"

      We have rephrased the sentence to be clearer: “Some eukaryotes like C. gigas…”.

      1. The last sentence in paragraph ~line 277: "Our work also identified a number of non-metazoan STINGS...." Please expand on this and provide some of the details on this finding in the text or point to the figure that supports the statement and provide a little more detail here.

      The intent of the words on line 277 was a summary of what we had previously discussed in the STING section. For clarity we updated the text, which now reads:

      “Interestingly, the non-metazoan blSTINGs (Fig. 3C) that are found in the Stramenopiles, Haptista, Rhizaria, Choanoflagellates and Amoebozoa have a TM-STING domain architecture similar to animal STINGs but a STING domain more similar to bacterial STINGs.”

      blSTINGs are discussed in more detail earlier in the STING section (specifically paragraph 3) where we say:

      “Using maximum likelihood phylogenetic reconstruction on the STING domain alone, we identified STING-like sequences from 26 diverse microeukaryotes whose STING domains clustered in between bacterial and metazoan sequences, breaking up the long branch. We name these sequences the bacteria-like STINGs (blSTINGs) because they were the only eukaryotic group of STINGs with a bacteria-like Prok_STING domain (PF20300) and because of the short branch length (0.86 vs. 1.8) separating them from bacterial STINGs on the tree (Fig. 3C). While a previous study reported STING domains in two eukaryotic species (one in Stramenopiles and one in Haptista) [19], we were able to expand this set to additional species and also recover blSTINGs from Amoebozoa, Rhizaria and choanoflagellates. This diversity allowed us to place the sequences on the tree with high confidence (bootstrap value >70), recovering a substantially different tree than previous work[19]. As for CD-NTases, the tree topology we recovered was robust across multiple different alignment and phylogenetic tree construction algorithms (Supp. Fig. 3A).”

      1. Line 294: it is unclear which are the orphan taxa -we are directed to figure 1 but there is no notation for orphan taxa here perhaps add something to the figure to make obvious which these are.

      We have updated the text to mention these orphan taxa specifically by name.

      The text now reads:

      “The 194 viperin-like proteins we recovered came from 158 species spanning the full range of eukaryotic diversity, including organisms from all of the major eukaryotic supergroups, as well as some orphan taxa whose taxonomy remains open to debate (Fig. 1, Ancyromonadida, Hemimastigophora, Malawimonadida).”

      1. Lines 340-341 - some redundant use of eukaryotic/eukaryotes

      We have updated the text to reduce redundancy.

      1. Lines 475-480 - some further detail needed - how were sequences trimmed to the TIR domain? - what were your starting sequences? etc.

      We have updated the text to detail how we acquired a set of proteins from InterPro and how we used hmmscan to determine the coordinates of the TIR domains in those proteins. We then isolated the TIR domains (using the coordinates defined by hmmscan) and proceeded to align those sequences.

      The text now reads:

      “We used hmmscan to identify the coordinates of TIR domains in a list of 203 TIR domain containing-sequences from InterPro (all 203 proteins from curated “Reviewed” selection of IPR000157 (Toll/interleukin-1 receptor homology (TIR) domain as of 2023-04-04)) and 104 bacterial TIR-STING proteins (the same TIR-STING proteins used in Fig. 3)[3]. Next, we trimmed the sequences down to the hmmscan identified TIR coordinates and aligned the TIR domains with MUSCLE (-super5). We trimmed the alignments with TrimAL and built a phylogenetic tree with IQtree (-s, -bb 1000, -m TEST, -nt AUTO).”
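
      The trimming step described above can be illustrated as follows. This is a sketch only: it assumes the hmmscan coordinates have already been parsed into 1-based, inclusive (start, end) pairs, and the function name and input layout are hypothetical rather than taken from the authors' script.

```python
def trim_to_domain(sequences, coords):
    """Cut each protein sequence down to its annotated domain.

    sequences: dict of sequence id -> protein sequence
    coords:    dict of sequence id -> (start, end), assumed 1-based and inclusive
    Sequences without coordinates are skipped.
    """
    trimmed = {}
    for seq_id, seq in sequences.items():
        if seq_id not in coords:
            continue
        start, end = coords[seq_id]
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"coordinates out of range for {seq_id}")
        # Convert the 1-based inclusive range to a Python slice
        trimmed[seq_id] = seq[start - 1:end]
    return trimmed
```

      The trimmed sequences would then be passed to the aligner; the alignment, trimming, and tree-building commands themselves are as stated in the quoted methods text.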

      1. Check that the colour schemes for branches etc are detailed in the legends of supplementary as well as main.

      We have updated the figure legends to make clearer that the same color scheme is maintained throughout the manuscript. This involved ensuring that the following statement (or an equivalent statement) was present in the legends of Figures 2, 3, 4, S2, S3, S4, S5, S6, and S7:

      “Eukaryotic sequences are colored according to eukaryotic group as in Fig. 1B.”

      1. The threshold set for gaps is very strict at 0.2. This seems quite strict given the sequences are potentially quite highly divergent. What length are the alignments that you are using after trimming - these details need to be included and considered.

      We have updated the text to specifically detail how long our alignments were after trimming and how that post-trimming length compares to the length of the alignment for each PFAM group.

      Specifically, the text now reads:

      “The final alignments were 232, 175, and 346 amino acids long for CD-NTases, STING, and viperin, respectively. These alignments represent ≥75% of the length of the alignment of their respective PFAM domain (PF03281 (Mab-21 protein nucleotidyltransferase domain) for CD-NTases, PF20300 (Prokaryotic STING domain) for STING, and PF04055 (Radical SAM family) for viperin).”

      1. How were sequences downsampled with PDA? Line 424.

      We have updated the text to include the PDA settings that were used to downsample sequences. The text now reads:

      “To ensure the combined HMM did not have an overrepresentation of either bacterial or eukaryotic sequences, we downsampled the bacterial sequences and eukaryotic sequences to obtain 50 phylogenetically diverse sequences of each, and then combined the two downsampled lists. To do this, eukaryotic and bacterial sequences were each separately aligned with MAFFT (default parameters), phylogenetic trees were built with FastTree (v2.1.10)[77], and the Phylogenetic Diversity Analyzer (pda/1.0.3)[78] software with options -k 50 or -k 500 with otherwise default parameters was run on the FastTree files to downsample the sequences while maximizing remaining sequence diversity.”
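
      To give a feel for what downsampling while keeping diversity means, here is a simplified stand-in. It is not the PDA algorithm, which works on the tree itself: this sketch uses greedy farthest-point selection on a pairwise distance table, and the function name and input format are assumptions made for illustration.

```python
def greedy_diverse_subset(dist, k):
    """Pick k sequences by greedy farthest-point selection.

    dist: symmetric nested dict, dist[a][b] = distance between sequences a and b
    k:    number of sequences to keep (analogous in spirit to a "keep k" option)
    """
    names = sorted(dist)
    if k <= 0:
        return []
    if k >= len(names):
        return names
    # Seed with the two most distant sequences
    pairs = ((x, y) for i, x in enumerate(names) for y in names[i + 1:])
    a, b = max(pairs, key=lambda p: dist[p[0]][p[1]])
    chosen = [a, b][:k]
    while len(chosen) < k:
        remaining = [n for n in names if n not in chosen]
        # Add the sequence whose nearest already-chosen neighbour is farthest away
        best = max(remaining, key=lambda n: min(dist[n][c] for c in chosen))
        chosen.append(best)
    return chosen
```

      Near-identical sequences are the last to be picked under this rule, which is the same intuition behind keeping 50 phylogenetically diverse sequences from each domain of life.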

      1. Please provide adequate descriptions for the materials in the supplementary files for the manuscript, they currently lack description. They are useful and we fully support their inclusion with sufficient information.

      We have expanded the descriptions of the provided supplementary files.

      1. The starting sequences, hmm pipeline and scripts would be great to include, apologies if we have missed them.

      We have added the starting bacterial sequences to the supplementary data, as well as the final HMMs, and the one script that we used in our analysis. All other software (including the included script) is freely and publicly available.

      Significance

      This study provides us with examples of instances where a medley of different mechanisms have resulted in the emergence of innate immune proteins across eukaryotes. The study is entirely bioinformatic in nature and provides some nice cases for future study. The thorough search strategies are to be commended. The limitations of the work are that we don't know whether the functions have also been conserved across deep time and/or in the independent events described. Nevertheless, this work contributes to a growing body of evidence on the complex, and sometimes shared, nature of the evolution of animal and bacterial immunity. I would classify this nice study as a conceptual advance of our understanding of the evolution of protein families through deep time and would imagine it is of interest to a broad audience of biologists from immunologists to evolutionary biologists and structural biologists.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Culbertson and Levin takes a bioinformatic approach to investigate the evolutionary origins/trajectories of three different protein domains involved in innate immunity in both bacteria and eukaryotes: cGAS/CD-NTases, STING, and viperins. To perform this analysis, the authors apply an iterative homology search model to the EukProt database of eukaryotic genomes. Their analysis finds that eukaryotic CD-NTases arose from multiple horizontal gene transfer events between bacteria and eukaryotes. They also fill in an important gap in understanding how STING from bacteria evolved into modern human STING by identifying blSTINGs in diverse eukaryotes. Finally, they determine that viperins are an ancient protein family that likely existed in LECA, but found two more recent HGT events for proteins related to viperin.

      Major comments

      1. The hypothesis for the origin of STING via convergent domain shuffling could be handled with a little more care in the text. The authors show that homologs of STING from animals can also be found in the genomes of diverse eukaryotes outside the metazoa, demonstrating (1) STING and cGAS have had different histories, and (2) that these sequences are more bacteria-like than metazoan STING. However, in multiple places (the title, line 275, elsewhere) the term "convergence" could be misleading. "Convergence" leaves the reader with the impression that there is no common ancestor between the STING domain from bacteria and eukaryotes. I understand that the authors are using "convergent domain shuffling" to draw this distinction, but I'm unsure if a naïve reader will glean the distinction between domain shuffling and STING itself converging. I would argue that we simply cannot place eukaryotic STING and blSTING proteins on the tree of bSTING sequences. i.e. blSTING are no more related to bacterial TM-STING than bacterial TIR-STING (likely the missing bSTING sequences are simply extinct?). Can the authors curate their language to state more simply that STING likely arose through horizontal gene transfer, but it is unlikely that bacterial TM-STING is the unequivocal progenitor?

      We thank the reviewer for this comment, and we absolutely agree that we should be clearer about the distinction between convergence and convergent domain shuffling. We have changed the title and edited the text to increase clarity. In addition, we have clarified what our data does and does not say about the evolutionary history of STING. We feel that our STING tree (Fig. 3C), due to a general sparseness of eukaryotic and bacterial sequences, is insufficient to confidently call whether eukaryotes acquired STING by HGT or whether STING was present in the LECA.

      We have added the following to clear up this issue:

      “Overall, the phylogenetic tree we constructed (Fig. 3C) suggests that there is domain-level homology between bacterial and eukaryotic STINGs, but due to sparseness and lack of a suitable outgroup, this tree does not definitively explain the eukaryotic origin of the STING domain. However, the data does clearly support a model in which convergent domain shuffling in eukaryotes and bacteria generated similar TM-STING and TIR-STING proteins independently.”

      Minor Comments

      1. Spelling error in Figure 3B and 3C: "cannoical"

      Thanks, we have corrected this error.

      1. Figure 5 could be improved to more clearly articulate the findings of the manuscript. In A, it's unclear how OAS relates to Mab21 and a reader not paying close attention might think that OAS was part of the gene duplications after Mab21 was acquired. The LECA origins of OAS are also not presented (albeit, these are still defined in the legend). In B, this panel would suggest that there was not horizontal transfer of STING from bacteria to eukaryotes but rather both domains of life received STING from a separate source. My understanding is STING did likely arise in bacteria, however, the assumption that extant TM-STING in bacteria is the predecessor of TM-STING in eukaryotes is not well supported. Similarly for the TIR domain.

      We have updated Fig. 5 to more clearly show that OAS was likely in the LECA and that eSMODS and cGLRs were HGT’d from bacteria to other eukaryotic lineages. For STING, it was not our intent to imply that the extant TM-STING in bacteria is the predecessor of TM-STING in eukaryotes, and we agree with the reviewer that this is unlikely. Although we do not have sufficient data to speak to the origin of the STING domain itself, we do feel confident in our evidence of domain shuffling. Our illustration in Fig. 5B was meant to correspond to the following statement: “Drawing on a shared ancient repertoire of protein domains that includes STING, TIR, and transmembrane (TM) domains, bacteria and eukaryotes have convergently evolved similar STING proteins through domain shuffling.” We believe this inference is valid and best describes our results for STING.

      1. Line 119: While the role of Mab21L1-2 are established for development, I'm unaware of a role for MB21D2 in development (or any other phenotype).

      We agree with the reviewer that MB21D2 has not been shown to have any phenotype and have corrected the wording to clarify this point.

      The line now reads: “However, the immune functions of Mab21L1 and MB21D2 remain unclear, although Mab21L1 has been shown to be important for development[29–31].”

      1. Line 210: "Gamma" should be "genes"

      We have corrected this error and replaced the word.

      Reviewer #3 (Significance (Required)):

      This work is of high quality, is timely, and will have a large impact on shaping the field. The origins and evolution of antiviral immunity from bacteria to eukaryotes have been investigated from multiple angles. While the phylogeny and evolutionary trajectory of these genes have been traced in bacteria, there have been relatively fewer analyses across diverse (non-metazoan) eukaryotes. For this reason, I am confident that this manuscript will help future researchers select homologs for investigation and guide similar analyses of other bacterial defense systems.

      A particular challenge of this work is accounting for gene loss across taxa and weighing that possibility against horizontal gene transfer. The authors are conservative in their conclusions and well-reasoned. The comments I have can be addressed with changes to the writing and emphasis of certain points.

      I expect these findings to be of interest to a broad audience of evolutionary biologists, microbiologists, and immunologists.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Describe your expertise? Molecular Evolution, Mechanisms of Protein evolution, Phylogenomics, Adaptation.

      Summary: This manuscript broadly aims to improve our understanding of the evolutionary relationships between eukaryotic and bacterial protein families whose members have immune roles. The study focusses on three such families and samples deeply across the eukaryotic tree. The approaches taken include a nice application of the EukProt database and the use of homology detection approaches that are sensitive to the issues of assigning homology through deep time. The main findings show heterogeneity in the means by which these families have arisen: some of the families originated at least as far back as the LCA of eukaryotes, whereas the widespread yet patchy distribution of other families is the result of repeated independent HGT events and/or convergent domain shuffling.

      Major Comments:

      1. Overall the level of detail provided throughout the manuscript is lacking. Perhaps the authors were constrained by a word limit for initial submission; if so, this limit needs to be extended to include the detail necessary. In addition, there are some structural issues throughout, e.g. some of the very brief intro (see later comment) reads a little more like methods (paragraph 2) and abstract (paragraph 3). The results section is lacking detail of the supporting evidence from the clever analyses that were clearly performed, and the statistics underpinning conclusions are not included.
      2. The intro and discussion both include statements about some recent discoveries that bacteria and mammals share mechanisms of innate immunity - but there is no further detail into what would appear to be important work leading to this study. This context needs to be provided in more detail therefore I would encourage the authors to expand on the intro to include specific detail on these significant prior studies. In addition, more background information on the gene families investigated in detail here would be useful e.g. how the proteins produced influence immunity etc should be a feature of the intro. A clear and concise rationale for why these 3 particular gene families (out of all the possible innate immune genes known) were selected for analysis.
      3. Context: Genome quality is always a concern, and confirming the absence of an element/protein in a genome is challenging given the variation in quality of available genomes. Low BUSCO scores mean that the assessment of gene loss is difficult to evaluate (but we are not provided with said scores). Query: in the results section it states that the BUSCO completeness scores (which need to be provided) etc were insufficient to explain the pattern of gene loss. I would like to know how they reached this conclusion - what statistical analyses (ANOVA?? OTHER??) have been performed to support this statement and please include the associated P values etc. Similarly, throughout the paper, including in the discussion section, the point is brushed over. If, given a statistical test, you find that some of the disparity in gene presence is explained by BUSCO score, most of your findings are still valid. It would just be difficult to make conclusions about gene loss.
      4. In terms of the homolog search strategy - line 394 - can you please state what an "outgroup gene family" means in this context. It is unclear but very important to the downstream interpretation of results.
      5. For reproducibility, the materials and methods section needs to provide sufficient detail to reproduce these results. E.g. in the section describing phase 1 of the euk searches, the text repeats what is in the results section for the crystal structure work but gives no information on what method was used to "align the crystal structures", what scoring scheme was used, or how the scoring scheme identifies "the core". What specific parameters are used throughout? Why is MAFFT the method of choice for some of the analyses, whereas in other cases both MAFFT and MUSCLE are employed? What are the specific settings used for the MAFFT alignments throughout: is it default (must state if that is the case) or is it MAFFT L-INS-I with default settings, etc.?
      6. The justification for the number of HMM searches needs to be included. The choice of starting points for the HMMs was cryptic - please provide details. It is likely that you ran the search until no more sequences were found or until sequences were added from a different gene family, and that these happened to be between 3 and 5 searches, but it reads like you wanted to run it 3 or 5 times and that corresponds to the above condition. Something like this would be clearer: "The profile was [...] until no more sequences were found or until sequences from other gene families were found which was between 3 and 5 times in all cases" - the same is true of figure 1.
      7. Why do you limit hits to 10 per species - might this lead to misleading findings about gene family diversity? Info and justification for approach is required (411-412)
      8. The information in Supplementary Figure 3 is quite difficult to assess visually, but I think that is what is expected from that figure. However, this is an important underpinning element of the work and should really be quantitatively assessed, using a metric for comparing trees with defined thresholds. There are many out there; even a simple Robinson-Foulds test perhaps? Essentially, comparing the panels in Supplementary Figure 3 by eye is unreliable and in this case not possible given there are no labels. It would also be important to provide the full set of phylogenies generated and associated RF/other scores as a supplementary file.
      9. Does domain shuffling mean that phylogenetic reconstruction is less valid? How was the alignment performed in these cases to account for this.
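
      To illustrate major comment 8: the Robinson-Foulds distance counts the bipartitions (splits) found in one tree but not in the other. The following is a minimal sketch under stated assumptions. Trees are given as nested tuples of leaf names, a representation chosen here for illustration and not the format of the authors' tree files, and branch lengths and support values are ignored.

```python
def leaf_set(tree):
    """Collect all leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial splits of a tree, treated as unrooted."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed reference leaf to make each split canonical
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        # Always store the side of the split that does NOT contain the anchor
        side = all_leaves - leaves if anchor in leaves else leaves
        # Skip trivial splits (empty, a single leaf, or all but one leaf)
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return leaves

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

      A distance of 0 means the two topologies agree on every internal branch; reporting such scores for each pair of alignment and tree-building methods would replace the by-eye comparison.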

      Minor Comments:

      1. I am not sure about the use of the term "truly ancestral" or variants thereof, same issues with "significant homology" and "inherited since LECA and possibly longer" .. these are awkwardly phrased. E.g. I think perhaps "homologous across the whole length" might be clearer, and elsewhere "present in LECA and possibly earlier" may be more fitting.
      2. Line 75 - "Detecting" rather than discovering?
      3. 132-133 - more justification is needed for the choice of bacterial genes.
      4. For the downsizing from 6000 to 500 what were the criteria and thresholds.
      5. How are you rooting your trees e.g. figure 2? Information is provided for Viperin but not others.
      6. In the results section on CD-NTases I think it would be best to place the second paragraph detailing the role of cGAS earlier in this section, perhaps after the first sentence.
      7. Is FASTtree really necessary to include as it will underperform in all instances? Removing that method and comparing the remaining two (i.e. IQTREE and RAXML) - what level of disagreement do you find between the 2 alignment and 2 tree building methods? The cases that disagree should also be detailed.
      8. Again a structural point - the start to paragraph "To understand the evolutionary history of CD-NTases we used the Pfam domain PF03281 as a starting point", I don't know at this point why or how you have done this. The sentence seems a little premature. I would therefore suggest that you start that paragraph with your motivation, "In order to..." and then finish that paragraph with your sentence in quotes above which actually summarises the paragraph.
      9. Line 148 - "within" change to "before"?
      10. Unclear from text as is whether you found any STING homologs in arthropods (~line 157). Please update the text for clarity. Would also suggest that "agreeing" should be replaced with "aligning".
      11. Line 169 - If clade D is not a clade, maybe it should be called something different.
      12. Line 188-190 - In principle, max likelihood should be able to infer the right tree even with high divergence.
      13. Paragraph starting at 199 - eSMODS - always unknown function or mostly - could be important.
      14. For calling HGT you state that one of the criteria is that the euk and bac sequences branched near one another, what is "near" in this scenario?
      15. In legends be specific about what type of support value, e.g. bootstrap or jack-knife.. I think it is always bootstrap but would be good to have that precision.
      16. Throughout the text if stating e.g. "clustered robustly and with high support" please provide the appropriate values.
      17. It is unclear from the text how the animal origin of the TIR domain is supported (~line 274). Please provide necessary details to support your statements in the results section.
      18. Replace similar with -> similar "to"
      19. Line 266: It was previously shown .. or it is known but not "it was previously known"
      20. The last sentence in paragraph ~line 277: "Our work also identified a number of non-metazoan STINGS...." Please expand on this and provide some of the details on this finding in the text or point to the figure that supports the statement and provide a little more detail here.
      21. Line 294: it is unclear which are the orphan taxa -we are directed to figure 1 but there is no notation for orphan taxa here perhaps add something to the figure to make obvious which these are.
      22. Lines 340-341 - some redundant use of eukaryotic/eukaryotes
      23. Lines 475-480 - some further detail needed - how were sequences trimmed to the TIR domain? - what were your starting sequences? etc.
      24. Check that the colour schemes for branches etc are detailed in the legends of supplementary as well as main.
      25. The threshold set for gaps is very strict at 0.2. This seems quite strict given the sequences are potentially quite highly divergent. What length are the alignments that you are using after trimming - these details need to be included and considered.
      26. How were sequences downsampled with PDA? Line 424.
      27. Please provide adequate descriptions for the materials in the supplementary files for the manuscript, they currently lack description. They are useful and we fully support their inclusion with sufficient information.
      28. The starting sequences, hmm pipeline and scripts would be great to include, apologies if we have missed them.

      Significance

      This study provides us with examples of instances where a medley of different mechanisms have resulted in the emergence of innate immune proteins across eukaryotes. The study is entirely bioinformatic in nature and provides some nice cases for future study. The thorough search strategies are to be commended. The limitations of the work are that we don't know whether the functions have also been conserved across deep time and/or in the independent events described. Nevertheless, this work contributes to a growing body of evidence on the complex, and sometimes shared, nature of the evolution of animal and bacterial immunity. I would classify this nice study as a conceptual advance of our understanding of the evolution of protein families through deep time and would imagine it is of interest to a broad audience of biologists from immunologists to evolutionary biologists and structural biologists.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their critique of our report; our responses to all of their comments are given below.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Toxoplasma gondii is an obligate intracellular parasite. Intracellular survival critically depends on secretory vesicles named dense granules. These vesicles are predicted to contain >100 different proteins that are released into the PV, the PV membrane and the host cell to control the parasite's intracellular environment and host cell gene expression and immune response. How and where these vesicles are released from the parasite is a long-standing question in the field because T. gondii and other apicomplexan parasites contain a complex pellicular cytoskeletal structure called the IMC, which limits dense granule access to the plasma membrane. This manuscript by Chelaghma, Ke and colleagues demonstrates for the first time that dense granules are secreted from the parasite at pore structures called the apical annuli. The authors used their previously generated HyperLOPIT data set and identified a plasma membrane protein that is specifically enriched at the apical annuli. Using BioID, the authors then identify three SNARE proteins that also localize at the apical annuli. The localization of these proteins is determined using excellent super-resolution structured illumination microscopy. Conditional protein knockdowns for all four proteins were created, and both proteomics and microscopy were used to demonstrate a reduction in dense granule secretion in the absence of these proteins. Collectively, these data make new and substantial contributions to our understanding of mechanisms of dense granule secretion.

      Major comments: Overall, these data are convincing and well described. The text is clear and well written. There are a few instances (see below) where the authors don't adequately describe the data or overstate the strength of the results. These issues could all be addressed editorially or by processing existing data.

      Comment 1.1

      The authors use proteomics and IFA to show that there is a reduction in (rather than an inhibition of) dense granule secretion. However, from the phase images in figure 5, the vacuoles of KD parasites look normal and do not have the phenotypes that one would expect after a significant reduction in dense granule secretion, such as the "bubble" phenotype described for GRA17 and GRA23 knockouts (Gold et al 2015; PMID: 25974303). Authors should describe their findings in the context of the expected phenotypes based on the published literature. The statement on line 369-371 is too strong and should imply a reduction rather than an inhibition of dense granule secretion.

      Authors’ response: It is difficult to compare our results to individual dense granule protein mutants described in the literature because such phenotypes are the result of the loss of only a single protein being exported to the host, whereas we are observing the effects of the reduction of secretion of up to 120+ different proteins. Furthermore, we agree with this reviewer that none of the protein knockdowns appear to completely prevent dense granule secretion, which we implied by ‘inhibition’, and this could be either due to incomplete knockdown of each of these proteins with some residual function, or some redundancy where other proteins can contribute to secretion. We have changed the statement flagged by this reviewer to: ‘Depletion of all four of these proteins affects dense granule secretion’ to avoid the interpretation of complete loss of function. We now further state that residual secretion may still occur and consider possible reasons for this (Discussion, paragraph 4). In any case, none of these considerations change our conclusion that these proteins, at the site of the apical annuli, are implicated in dense granule secretion.

      Comment 1.2

      The more severe phenotype observed in the AAQa iKD and the additional localizations of AAQa and AAQc suggest an additional role for these proteins in protein trafficking that is supported by the authors' data. In both AAQa and AAQc there appears to be an accumulation of GRA1 in a post-Golgi compartment that is less vesicular in appearance than the phenotype observed in the AAQb iKD parasites. Additionally, I disagree with the authors' assessment that KD of these proteins does not affect microneme localization. In both AAQa and AAQc there appears to be an increased number of micronemes at the basal end of the parasites compared with controls. Although this is not a direct focus of the authors' paper, a description of these findings should be included in the results and discussion sections.

      Authors’ response: We have included a more complete discussion that considers the differences in phenotypes of the four mutants, including additional locations of two SNAREs, all of which is consistent with known SNARE biology (Discussion, fourth paragraph). These considerations, however, have no impact on our conclusions where all four proteins, including two that are exclusive to the apical annuli, have equivalent effects on dense granule exocytosis.

      Concerning the effects on micronemes and rhoptries of the different knockdowns, we have modified and limited our interpretation to overall IFA staining strength and organelle protein abundance by proteomics, where we see no differences. This addresses whether there is a major post-Golgi trafficking defect that could affect biogenesis of all of micronemes, rhoptries and dense granules, for which we see no evidence. Whether there are subtle differences in the location of these organelles, which are known to show some variability, is beyond the scope of, and not relevant to, our central questions. Given that growth phenotypes are seen for all mutants, it is quite possible that secondary effects of retarded cells might present as some disorder within the cell, although we saw nothing conspicuous of this nature in many hundreds of examples observed.

      Comment 1.3

      Presentation of the data in Figure 5. This figure contains images where the fluorescent dense granule signal is overlaid on phase images. However, in some cases (AAQb, AAQc, AAQa, GRA1 KD) the merged image looks like a straight merge of the two images, whereas in the rest of the images it looks like a thresholded fluorescent image is merged with phase. Authors need to process the images in a consistent manner and provide a description of the image processing in the figure legend and materials and methods.

      Authors’ response: Thank you for this suggestion, we have now processed all of these merges the same way (ImageJ -> merge channels -> Composite Sum). While the merges are only intended to aid in aligning the fluorescence signal with the phase image, we agree that it is better to present them the same way.

      Minor comments:

      Comment 1.4

      The discussion is overly long and could be shortened in some places. Lines 373 and 388 in particular don't seem directly relevant to the manuscript.

      Authors’ response: The paragraph identified by this reviewer considers the LMBD protein that is the first, and currently only, trans plasma membrane protein specific to the apical annuli that implies that this structure is exposed to the exterior of the cell. It is, therefore, of considerable significance to how we interpret the function and behaviour of these annular structures. We believe that it is very relevant to our study to consider what else is known about these relatively mysterious, but widely conserved, eukaryotic proteins, which is the subject of this paragraph. The other reviewers highlight the relevance of LMBD3 to the interpretation of this structure. This reviewer hasn’t identified any further superfluous discussion elements, and we believe that the current length is not excessive and is justified.

      Comment 1.5

      Line 184 - Remove question mark from this sentence

      Authors’ response: The question mark has been removed.

      Comment 1.6

      Line 321. Should read Figure 7A, not figure 6A.

      Authors’ response: Thank you, corrected.

      Comment 1.7

      Line 139 - should read Figure 1B instead of 2C

      Authors’ response: Thank you, corrected (although to 1C, which is in fact correct).

      Comment 1.8

      Figure 3 - Column labels for early, mid, or late endodyogeny would help with the clarity of this figure, especially for readers who are unfamiliar with the field.

      Authors’ response: We have labelled the figure as suggested.

      Comment 1.9

      Figure S2 - the letter n is missing from knockdown labels. And the number 3 from LMBD 3 is covering the word knockdown in the last panel.

      Authors’ response: Thank you, corrected.

      Reviewer #1 (Significance (Required)):

      The manuscript provides, for the first time, insight into the mechanism of dense granule secretion in Toxoplasma and identifies the sites on the parasite pellicle where these vesicles can traverse the IMC to reach the plasma membrane. This is a significant conceptual advance in our understanding of this vital cellular process, one that is required for T. gondii intracellular survival. This paper would be of broad interest to other research groups studying parasitology, secretion and protein trafficking.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript reports on characterizing the function of the long-known apical annuli, which are pores embedded in the membrane skeleton of Toxoplasma gondii. Since their function has remained long elusive, this manuscript is a major breakthrough.

      Comment 2.1

      It is of note, however, that this breakthrough, using the same three SNAREs, was recently, in parallel, also reported by Fu et al in PLoS Pathogens (PMID 36972314), which work is cited here. The additional novelty here is the finding of LMBD3 in the plasma membrane at the site of the annuli. This is a widely conserved protein for which little function is known except roles in signaling. The connection between LMBD3 and the SNAREs is through BioID, but they are preys quite far down the list. Furthermore, the function of LMBD3 is not explored here. As such, the additional advance compared to the Fu et al report is limited. The function of the SNAREs in dense granule exocytosis is much more robustly done here through the proteomics data displaying an accumulation of DG proteins.

      Authors’ response: While it is true that the discovery of the three SNAREs at the apical annuli was made and reported in parallel by Fu et al (2023), a major difference in their conclusions is that they suggest that dense granules are not secreted at this site (this reviewer has mistakenly thought that this was their conclusion: “In our experiments, none of the SNAREs were shown to be related to the exocytosis of GRAs. Therefore, the mechanism that mediates exocytosis of GRAs at the plasma membrane remains to be elucidated.” Fu et al (2023)). The failure of Fu et al to detect this was almost certainly because they only tested for dense granule secretion defects by inducing depletion of the apical annuli SNAREs after the parasites had invaded the host cells. It is known that dense granule protein secretion happens rapidly in the initial moments after invasion, so apical annuli perturbation in their assay would have only occurred after these secretion events. We directly discuss this experimental difference in our revised discussion and how it accounts for their different conclusions (Discussion, fourth paragraph). We independently tested for this effect by quantitative proteomics which further supported our conclusions.

      As this reviewer indicates, we additionally discovered that a protein (LMBD3) also spans the plasma membrane at these structures, and this implicates signalling or events at the cell surface. We show that this protein is also required for normal dense granule secretion. While we have not identified an explicit mechanistic role for LMBD3 in this process, such insight is also lacking for all LMBD proteins, including those in humans where they are implicated in disease. While we continue to pursue this interesting question of LMBD3 function, we are by no means alone in cell biology in finding that these answers are still outstanding.

      Comment 2.2

      The presentation of the data is very clean and convincing, and the broader evolutionary context is well-presented as well. The discussion on whether maintaining the IMC during cell division is an innovation or ancestral is an open debate where the authors seem to come down on the side of innovation, but the evidence could go either way, so I would caution a bit more.

      Authors’ response: We are puzzled by this reviewer’s comment because we do not make reference to the maintenance of the IMC during cell division in this evolutionary context — ancestral or a recent innovation. We describe the case of Toxoplasma and its close relatives maintaining the maternal IMC during division as ‘unusual’, not ancestral (second sentence of the last paragraph of the Discussion), and this is the only statement that we think might have elicited this query from the reviewer. But this does not imply what the ancestral state might have been which is not a subject of any of our considerations here.

      Major comments: - Are the key conclusions convincing?

      Comment 2.3

      The identification of the three SNARE proteins through BioID is not very convincingly represented in Table S1. These SNAREs were not showing significant changes and were not detected universally across the three bio-reps, and they were also present in the controls. Although this does not diminish the message of the work, this appears to be quite cherry-picked, while other top hits in the BioID were overlooked, e.g. Nd6 and Nd2 are right in the top ten, which have a demonstrated role in rhoptry exocytosis. This certainly piqued my interest, but is not even discussed.

      Authors’ response: We have used BioID as a protein discovery strategy, not to directly measure protein proximity for which it is an imperfect measure for many technical reasons. Accordingly, discovered ‘candidates’ for proteins that might occur at the annuli were all independently verified by protein reporter tagging. We focused our efforts on discovering apical annuli plasma membrane-tethered proteins and, therefore, parsed our BioID data for those shown previously to be in the plasma membrane by LOPIT spatial proteomics (Barylyuk et al, 2020). It is true that the SNARE proteins were not favoured over many other proteins in the BioID signal, but their verified location at these sites justified our pursuit of them as new apical annuli proteins.

      Other proteins, including the previously identified apical proteins Nd6 and Nd2 that are implicated in rhoptry secretion, similarly piqued our interest! But when we reporter-tagged them they were revealed as BioID false positives, consistent with published work on these proteins, and other ‘top hits’ included some other false positives. Table S1 is included as a further resource for the field, but it only served as a first step in functional protein discovery in our study.

      Comment 2.4

      TgAAQa, TgAAQb and TgAAQc were recently also reported to localize to the annuli by Fu et al 2023 (PMID: 36972314; this report is even cited in this manuscript for Rab11a accumulation), who gave them different names: TgStx1, TgStx20, and TgStx21 (not in this order). I see no reason to adopt a new nomenclature here, which will be very confusing in the future literature. Please adopt the Stx names in this manuscript.

      Authors’ response: We agree that where there is precedent in naming it is better to use the earliest used names. Naming of proteins is also best done to reflect orthologues found between species so that consistent names indicate common functions. The naming system proposed by Fu et al for the Qa, Qb and Qc SNAREs unfortunately does not fulfil this second important criterion. They based their names on ‘Syntaxin’ which was first used for an animal SNARE of the nervous system that is almost exclusively used for Qa paralogues. Furthermore, in animals Stx1-4 are all vertebrate-specific Qa paralogues that have arisen only in this group. So, to name the Qa SNARE of Toxoplasma according to one of these animal-specific nerve proteins (Stx1) implies an evolutionary inheritance that is very unlikely (i.e., lateral gene transfer from an animal) and is unsupported by published phylogenies. Furthermore, Fu et al also give the Qb and Qc SNAREs the animal Qa name ‘syntaxin’, and arbitrarily number them Stx21 and Stx20. So, while they have named these proteins first, we think that the names given provide confusing and misleading labels for these proteins.

      We initially proposed a simpler system according to the location of the SNARE in Toxoplasma (AA = Apical Annuli) and the Q domain type (Qa, Qb, Qc), e.g., AAQa. But on reflection we propose using precedent and orthology and adopt the existing orthologue names as the most useful solution. Klinger et al (2022) have resolved the phylogeny of the three Toxoplasma SNAREs, and they group with strong phylogenetic support with known eukaryote-wide orthogroups with previous names: Qa=StxPM (Syntaxin Plasma Membrane); Qb=NPSN (Novel Plant ‘Syntaxin’); and Qc=Syp7 (a Qc SNARE family originally thought to be specific to plants). These SNARE types are all known to operate at the plasma membrane, and accordingly the names TgStxPM, TgNPSN, and TgSyp7 would indicate their orthology and similar functional location known in other eukaryotes. We have justified this preferred naming system in the text of our report (Discussion, third paragraph), while making it clear which Fu et al names correspond to these more universally consistent names so that these can be easily cross-referenced.

      Comment 2.5

      No knock-down of LMBD3 is pursued: how would this impact SNARE distribution and/or other annuli proteins? The fitness score is very severe, -4.07, so this is somewhat puzzling. Lower comment is related. This could provide tantalizing insights in the architecture of the annuli, and/or their function as a secretory conduit.

      LMBD3 relative to the SNAREs is not explored: co-IPs or detergent extraction to see if they are all in a physically interacting complex. What keeps them together? Is LMBD3 interfacing with any annuli proteins? Cen2 is suggested through the image in Fig 2A, though there appears to be some separation in some images: AAP2, 3 and 5 were previously shown to have smaller diameters than Cen2 and therefore appear better positioned.

      Authors’ response: LMBD3 knockdowns were pursued in so far as identifying that they also have a phenotype of reduced dense granule secretion as for the SNAREs, but it will indeed require further studies of this intriguing molecule to define its specific function. Our central questions of this study were what is the association of the apical annuli with respect to the IMC and plasma membrane, and what is the overall significance and function of these structures. These core questions have been answered in our study. The questions that this reviewer raises here are further and logical questions specifically related to LMBD3 that we are now pursuing as an independent follow-on study.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Comment 2.6

      The discussion on whether maintaining the IMC during cell division is an innovation or ancestral is an open debate where the authors seem to come down on the side of innovation, but the evidence could go either way, so I would caution a bit more.

      Authors’ response: This comment (2.2) is already made and addressed above.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Comment 2.7

      The heavy focus on LMBD3 in Fig 1 and the evolutionary discussion would warrant a more direct functional dissection, either through an LMBD3 knock-down, or its interface with the SNAREs or annuli more directly.

      Authors’ response: This reviewer has not made it clear that further work on LMBD3 is necessary to support the conclusions of the paper or address the questions that we have asked, only that they would like to see more insight into LMBD3. We would also! But we do present knock-down studies and show that there are functional consequences for dense granule secretion. The question of whether LMBD3 is involved in the maintenance of apical annuli structure and/or integrity is an interesting one, but a further question to those that we have presented in this first study. LMBD proteins have poorly characterised molecular functions throughout eukaryotes, and while we are also motivated to understand their role more, this has not proven a straightforward task in other systems either.

      Comment 2.8

      The claim that the annuli are the conduits through which the dense granules travel to be exocytosed is not directly supported by any of the experiments as it is solely based on co-localization studies, not even direct interactions.

      Authors’ response: We agree that we have not directly observed dense granules in the act of secretion at the apical annuli. Dense granules are known to be very mobile in the cell and traffic dynamically on actin networks. So, they do not accumulate at any one site, and their fusion and exocytosis is likely a rapid, transient event. Multiple lines of evidence for them pausing and fusing with the plasma membrane, while indirect, independently support this conclusion:

      • SNARE proteins restricted to the apical annuli in the plasma membrane are required for normal dense granule secretion
      • When these SNAREs are depleted dense granule proteins accumulate in the parasite
      • Rab11A is a further vesicle-tethering molecule that has been shown to be attached to dense granules and its mutation also leads to inhibition of dense granule protein secretion (Venugopal et al, 2020)
      • When the apical annuli SNAREs are depleted Rab11A accumulates at the annuli (Fu et al, 2023)

      Collectively, we believe that the claim that the apical annuli are the sites of dense granule secretion is very strongly supported, particularly by the very molecules that would be required for vesicle docking and fusing at these sites, and is justified to be noted in the title. We have, however, made it clear in our report now that these data are indirect and that dense granules are yet to be captured in the act of secreting their contents at these sites (Discussion, paragraph five).

      **Referees cross-commenting**

      The consolidating themes I see (and value) in the reviews:

      Comment 2.9

      1. functional follow up of role of LMBD3

      Authors’ response: This work is already part of a follow-up project.

      Comment 2.10

      adopt nomenclature of Fu et al, to avoid confusion in literature

      Authors’ response: Please see our response to Comment 2.4

      Comment 2.11

      better integrate the findings in light of the Fu et al publication throughout this manuscript

      Authors’ response: We have further acknowledged and compared our findings to those of the parallel study of Fu et al with additional text in the discussion.

      Comment 2.12

      no direct evidence of dense granules at annuli; attenuate the claims (in title etc), or include supportive data

      Authors’ response: Please see our response to the equivalent Comment 2.8 above.

      Reviewer #2 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Comment 2.13

      The presented manuscript reports on a novel protein, LMBD3, embedded in the plasma membrane of Toxoplasma gondii at the site of the apical annuli, which are pores across the inner membrane complex (IMC) skeleton. This provides a novel, putative connection between the cytoplasm and plasma membrane, although this is not directly explored here. Through LMBD3 proximity biotinylation, three SNAREs are identified that were recently reported to be involved in dense granule exocytosis, which is confirmed here through robust proteomic experiments.

      Authors’ response: This reviewer has made an error here in stating that the parallel study of Fu et al implicated the apical annuli SNAREs with dense granule exocytosis. See our response to Comment 2.1 where we describe why the experimental design used for Fu et al was unlikely to test this question effectively.

      • Place the work in the context of the existing literature (provide references, where appropriate). The annuli were first reported in 2006, and understanding of their proteomic composition has expanded over the years; however, a function has long remained elusive. This report, together with another work performed in parallel, now uses three SNAREs, named TgAAQa, TgAAQb and TgAAQc in this report but previously named TgStx1, TgStx20, and TgStx21 (not in this orthologous order), localizing to the annuli as a tool to assign the function of the annuli to exocytosis of the dense granules during intracellular parasite multiplication. The evolutionary context and concepts of the new findings are very well-embedded in the existing literature and insights.

      • State what audience might be interested in and influenced by the reported findings. The audience comprises people with a specific interest beyond apicomplexan biology, basically all Alveolates as they all share a similar membrane skeleton. Assigning a putative function to widely conserved LMBD3 will be of high interest to this completely different audience as well.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the submitted work "Apical annuli are specialised sites of post-invasion secretion of dense granules in Toxoplasma", the authors explore the role of the apical annuli in T. gondii. They identify a number of proteins that localize to the membranes at the annuli, including SNARE proteins that are known players in vesicle fusion. They also show that knockdown of several annuli-localized proteins blocks replication and secretion of dense granule cargo into the parasitophorous vacuole. Overall, the work is well done and an important contribution to the field.

      Major comments

      Comment 3.1

      1. In the title and throughout the manuscript the authors claim that the apical annuli are sites of dense granule secretion (e.g. "firmly implicating the apical annuli as the site of dense granule docking and membrane fusion." or "that the apical annuli are sites of vesicle fusion and exocytosis"). However, there does not appear to be direct evidence of the dense granules docking and fusing at these sites.

      It would be ideal to see vesicles docked via EM at the annuli, either in wildtype or knockdown parasites. This may not be possible - if not, I recommend toning down the conclusions on docking (or "specialized sites of secretion" as this has not been shown) and instead stating that these structures play a critical role in dense granule secretion.

      Authors’ response: Please see our response to Comments 2.8 & 2.12, and we have toned down this conclusion as requested to make it clear that direct observations of dense granule fusion are yet to be made. Capturing the transient event of dense granule docking by EM would indeed be a very challenging ambition.

      Comment 3.2

      The authors should discuss earlier (in the results) the findings of Fu et al. which:

      Authors’ response: The parallel study of Fu et al (2023) has indeed generated some similar data, but there are also multiple points of difference including their conclusions. We discuss all of these relevant points in the Discussion, and believe that it would make the Results narrative confusing to introduce this element of discussion there. Our study has not been performed in response to theirs, but rather was conducted in parallel.

      • show the localization of some of the same SNAREs at the apical annuli. Fu et al also see localization to the plasma membrane separate from the annuli for some of these proteins. Do you see plasma membrane spots as well upon longer exposures? Can differences be explained by the position or type of tag used?

      Authors’ response: Fu et al have indeed used different reporters and expressed the SNARE fusion proteins with different non-native promoters. They used a very bulky reporter which combined 12 HA tags as well as the large Auxin-Inducible Degron (AID), and together it is possible that they observe some mistargeting artefacts. For our location studies we used the small epitope 3xV5 only. We did not see the additional locations that they report, and this may be due to the larger modification that they made to these proteins.

      • Fu et al also shows similar plaque defects in the knockdowns and loss of trafficking of plasma membrane proteins to the periphery. In general, the studies from this group are very complementary - they should be better acknowledged.

      Authors’ response: We have included more frequent reference and comparison to the Fu et al study now in our Discussion.

      • Fu et al see an invasion defect but no defect in GRA secretion - Do you see an invasion defect? These differences should be discussed

      Authors’ response: See our response to Comments 2.1 & 2.13 regarding why Fu et al could not detect the GRA secretion defect. We discuss this in our Discussion now (Discussion paragraph four). We also consider the Fu et al study of an invasion defect as flawed. Both our and their study show that depletion of apical annuli SNAREs has a strong replication phenotype of parasites within the host vacuole. Given induced SNARE depletion must occur during this growing stage of the parasites, to ask if apical annuli could be involved directly in invasion processes requires testing for invasion competence of already very sick cells. It is, therefore, not possible to control for secondary effects on invasion incompetence due to general cell malaise. Furthermore, Fu et al report on invasion efficiency using an assay that relies on SAG1 presentation on the cell surface. However, they conclude independently in their study that SAG1 delivery to the surface is inhibited in their SNARE knockdowns. This further confounds any attempt to reliably measure invasion and any role for these SNAREs in this process. Therefore, for biological as well as technical reasons, we have not tested for a possible role of annuli in invasion.

      • It would be helpful for the field to use the same nomenclature whenever possible. Is it possible to use the naming described earlier?

      Authors’ response: Please see our response to Comment 2.4.

      Comment 3.3

      Fig 1C - The authors use trypsin shaving to demonstrate plasma membrane localization of LMBD3. They are probably correct - but it is important to definitively distinguish between plasma membrane and IMC membrane localization.

      a. The western blot bands for GAP40 should be quantified. It appears that GAP40 is also reduced and it could be reduced to a similar extent as SAG1 without quantification. In addition, this protection from digestion could be confirmed with a second marker in the space between the PM and IMC membranes like GAP45 (whereas cytoplasmic/mito markers like profilin and Tom40 are likely further protected by the IMC membranes and are thus less relevant here).

      Authors’ response: Quantitation of Western blots is notoriously inaccurate and, rather, we use it here as a qualitative indication of trypsin sensitivity of proteins in intact cells. The LMBD3 protein is completely transformed within the first time point (1 hour) to stable products of proteolysis of this polytopic membrane protein, presumably to those now protected within the cell. Known GPI-anchored surface protein SAG1 shows similar immediate sensitivity, although it is known that internalised SAG1 pools are constantly recycled to the surface, hence the gradual elimination of the residual SAG1 band over 4 hours. The internal protein markers (GAP40, PRF, TOM40) show no discernible change in the first hour and little if any beyond that (within the variation common to Western blotting). GAP40 shares an equivalent polytopic membrane topology to LMBD3 except it occurs in the IMC membrane directly below the plasma membrane, so we think this is the more suitable control. Thus, this trypsin shaving experiment gives a binary output: sensitive or insensitive. This conclusion is further supported by the published spatial proteomics study (Barylyuk et al. 2020) which shows that LMBD3 segregates with other integral membrane proteins specific to the plasma membrane and not with the IMC proteins. Our super resolution imaging of LMBD3 relative to inner membrane complex markers (Centrin2, GAP45, IMC1) also shows it as peripheral to them, further corroborating the plasma membrane location.

      1. Is it possible to N-terminally tag LMBD3 and then examine plasma membrane localization by detection of the tag without permeabilization? (this would also confirm the proposed topology)

      Authors’ response: We have tried to N-terminally tag LMBD3 with an epitope reporter but this integration was not tolerated by the cell, presumably because it interferes with membrane insertion of this protein that is essential for cell viability. So, this experimental option is not available.

      Comment 3.4

      I think it is important to make clear for the reader what is happening here. The paper sounds as though the dense granules directly dock at the annuli for release. It also seems possible from this work and Fu et al that secretion at the annuli occurs via small vesicles that originate from the dense granules. Perhaps a diagram or model would help the reader here (and discuss why DGs or other vesicles are not routinely seen at the annuli if this is the critical portal - and perhaps why the organelles are not clustered in the apical end of the cell if this is where they are needed)

      Authors’ response: This comment is related to that of reviewer 2 (Comments 2.8/12), although we note again that Fu et al did not conclude that dense granules are exocytosed at this site. It is also unclear why this reviewer envisages that small vesicles arise from the dense granules, rather than the dense granule itself fusing at the annuli to the plasma membrane. Indeed, the occurrence of Rab11A on the dense granules, and the accumulation of this protein at the annuli with SNARE knockdown, supports that it is the dense granules that dock at this site. Why dense granules don’t otherwise cluster at their sites of secretion but are instead motile in the cell, their movement driven by Myosin F on actin filaments, is not known. Perhaps these otherwise bulky organelles would create too much cellular crowding that could interfere with other processes. We have addressed all of these points in additions to the discussion so that these interesting unknowns are transparent to the reader (Discussion paragraph 5).

      Comment 3.5

      Figure 5. The authors state the knockdown results in "strong phenotypes of reduced plaque development" - The plaque assays should be quantified.

      • Are there no plaques or just very small ones here?

      Authors’ response: The reviewer provides no rationale for this request, nor states what questions could be addressed by doing so. Indeed, none of our conclusions would be affected. We use the plaque assays to test whether each of the proteins tested is independently necessary for some facet of normal parasite growth where the result is binary: no difference in plaque size versus near or complete absence of plaque development. The interpretation of differing plaque sizes between different knockdown mutations is a very inexact science with assumptions of equal rates of protein depletion, sensitivity of relative protein abundance, modes of action of mutation, and kinetics of plaque growth very difficult to validate for meaningful comparisons to be made. Therefore, we don’t see any useful role for plaque quantification in the research questions that we’ve addressed or the conclusions that we present.

      Comment 3.6

      Fig 6A - The use of digitonin for semipermeabilization requires controls as there is typically a lot of variability across the monolayer. This is ideally done with something to show that the host plasma membrane has been permeabilized (e.g. host tubulin) and the PVM has not been permeabilized (e.g. SAG1). Otherwise, perhaps the authors could state what percent of cells showed the data like the representative images shown or describe further how selective permeabilization was assessed? (or wider fields with many cells and vacuoles?)

      Authors’ response: As requested, we have included a supplemental figure showing wider fields of view where multiple vacuoles are seen. These data show that the vacuoles are similarly stained with no evidence of variability of digitonin permeabilization. The reduction in GRA5 secretion shown by microscopy is further supported by this protein being quantified using proteomics as enriched in the parasites when the apical annuli proteins are depleted (Fig 7).

      Comment 3.7

      Fig 6B - "the GRA signal seen within the parasite was increased compared to the control" This is not clear from the AAQb image shown as it appears more is also present in the vacuole (or perhaps residual body?) Can this be clarified?

      Authors’ response: Yes, in this image it appears that the ‘residual body’, which is also an integral internal compartment of the growing parasite rosette, is a site of dense granule accumulation. We have modified the text to make it clear that the observations of IFA images showing ‘apparent’ increase in dense granule staining were then directly tested by quantitative proteomics. These subsequent data (Fig 7) provided a clear measure of the increase in dense granule proteins in the parasites when apical annuli function was perturbed.

      Minor comments

      Comment 3.8

      Line 215-217 The authors state that "Collectively these data imply that the apical annuli provide coordinated gaps in the IMC barrier that forms at the earliest point of IMC development and that they maintain access of the cytosol to these specialised locations in the plasma membrane." However, their data shows that LMBD3 only recruits once daughters are emerging (not earliest point of IMC development). Please clarify? Is this just referring to Centrin2 or LMBD3 as well?

      Authors’ response: Yes, the other AAPs indicate that these structures form early, and they were mentioned as such in the sentences preceding this statement; hence ‘collectively’.

      Comment 3.9

      Fig 5. Regarding growth arrest. AAQa appears to show an arrest but is it possible the others just grow slower? Do they arrest later and hence fail to form a plaque? Is there incomplete knockdown which enables a few parasites to persist?

      Authors’ response: It is true that it is difficult to discern complete growth arrest from very retarded growth. However, neither alternative would affect our conclusions where we use these phenotypes as an indication of apical annuli participating in processes required for normal growth. All plaque assays show strong growth phenotypes. Nevertheless, we have removed the use of the term ‘growth arrest’ with respect to these phenotypes (including in the Abstract) and replaced it with ‘growth impairment’.

      Comment 3.10

      Line 132, Fig 1 A-C. For clarity it may be better for the reader if LMBD3 is named earlier, or if Fig 1 refers to the gene ID for panels A-C before its named.

      Authors’ response: This is a good idea and we have made this change, making note of the rationale for this name when we present the phylogeny.

      Comment 3.11

      Line 30 - "represent a second structure in the IMC specialised for protein secretion" this is confusing - do the authors mean in addition to the micronemes/rhoptries at the apical complex? Maybe "a second structure in the parasite" would be clearer

      Authors’ response: To clarify, we have reworded as follows: ‘The apical annuli, therefore, represent a second type of IMC-embedded structure to the apical complex that is specialised for protein secretion’.

      Comment 3.12

      Line 440 - the authors state that "these pre- and post-invasion secretion processes are also biochemically separated because both microneme and rhoptry secretion are SNARE-independent" Is this from the Cova and Dubois papers cited a line later? I took a quick scan of these papers and neither appears to show this. Cova claims this is still unclear and Dubois says SNAREs are likely involved?

      Authors’ response: While both microneme and rhoptry secretion use distinctive molecular machineries for controlling membrane fusion for exocytosis, it is true that it is not formally known that these processes completely lack SNARE involvement, and neither paper cited here can eliminate this possibility. We have therefore removed this short part of the discussion where we consider that dense granules might be unique amongst these three compartments in relying on SNAREs.

      Text editing

      Comment 3.13

      Line 94 - plasma membrane or cell surface. Clarify here - do you mean plasma membrane or under the membrane at the periphery?

      Authors’ response: We have modified as: ‘plasma membrane including the cell surface’.

      Comment 3.14

      Line 321 refers to Fig 6A but should say 7A. Panel 7B is never referenced in the text.

      Authors’ response: Thank you, we have corrected this and only cited Fig 7 because A and B are both relevant to the statement made in the text.

      Comment 3.15

      Line 347-242 and fig 4A - the discussion of Q-SNARES and diagram could use some references for the reader

      Authors’ response: Thank you for this suggestion, we have acted on this request.

      Comment 3.16

      The methods says plaque assays were 7 days, fig 5 legend says 8 days

      Authors’ response: Thank you, this is corrected as 8 days.

      **Referees cross-commenting**

      • I completely agree with Rev 2
      • I also think examining invasion given Rev1 comment on the micronemes and the data from Fu et al would be worthwhile and straightforward to do

      Authors’ response: Please see our response to Comment 3.2 where the validity of measuring invasion competence of poorly growing, and/or arrested, parasites is scientifically questionable. It would require controls of similarly unhealthy parasites where the apical annuli are unaffected, but it is difficult to imagine how one would deliver such a control.

      Reviewer #3 (Significance (Required)):

      This is an excellent study that assesses the role of apical annuli in parasite secretion. It is an important addition to the field (and outstanding imaging that provides a high level of detail to the study). The study could be improved by better integrating a recent similar study noted by the authors and in the review

      Authors’ response: We have provided more direct discussion of the Fu et al paper in our Discussion section.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their critiques of our manuscript and for recognizing the importance of the questions about 3D genome organisation that it addresses. We plan to address most of their comments in our revised manuscript.

      Reviewer #1

      1. The aneuploid karyotype of the MCF-7 cells used is a concern. GREB1 is present in four copies, with two on abnormal chromosomes which may not be regulated in the same way as primary cells. The authors should include caveats to this effect in the text to account for this.

      We indicated (pg 5) that there are 4 copies of GREB1, 2 of which are on re-arranged chromosomes. RNA FISH (Figure 1C) suggests all 4 of these alleles are induced by estrogen. On each allele, the GREB1 enhancer and promoter remain closely apposed by imaging (Figure 2, DNA FISH) indicating no gross chromosomal rearrangements around the GREB1 locus. This is confirmed by our Hi-C data (Figure 2A), where any genomic rearrangements at the GREB1 locus would be detectable when the sequencing data were aligned to the reference genome. In the revised manuscript we highlight these points in the respective results sections (pgs 5 and 6). Our data suggest that all 4 alleles of GREB1 in MCF-7 cells are regulated in the same way.

      2. The authors should also include more information on the generation and verification of the enhancer deletion cell lines. An illustration of the PCR primers used for screening, as well as an illustration of the sequenced product traces aligned with the reference genome (as opposed to just showing the deleted regions) should be included in Fig. S1D. This would give the reader more confidence that the designed knockout has occurred in the same way on all alleles. Furthermore, long-range PCRs and sequencing should be considered to confirm that no larger deletions have occurred (e.g. Owens et al., 2019 PMID: 31127293).

      We have replaced FigS1D with a new Figure Supplement (Figure S1.2A) that incorporates a more comprehensive diagram of the strategy used for the generation and screening of the enhancer deletion cell lines. This also includes the sequencing traces aligned to the reference genome for each of the clones used in this work. Additionally, in the revised manuscript, we will check the deletions using the C-TALE sequencing data obtained from the enhancer-deleted clones.

      3. The changes in the measured E-P interaction frequency following gene activation are weak at best and make visual interpretation of the results difficult. Showing the reciprocal virtual 4C plots from the promoter would help to reassure the reader that the observed effect is real.

      We thank the reviewer for this suggestion, and we will now include virtual 4C plots from the GREB1 and NRIP1 promoters in our revised manuscript. These will be in figures 2B, 2E, 3C, 4C and in the supplementary figures 2B, 4B, 5C and 6C.

      4. Furthermore, the precise 3C method used is not clear. The authors repeatedly refer to "Capture-C" (a commonly used 3C-based approach using biotinylated oligos to pull down targets of interest) but the citation used (Golov et al. 2019) refers to a conceptually similar method called "C-TALE". This should be clarified in the text.

      We thank the reviewer for pointing out this potential confusion. We replace the term Capture-C with C-TALE throughout the revised manuscript.

      5. As for the changes in contact frequency, the observed changes in distance measurements between conditions are very small (although statistically significant). We acknowledge that this is likely due to the relatively small linear distances between enhancers and promoters in this study. However, it would be helpful to see the effects of the induction/treatments on one or more control loci that are not affected by oestrogen signalling, given that global changes in nuclear shape/volume and/or cell cycle effects could occur within this time (e.g. effects of tamoxifen treatment on MCF-7 cell cycle distribution (Osborne et al. 1983, PMID: 6861130), which could impact nuclear volume).

      Data from DNA FISH control probes are already included in Supplementary Figure S3 showing no change in intra-nuclear distances and thus no general effects on chromatin compaction due to nuclear volume or cell cycle. Virtual 4C data for the entire captured regions around GREB1 and NRIP1 are shown in Fig S2C, also showing no general effect on the wider capture windows. We will include similar data from the viewpoint of the gene promoters in the revised manuscript. Hi-C and imaging data from the enhancer deletion cell lines (Fig S4) also support that we are looking at an ER-specific effect, not a global one. With regard to the comment on the effects of tamoxifen treatment on MCF-7 cell cycle distribution, we see no effects of tamoxifen on 3D genome organisation at GREB1 and NRIP1 by Hi-C or by imaging.

      6. The authors discuss previous studies demonstrating that E2 and 4OH recruit different sets of proteins to their target genes. Given that this is central to the conclusion that the ER ligand (and its recruited co-factors) determines the E-P interaction frequency and 3D distances observed, it would be important to demonstrate this at the GREB1/NRIP1 loci specifically. ChIP data of the co-activators/repressors recruited by E2 and 4OH, respectively, would greatly strengthen this claim.

      We acknowledge that investigating co-activator and co-repressor recruitment to the studied loci will strengthen our interpretation and conclusions. In the revision we will perform and include ChIP-qPCR at NRIP1, GREB1 and control loci assaying for PolII, co-activators such as p300, Mediator and SRC-3, and the co-repressor N-CoR in control, estradiol and tamoxifen treated cells. We will also perform ChIP-qPCR of PolII and co-activators in cells treated with flavopiridol and triptolide.

      7. The observed uncoupling of E-P contact frequency and 3D distance upon transcriptional inhibition is interesting and offers clues to the molecular details underlying E-P interactions. However, the use of flavopiridol and triptolide, while common in the field, should be carefully qualified given the potential for their indirect effects on transcription. This is particularly important for flavopiridol given its ability to target multiple cyclin-dependent kinases beyond CDK9 and its role in transcription initiation.

      In the revised manuscript we indicate that “Flavopiridol inhibits several CDKs, including CDK9/PTEF-b”

      Minor comments:

      i. In the introduction and beginning of discussion, it would be helpful to detail previous studies where FISH-based analyses have shown more proximal E-P positioning upon activation, to make it clear that differences in E-P proximity appear to be gene-specific. Some examples include Williamson et al. (2016; PMID 27402708) and Chen et al. (2018; PMID 30038397). Speculation as to why some genes behave in this way while others do not, would also be worthwhile.

      We have followed the reviewer’s suggestion and noted these two studies in the Introduction of a revised manuscript. Given that the focus of this current manuscript is to explore discrepancies between Hi-C and DNA FISH, we do not think that this is the right forum for a wider discussion of why there might be differences in E-P proximity between different biological systems.

      ii. On page 6, the authors state that after deletion of the NRIP1 enhancer there is "almost total loss of NRIP1 induction in response to E2". This does not seem to match the data where in 3 out of 4 replicates (2 for each clone) there is a statistically significant increase in number of RNA FISH foci upon E2 stimulation in the NRIP1 enhancer KOs. This suggests that, as for GREB1, the regulation of these genes is not solely controlled by the deleted enhancers. This should be clarified in the text.

      The reviewer is referring to the data on NRIP1 expression in two NRIP1 enhancer deletion clones in Fig 1D and the replicate data in Supplementary Fig S1 (upper-right panel). These data show almost no induction of NRIP1 by E2 compared to wild-type cells. We stand by our statement.

      iii. The labelling of the FISH probes in Supp. Fig. S2 could be improved as it is currently very difficult to read these.

      We will try to improve this in a revised Figure S2.

      iv. Given that the authors have referenced a distance of 200 nm as potentially being an important threshold for gene activation, it would be useful to include the fraction of alleles which are below this distance alongside the cumulative frequency plots in Figure 2D and elsewhere in the paper as the cumulative frequency plots can be hard to read in some cases (e.g. Supp. Fig. S3B e-p). This would also allow the authors to show consistency across replicates.

      We thank the reviewer for this suggestion to make the data easier to interpret. In a revised manuscript, we will incorporate the fraction of alleles below and above 200 nm for the DNA-FISH experiments in Figure 2D and Figure S4A-B.
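
      As an illustration of the summary statistic requested here, the fraction of alleles below a distance threshold could be computed per replicate along the following lines. This is only a minimal sketch: the function names, the replicate labels and the distance values are hypothetical and are not taken from the manuscript or its analysis pipeline; only the 200 nm threshold comes from the comment above.

```python
from typing import Dict, Sequence


def fraction_below(distances_nm: Sequence[float], threshold_nm: float = 200.0) -> float:
    """Return the fraction of measured E-P distances strictly below the threshold."""
    if len(distances_nm) == 0:
        raise ValueError("no distance measurements supplied")
    below = sum(1 for d in distances_nm if d < threshold_nm)
    return below / len(distances_nm)


def summarise(replicates: Dict[str, Sequence[float]],
              threshold_nm: float = 200.0) -> Dict[str, float]:
    """Apply fraction_below to every replicate so conditions can be compared."""
    return {name: fraction_below(d, threshold_nm) for name, d in replicates.items()}


# Hypothetical inter-probe distances (nm), one value per allele; not real data.
example = {
    "vehicle_rep1": [150.0, 180.0, 210.0, 320.0],
    "E2_rep1": [190.0, 240.0, 260.0, 410.0],
}

if __name__ == "__main__":
    for name, frac in summarise(example).items():
        print(f"{name}: {frac:.2f} of alleles below 200 nm")
```

      Reporting this fraction for each replicate next to the cumulative frequency plots would make agreement between replicates easy to check at a glance.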

      v. For clarity, it would be helpful to include the difference map between the vehicle-treated unstimulated/stimulated conditions for the 3C plots in Fig. 4. This would help contextualise the resulting differences observed with the drug treatments. Same for Supp. Fig. S6.

      We will include the difference heatmap between the vehicle- and estradiol-treated samples for the vehicle-, flavopiridol- and triptolide-treated samples.

      vi. Statistical comparisons are not shown for all 3D FISH-based distance measurements (e.g. Supp. Figs. S3A, S4C, D, S6E). If this is because the tests were done and the results were non-significant this should be indicated.

      We had omitted all non-significant p values (>0.05) from the graphs to stop them getting too cluttered. All p values are documented in the supplementary tables. However, following the reviewer’s comment, we will indicate all non-significant statistical comparisons on the graphs.

      vii. On page 13, the authors state that increased E-P separation occurs "before nascent transcription of the gene is detected by either TT-seq or RNA FISH". This does not appear to be correct given that baseline levels of transcription are observed in the absence of ER stimulation by both methods (Fig. 1). This should be clarified in the text.

      We have amended this statement to now indicate that “This is before an induction of nascent transcription of the gene….”

      Reviewer #2

      1. The authors make strong claims and although these are generally reasonably well supported by the data, it is important to acknowledge that they are based on two loci. This manuscript would be stronger if the authors could include additional loci in their study design. If this is not possible, it would be good to acknowledge that the conclusions are preliminary/speculative at this stage.

      The reviewer makes a fair point, and we emphasized throughout the text, including at the end of the Discussion, that we are examining just two gene loci. In a revised manuscript we will include DNA-FISH data for a third locus comprising the CCND1 gene, for which we have preliminary data.

      2. It would be helpful if the authors could clarify the strategy they used for their FISH probe design. The enhancer and promoter fosmid probes (which are used for the majority of the experiments) are not centered on the active elements and do not even seem to overlap in the case of the GREB1 enhancer fosmid probe. The 10 kb enhancer probe seems better placed for the GREB1 locus, but the 10 kb enhancer probe does not seem to overlap with the enhancer in the NRIP1 locus. It is conceivable that the exact location of the probes has a big impact on the measurements and it would therefore be helpful if the authors could comment on the location of the probes and add additional probes if required to strengthen their conclusions. In addition, the fosmid probes are very large (40 kb). Although the authors acknowledge this, it would be helpful if they could comment on how overlap between 40 kb probes should be interpreted in relation to a potential rather focal contact between (proteins bound to) regions of <1 kb.

      In the case of GREB1, the fosmid probes were chosen to maximize the distance between them as the promoter and the enhancer of the gene are genomically relatively close to each other. This was not an issue in the case of the NRIP1 locus where fosmid probes could be placed centered on the TSS and the enhancer region. In the case of the 10 kb probes, these were designed to be centered on the regions where higher E2-induced C-TALE contact frequencies were detected. Virtual 4C plots using the TSS regions as viewpoints (incorporated into the revised manuscript) clearly show that, in the case of NRIP1, the contact frequency peak does not fall on the main ER peak.

      3. It is not clear to me why the authors would choose to work with a locus that is present in 4 copies in their cell line. Is the entire regulatory region (incl. enhancers) preserved for the two additional copies of the gene? Can the authors comment on how this may impact on their measurements?

      See response to Reviewer 1, point 1. Our Hi-C data would have revealed if there were genomic rearrangements in the 600kb window surrounding GREB1.

      4. Figure 2D shows an increase in E-P separation for the NRIP1 locus across all timepoints, with cumulative frequency plots shown for the 10 min timepoint. However, the data for the second replicate shown in Figure S2D are a lot less robust and not significant for the 10 min timepoint. It is important that the authors either provide additional data to support the robustness of this experiment or acknowledge that the results are not fully reproducible.

      We acknowledge this, but we would like to note that there is an increase in the median distance for all time points, although this difference is not significant in some of the timepoints. Additionally, DNA-FISH data obtained using the 10 kb probes confirm these observations.

      5. The data presented in Figure 2F for clone 2 of the GREB1 enhancer deletion still show increased E-P distance upon activation. How do the authors explain this?

      This increase in distance is not statistically significant (p = 0.33; see Table S2) and is not seen for the replicate data in Fig. S4.

      Minor comments:

      i. Could the authors comment on the observation that the NRIP1 promoter is not bound by ERa or p300 upon estrogen activation? Are there ATAC-seq or H3K27ac ChIP-seq data available for these conditions?

      We included ATAC-seq tracks in Figure 1A where a peak on the NRIP1 promoter is clearly seen.

      ii. It is not obvious which timepoint is shown in Figure 1D.

      Pre-mRNA FISH in enhancer deleted clones was done in cells treated with vehicle or E2 for 60 minutes. This will be made clearer in the figure legend.

      iii. Why did the authors choose e-i and p-i instead of e-c and p-c in Supplementary Figure 3B?

      We apologize as it was an oversight not to include the e-c data for this experiment. This is now included in Supplementary figure S4B.

      iv. "We treated hormone starved MCF-7 cells with flavopiridol or triptolide for 5 min before adding E2 for 30 min (Fig. 4A)." Does this mean that the FLV/TRP treatment lasted for 35 min or did the authors wash it out before adding E2? Please clarify.

      This observation is correct, and it was made clear in Figure 4A and in the figure legend.

      v. The authors refer to their Capture-C data as "high-resolution". However, the methods section mentions that the data for the GREB1 and NRIP1 locus are 5 kb and 10 kb resolution, respectively. This is not particularly high for a targeted approach, certainly not in light of the MNase-based approaches that have recently been developed. I therefore think that the "high-resolution" claims should be removed from the paper.

      In line with the reviewer’s suggestion, we have removed the term high-resolution when referring to our own data.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Gómez Acuña and colleagues have investigated changes in enhancer-promoter (E-P) interactions with both 3C and DNA FISH. As a model system, they have used the activation of estrogen receptor-dependent enhancers, which allows for examination of changes in E-P interactions at relatively high temporal resolution. Surprisingly, they find that gene activation is associated with increased E-P interactions as measured by 3C but reduced spatial proximity as measured by DNA FISH. The authors show that both these measurements are dependent on the presence of the enhancer. In contrast, blocking transcription with inhibitors does not have a strong effect on the 3C measurements, but abolishes the increased spatial E-P separation as measured by DNA FISH following estrogen induction.

      Overall, this is an interesting and thought-provoking study. However, the strong conclusions are not fully supported by the data, as explained in further detail below.

      Major comments:

      • The authors make strong claims and although these are generally reasonably well supported by the data, it is important to acknowledge that they are based on two loci. This manuscript would be stronger if the authors could include additional loci in their study design. If this is not possible, it would be good to acknowledge that the conclusions are preliminary/speculative at this stage.
      • It would be helpful if the authors could clarify the strategy they used for their FISH probe design. The enhancer and promoter fosmid probes (which are used for the majority of the experiments) are not centered on the active elements and do not even seem to overlap in the case of the GREB1 enhancer fosmid probe. The 10 kb enhancer probe seems better placed for the GREB1 locus, but the 10 kb enhancer probe does not seem to overlap with the enhancer in the NRIP1 locus. It is conceivable that the exact location of the probes has a big impact on the measurements and it would therefore be helpful if the authors could comment on the location of the probes and add additional probes if required to strengthen their conclusions. In addition, the fosmid probes are very large (40 kb). Although the authors acknowledge this, it would be helpful if they could comment on how overlap between 40 kb probes should be interpreted in relation to a potential rather focal contact between (proteins bound to) regions of <1 kb.
      • It is not clear to me why the authors would choose to work with a locus that is present in 4 copies in their cell line. Is the entire regulatory region (incl. enhancers) preserved for the two additional copies of the gene? Can the authors comment on how this may impact on their measurements?
      • Figure 2D shows an increase in E-P separation for the NRIP1 locus across all timepoints, with cumulative frequency plots shown for the 10 min timepoint. However, the data for the second replicate shown in Figure S2D are a lot less robust and not significant for the 10 min timepoint. It is important that the authors either provide additional data to support the robustness of this experiment or acknowledge that the results are not fully reproducible.
      • The data presented in Figure 2F for clone 2 of the GREB1 enhancer deletion still show increased E-P distance upon activation. How do the authors explain this?

      Minor comments:

      • Could the authors comment on the observation that the NRIP1 promoter is not bound by ERa or p300 upon estrogen activation? Are there ATAC-seq or H3K27ac ChIP-seq data available for these conditions?
      • It is not obvious which timepoint is shown in Figure 1D.
      • Why did the authors choose e-i and p-i instead of e-c and p-c in Supplementary Figure 3B?
      • "We treated hormone starved MCF-7 cells with flavopiridol or triptolide for 5 min before adding E2 for 30 min (Fig. 4A)." Does this mean that the FLV/TRP treatment lasted for 35 min or did the authors wash it out before adding E2? Please clarify.
      • The authors refer to their Capture-C data as "high-resolution". However, the methods section mentions that the data for the GREB1 and NRIP1 locus are 5 kb and 10 kb resolution, respectively. This is not particularly high for a targeted approach, certainly not in light of the MNase-based approaches that have recently been developed. I therefore think that the "high-resolution" claims should be removed from the paper.

      Referees cross-commenting

      I agree with the comments raised by Reviewer 1.

      Significance

      Since 3C and DNA FISH are widely used, the discrepancy between these measurements that is described here is of potential broad interest to the field. Since these claims are rather strong and have potential far-reaching implications, it would be helpful if the authors could strengthen their conclusions further, by improving the robustness of the data and including additional loci and additional probes to show that the measurements are not specific for these two loci or dependent on the location of the probes. I think that the paper is in principle also publishable without these additional experiments, but in that case, it would be very important to explicitly acknowledge the limitations of the data throughout the manuscript and clarify that the conclusions are preliminary/speculative at this stage.

      Expertise: 3D genome organization.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We would like to thank the 3 reviewers for their comments and suggestions for our manuscript. We believe that the revisions we plan to make, based on the comments by the reviewers, will greatly enhance the quality of our manuscript.

      We would like to respond to some reviewer comments here, since they do not fit into any of the subsequent sections.

      Reviewer #3

      In the Results section that describes the delay in gata2b expression (page 4 and Supp. Fig. 4), the authors show that the mutant embryos start expressing more gata2b at 30 - 36hpf after the decreased expression at earlier time points, with no difference at 48hpf. What could explain that recovery?

      We thank reviewer 3 for this question. The partial functionality of the Cx41.8 channel in cx41.8tq/tq mutants may explain why the HSPC program is eventually induced (leading to sufficient mitochondrial ROS production for Hif1/2α stabilisation). However, this could also result from functional redundancy between Cx41.8 and other connexins such as Cx43 or Cx45.6 in the mitochondria, since they are also expressed in zebrafish arterial ECs at 24hpf (Gurung et al, Sci Rep, 2022) and cx43 knockdown has previously been shown to result in an HSPC specification defect in zebrafish (Jiang et al, Fish Physiol Biochem, 2010). Together, these aspects may explain the recovery, although delayed, of gata2b expression in the cx41.8tq/tq mutant, as discussed in detail in our manuscript.

      The authors showed that gata2b expression can be rescued by ROS induction in a dose-dependent manner (page 6 and Fig.3 and Supp. Fig. 6). Is this what rescues gata2b expression at 30hpf in the cx41.8 mutants?

      This is exactly right: we hypothesize that in cx41.8tq/tq mutants, it takes longer for mitochondrial ROS production to rise above the threshold required to stabilise Hif1/2α and hence induce gata2b expression, which is supported by the data referred to by this reviewer.

      Are there any vascular defects in the mutant embryos?

      Our lab previously reported that cx41.8tq/tq embryos have a faster ISV growth rate (Denis et al, Front Physiol, 2019). However, we found no evidence of a link between the ISV growth rate increase and the HSPC specification defect in these embryos. Importantly, we show that aorta specification is normal in cx41.8tq/tq mutants, as determined by dll4 expression at 24 (Supp. Fig. 1C) and 28 hpf (Supp. Fig. 1D).

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      The manuscript by Petzold et al. explores the functions of connexin 41.8 (cx41.8) (mammalian homologue Connexin 40) in hematopoietic stem cell (HSC) formation in the zebrafish dorsal aorta. The authors use a cx41.8 allele that appears to be hypomorphic, as the phenotype is milder than a previous cx41.8 allele that the same group published (Cacialli et al., 2021). cx41.8tq/tq mutants exhibit delayed onset of hemogenic endothelial specification, as marked by gata2b at 24 hpf, but HSPC development proceeds normally from 48 hpf onwards. A new reporter line for cx41.8, Tg(cx41.8:GFP), was generated and is expressed in the floor of the dorsal aorta, consistent with the location of hemogenic endothelial cells. Lower ROS production in the whole cell and in the mitochondria was reported in the cx41.8tq/tq mutants, and treatment with ROS enhancers, H2O2 and menadione, appeared to rescue the mutant phenotype of reduced HSPCs at 28 hpf. Finally, the authors tested a link between cx41.8 and Hif1α by pharmaceutically (DMOG/CoCl2) or genetically (vhl morpholino) inhibiting Hif inhibitors, and observed a rescue of HSPC formation in cx41.8 mutants.

      I think it would be important for the authors to address the mechanisms of why cx41.8tq/tq and the other cx41.8-/- (leot1/t1) mutant phenotypes are different, with the latter allele showing more severe phenotypes of increased HSPC apoptosis and reduced HSPCs during later development. The authors speculate the cx41.8tq/tq allele encodes a missense mutation in one of the channel domains, and as such, might be a hypomorph. The authors cited the original paper by Watanabe et al. (2006); however, this paper actually noted that the cx41.8tq/tq allele is likely to be a dominant negative - and as such, should have exhibited a stronger phenotype than the leot1/t1 mutant allele. From the paper: "leotw28 and leotq270 heterozygotes have phenotypes different from that of WT; thus, they represent dominant-negative alleles." Importantly, no data are shown to provide evidence that the allele is a hypomorph - at minimum, qPCR data should be provided to show whether there is NMD of the mRNA in cx41.8tq/tq mutants.

      We would like to thank the reviewer for this comment and suggestion. As the reviewer has rightly pointed out, the cx41.8tq/tq mutation is thought to result in a protein with dominant-negative function (Watanabe et al, EMBO Rep, 2006; Watanabe et al, J Biol Chem, 2016).

      We agree that the mutant cx41.8tq/tq protein acts as a dominant-negative. The reviewer is right to point out that the cx41.8t1/t1 mutant might therefore be expected to exhibit a stronger phenotype; however, we found this not to be the case (runx1 expression was found to be normal in the cx41.8t1/t1 mutant, Cacialli et al, Nature Commun, 2021). We provided our explanation for this in the discussion of the manuscript:

      “The partial functionality of the Cx41.8 channel in cx41.8tq/tq mutants [14] may explain why the HSPC program is eventually induced. However, this could also result from functional redundancy between Cx41.8 and other connexins such as Cx43 or Cx45.6 in the mitochondria, since they are also expressed in zebrafish arterial ECs at 24hpf [18] and cx43 knockdown has previously been shown to result in an HSPC specification defect in zebrafish [36]. This potential functional redundancy may also provide an explanation as to why HSPCs are specified normally, without any delay, in cx41.8t1/t1 embryos [12]. In these null mutants, cx41.8 expression is completely absent but may be functionally compensated by other connexins, whereas in cx41.8tq/tq mutants, although cx41.8 is expressed, its channel function is reduced [14]. Moreover, as Cx41.8 may form heterotypic channels with Cx43 and/or Cx45.6 (and potentially also with others), the function of these chimeric channels would also be altered”

      We believe this addresses the reviewer's concern, especially given that Cx43 and Cx45.6 have been found to be expressed in arterial ECs at 24 hpf, as cited in the manuscript. With regard to the reviewer’s question about whether there is NMD of the cx41.8 transcript: given that the cx41.8tq/tq mutation is missense and does not result in a premature stop codon (usually required for NMD to be induced, Kurosaki et al, J Cell Sci, 2016), we do not believe that there is NMD of the cx41.8 transcript in cx41.8tq/tq mutants. We will, however, verify this by carrying out the experiment suggested by this reviewer: qPCR analysis of cx41.8 expression in cx41.8tq/tq embryos and wild-type controls.

      The quantification data in this manuscript are not satisfactory. The authors only provide graphs that show embryos with "low", "medium" and "high" numbers of HSPCs, which is incredibly subjective. Considering that the authors already have the cx41.8tq/tq in the Tg(myb:GFP) background (Figure 1E), they could have quantified the precise numbers of Tg(myb:GFP)-positive cells at different timepoints and with the different pharmaceutical rescue experiments. Ideally, this should be combined with other HSPC markers such as Tg(cd41:GFP) or Tg(runx1:GFP) - although this could be limited by the authors' access to the lines or time it takes to cross the mutants to the transgenes.

      We thank reviewer 1 for raising this concern. The reviewer is correct that it would take us too long (at least 6 months) to generate the cx41.8tq/tq cd41:GFP or cx41.8tq/tq runx1:GFP lines; however, as stated, we do already have the cx41.8tq/tq cmyb:GFP zebrafish line. That said, repeating the pharmacological experiments using the cx41.8tq/tq cmyb:GFP zebrafish line would demand months of work, and we do not currently have the personnel to perform all of this. However, we will perform the same experiment as performed previously to generate figure 1E but also at earlier timepoints. The cmyb:EGFP transgene marks nascent HSPCs from 28 hpf, and so we will aim to image and quantify differences in budding HSPCs in cx41.8tq/tq cmyb:EGFP and cmyb:EGFP controls between 28 hpf and 36 hpf. We agree with the reviewer that this will add depth to our study and will provide evidence to back up our conclusions.

      The link between cx41.8 and Hif1α is tenuous. The authors should perform in situ hybridization for the hif1 genes and their downstream effector notch1 which is known to be important for the HSPC specification (Gerri et al., 2018).

      We thank the reviewer for this point. We do not expect hif1/2α expression to be affected in this mutant. Mitochondrial ROS has been shown to stabilise Hif1/2α at the protein level, not the mRNA level. Our data, and that of others (Harris et al, Blood, 2013), suggest that in the absence of mitochondrial ROS, prolyl hydroxylases are not inhibited, and they target Hif1/2α for ubiquitination and subsequent destruction in a Vhl-dependent manner (as shown in Fig. 4D). We have changed the text in the manuscript to clarify that Hif is stabilised at the protein level (please see the section below).

      Since we do however expect notch1a and notch1b expression to be altered in our mutant embryos, as they are transcriptionally regulated by Hif1/2α (Gerri, Blood, 2018), we will perform in situ hybridisation and qPCR analysis of these 2 genes at 18-24 hpf in cx41.8tq/tq mutants and controls to clarify this point and solidify our model.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      Petzold et al are here addressing the potential function of the connexin Cx48.1, a protein involved in the structure of gap junctions, in the specification of future hematopoietic stem cells and progenitors (HSPCs). This piece of work is complementing their previous results showing the function of this connexin isoform in HSPC expansion in the transient hematopoietic niche in the caudal tissue of the zebrafish embryo. They explore phenotypes triggered by the expression of a mutant form bearing a single amino-acid substitution in the fourth transmembrane domain of the protein. Using whole mount in situ hybridization (WISH) of the two transcription factors Gata2b and Runx1, a novel transgenic fish line that expresses eGFP under the control of the Cx48.1 promoter region, and a series of drug treatments interfering with, or promoting, the formation of reactive oxygen species (ROS) production and oxidative stress, they propose that Cx48.1 is also involved upstream of HSPC amplification, rather in their specification at the level of the hemogenic endothelium constituting the ventral floor of the dorsal aorta. Mechanistically, they hypothesize that this function relies on mitochondria-derived ROS that would destabilize the VHL protein involved in mediating the degradation of Hif1/2a transcription factors, thereby stabilizing the Hif1/2a-Notch1a/b signaling axis involved in specification of the hemogenic endothelium.

      The WISH and quantitative analyses.

      Most of the quantitative analyses in the work are based on chromogenic WISH, which is not sufficiently accurate because leading to highly variable results, in addition to its lack of linearity. WISH is also subjected to important variations, particularly for transcription factors that are expressed at low levels such as Runx1, and to some extent Gata2b also. One obvious example in the paper is the inconsistency of signals that are observed Fig1C (Gata2b, left, wt, 24hpf) and FigS3B (Gata2b, left, wt, 24hpf) in which the signal is barely visible and is comparable to the signal for the cx41.8tq/tq mutant Fig1C, right.

      In addition, in the timings that are analyzed in FigS3 (Gata2b, 26 and 28hpf) to argue on temporal delay of expression in the cx41.8tq/tq mutant, the Gata2b signal is masked by the strong increase in tissues other than the hemogenic endothelium in the dorsal aorta (including signal in the somites as well as, possibly, increase in background). In this very example, it is legitimate to question the accuracy of the quantification methodology when the signal in the tissue of interest is drowned in the overall signal from surrounding tissues; how can the authors explain the 100% of embryos that have a 'Low' signal in the region of interest (FigS3C, cx41.8tq/tq mutant in comparison to WT)? This point is also valid for the data quantified FigS4 in which the fitting between WISH data and the quantifications appears to be questionable (for all timing points: 30, 32, 36, 48hpf and comparing mutant with the WT.

      My suggestion would be to complement the WISH data and improve the quantitative analyses using another, more accurate approach such as qRT-PCR for example (on dissected trunk regions and, if necessary because of expression in other surrounding tissues (in the case of Gata2b at later time points), after FACS-sorting using a fish line expressing a fluorescent reporter driven by a vascular promoter, ex: the kdrl:mCherry line used in the work). This is particularly important for the expression of the two transcription factors Runx1 and the more upstream Gata2b, the latter being involved in HSPC specification which is taken as a reference. qRT-PCR experiments should be feasible relatively easily and in a reasonable time frame as the technics is not very time consuming and easily accessible.

      We thank reviewer 2 for their concerns regarding the in situ quantifications used during this study. Although the approach we have used is widely used in the field to quantify gene expression differences, we appreciate that our data could be strengthened by complementing it with another approach. As such we will do the following:

      • We will complement our in situ hybridisation characterisation of delayed hemogenic endothelium formation and HSPC specification with qPCR experiments. For this, we will dissect the trunks of cx41.8tq/tq embryos and controls and perform qPCR analysis of gata2b expression at the timepoints analysed during development (Supp. Fig. 3 A-D and Supp. Fig. 4 A-D), whilst also using the same approach to complement the data for gata2b and runx1 expression at 24 hpf (Figure 1C and D). We agree with the reviewer that this is a feasible approach and would add robustness to the data we already show.

        2- Fluorescence imaging and associated interpretation/conclusions.

      The fluorescence images (Fig1E; Fig2B,D; Fig3A) are very difficult to analyze; they lack resolution because they appear to be epifluorescence images and not confocal images. When the signal is low, which is in particular the case for the novel Cx41.8:EGFP fish line, Fig2B (which is confirmed with the FACS GFP signal in comparison to the mCherry of the kdrl:mCherry fish line), it is not possible to provide convincing images on the vascular/aortic expression because of the high background of diffusion (the authors state 'likely to be the aortic floor', indeed it is not possible to validate the fact that the expression is truly in potential hemogenic cells). The double positive population in the FACS (Fig2C, right) does not resolve the issue because if indeed cx41.8 is expressed in endothelial cells (as expected from previous studies), the double positive population could equally be endothelial cells from inter-somitic vessels, for example (not to mention the underlying vein which is very close to the aorta in the trunk)). Fig2D, images are too small and, again, the resolution is not good enough to say that double positive cells are on the aortic floor. It is recommended to convince the reader that the authors try to confirm their statements by using confocal microscopy and increase the magnification of the relevant regions of interest.

      We thank this reviewer for this point. We will address this concern by using, as they suggest, confocal microscopy to try to get higher resolution images. In particular, we will do the following:

      • We will use confocal microscopy to image the cx41.8:EGFP line as was done previously (Fig 2B), in order to obtain higher resolution images of expression of cx41.8 in the floor of the aorta.
      • We will also use confocal microscopy to image the cx41.8:EGFP;kdrl:mCherry line as was done in Fig 2D, in order to gain higher resolution images.
      • We will also increase the magnification of the relevant regions of our confocal microscopy images as suggested by this reviewer.

        There is an inconsistency in the data between Fig1E (40hpf, in vivo imaging using the cmyb:GFP fish line) and FigS2 (48hpf, WISH cmyb); how can we observe 'HSPCs budding from the dorsal aorta' (see legend Fig1, arrowheads) which seems very much decreased in the imaging experiment for the cx41.8tq/tq mutant in comparison to WT, and have no effect on the cmyb signals FigS2B? What are the GFP+ cells that are aligned along the elongated yolk Fig1E and that appeared to be decreased in number in the mutant?

      We agree that this disparity is confusing for the reader. We believe the disparity between these results is due firstly to the fact that the experiment in Supp. Fig 2C was performed 8 hours after that in Fig 1E and secondly due to the time it takes for GFP to fold (in the case of Fig 1E). It is also important to keep in mind that the phenotype is not a complete absence of HSPC budding, but only a delay in the onset of EHT.

      • We will, however, address this concern by carrying out the experiment described above: we will perform the same experiment as performed previously to generate figure 1E but also at earlier timepoints. The cmyb:EGFP transgene marks nascent HSPCs from 28 hpf, and so we will aim to image and quantify differences in budding HSPCs in cx41.8tq/tq cmyb:EGFP and cmyb:EGFP controls at numerous timepoints from 28 hpf to 36 hpf. This will add depth to our study by providing evidence to back up our conclusions.
      • We will remove the 40-hpf timepoint (Fig 1E) to avoid confusion regarding the disparity with cmyb expression by WISH in Supp. Fig 2C.
      • Regarding the GFP+ cells aligned along the yolk in 1E, we thank the reviewer for pointing this out. These cells are multiciliated cells from the kidney tubules (Wang et al, Development 2013). We will determine whether their numbers do indeed differ between cx41.8tq/tq;cmyb:EGFP and cmyb:EGFP controls in our new confocal experiments and will mention this in the manuscript if they do.

        It would be important to investigate/show, at least with qualitative WISH experiments all along the time-window of HSPC specification as stated by the authors (26-54hpf, see main text third paragraph of Results), that Cx41.8 is detected in arterial endothelial cells (and perhaps enriched in the hemogenic endothelium?), in complement to the work they are referring to on transcriptomic data at 24hpf (Ref18 Gurung et al Sci Rep 2022). Ideally, these WISH data should be resolutive enough to provide clear localization in aortic cells versus cells in the aortic floor to bring significant added value to the work that lacks spatial resolution (ex: fluorescent WISH using confocal microscopy, allowing to superpose signal with cell types (either by double fluorescent WISH (vascular marker + Cx41.8) or superposing fluorescence signals with transmitted light)).

      We agree with this reviewer regarding this point. The way we will address this is to use confocal microscopy at different timepoints from 24-40 hpf using the cx41.8:EGFP; kdrl:mCherry line to show that expression of cx41.8 is indeed present and enriched in the floor of the dorsal aorta during the timeframe of HSPC specification. We believe that imaging this line using confocal microscopy will be sufficient to clearly show this.

      It would be more informative and secure, Fig2D, to show images of the double transgenics (Cx48.1:eGFP;kdrl:mCherry) at 28-30 hpf (rather than 48 hpf) which is more narrowed down to the specification of the hemogenic endothelium thus preventing any risk to visualize the fluorescence signals coming from recently born HSPCs rather than signals from cells embedded in the aortic floor.

      We thank the reviewer for this suggestion, which we believe would indeed improve the manuscript. As discussed above, we will indeed use confocal microscopy at different timepoints, including 28-30 hpf using the cx41.8:EGFP;kdrl:mCherry line to show that expression of cx41.8 is indeed present and enriched in the floor of the dorsal aorta during the timeframe of HSPC specification. We believe that imaging this line using confocal microscopy will be sufficient to clearly show this and so thank the reviewer for this excellent suggestion.

      To make the data more convincing on the ROS production in the ventral side of the cord in wild type embryos (which suggests that future hemogenic cells are already ventralized at that stage), it would be important to obtain confocal images of the region of interest and perform reconstitution of Z-stacks with a sagittal view (rather than longitudinal). It would be nice also to obtain comparable images later on, after lumenization and before initiation of HSPC emergence (before 28hpf).

      We thank the reviewer for this suggestion and agree that the suggested approach will solidify our data. As such, we will carry out the proposed experiment, using confocal imaging to gain longitudinal and sagittal images of mitoSOX staining in WT embryos and cx41.8tq/tq mutants at both 16 hpf and 26 hpf.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      Related to the above point, the authors should test whether the gap junction function of Cx41.8 is intact in the cx41.8tq/tq mutants by assessing calcium waves in the GCamp transgenic line.

      …we have also now found additional published in vivo evidence that Cx41.8 channel function is reduced in the cx41.8tq/tq mutant, which is now also cited in the new version of the manuscript (please see our full response to this point below).

      Please see the section “Description of analyses that authors prefer not to carry out” for additional information regarding the GCamp experiment suggestion.

      The link between cx41.8 and Hif1α is tenuous. The authors should perform in situ hybridization for the hif1 genes…

      We thank reviewer 1 for making this point. To clarify this, we do not expect hif1/2α expression to be affected in this mutant. Mitochondrial ROS has been shown to stabilise Hif1/2α at the protein level, not the mRNA level. Our data, and that of others (Harris et al, Blood, 2013), suggest that in the absence of mitochondrial ROS, prolyl hydroxylases are not inhibited by mitochondrial ROS, and they target Hif1/2α for ubiquitination and subsequent destruction in a Vhl dependent manner (as shown in Fig. 4D).

      To clarify this in the manuscript, we have adjusted the text in three places (including in the abstract) to clarify that Hif1/2α is stabilised at the protein level, as shown below. We believe these changes have made this important point more understandable for the reader:

      1. “… Mitochondrial-derived reactive oxygen species (ROS) have been shown to stabilise the hypoxia-inducible factor 1/2a (Hif1/2a) proteins, allowing them..”
      2. “Recent research has demonstrated that hypoxia and mitochondrial ROS are required for the stabilisation of the transcription factors Hif1/2a at the protein level”
      3. “… as mitochondrial ROS generation may eventually reach the threshold required to sufficiently stabilise the Hif1/2a proteins for downstream”

        Reviewer #2

      Importantly, it appears also that all over the WISH quantifications, the reader cannot appreciate the accuracy of the categories High/Medium/Low, which is not at all developed in the Methods section (paragraph Image processing and WISH phenotypic analyses).

      We have expanded the Methods section (paragraph Image processing and WISH phenotypic analyses), which was highlighted as a concern by this reviewer, in order to detail exactly how we performed our image analysis and statistical analyses using this approach. We believe this will address the concerns reviewer 2 has regarding this, and we acknowledge that this section was underdeveloped in the original submission.

      Finally, there is a confusion in the quantification regarding the number of HSPCs (see the beginning of the second paragraph of Results 'The HSPC specification defect in cx41.8tq/tq mutants is due to a delay in Gata2b expression') and the % of embryos falling into the 3 categories High/Medium/Low FigS2, cmyb 48hpf. The authors use this argument (based on the WISH cmyb signals) to infer that the deficit in the cx41.8tq/tq mutant is not due to controlling HSPC number (no difference in cmyb between WT and mutant) but rather upstream, at the level of the hemogenic endothelium, which is not a thorough argument at that point.

      We thank reviewer 2 for pointing this out to us and agree that the wording we used is a little confusing. We have therefore added to the first sentence of the second paragraph in the results section “The HSPC specification defect in cx41.8tq/tq mutants is due to a delay in gata2b expression”, which now reads:

      “Hence, since HSPC specification is initially reduced, but then recovers in cx41.8tq/tq embryos, we suspected a delay in the formation of the haemogenic endothelium in these mutants. To test this hypothesis…”

      We believe this change to the manuscript will satisfy the reviewer's concern by making this section more logical for the reader.

      The authors should take care of the fact that at 16hpf, it is an overstatement to speak of an aorta when the cord is starting to lumenize at around 18hpf, Jin et al Development 2005 (see Main text referring to Fig3).

      We thank the reviewer for this clarification. We have changed the relevant text to state “vascular cord” instead of “aorta” and have mentioned that it begins to lumenize around 18hpf for clarification. We have also added the suggested reference.

      Reviewer #3

      As Gata2 has been shown to be a positive autoregulator of itself in mice (Nozawa 2009, Katsumura 2016) and might do so in zebrafish (Dobrzycki 2020), so could gata2b recover itself, in a dose-dependent manner, without the Hif-Nocth1 axis once enough of it is expressed?

      We thank reviewer 3 for this suggestion. We believe that our data show that Cx41.8 is required for mitochondrial ROS production, which stabilises Hif1/2α and switches on downstream gata2b via Notch1a/b (which will be added, see previous section). As such, we believe that the Hif1/2α/Notch1a/b axis is required, at least for the initial induction of gata2b expression. However, reviewer 3 makes a very interesting point regarding the potential for gata2b to positively autoregulate itself, which may of course occur once gata2b expression has been induced by the Cx41.8-mitoROS-Hif1/2α-Notch1a/b-gata2b pathway. We thank the reviewer again for this interesting proposition and have added this suggestion into our discussion in the following paragraph:

      “GATA2 has been shown to positively autoregulate its own expression in mice (Nozawa et al, Genes to Cells, 2009; Katsumura et al, Cell Reports 2016), and Gata2b may also act in this way in zebrafish (Dobrzycki et al, Commun Biol, 2020). Therefore, it is interesting to speculate that once gata2b expression has been induced by the Cx-mitoROS-Hif1/2α-Notch1a/b-gata2b pathway, it may also further induce its own expression, which would make the induction of the haematopoietic transcriptional program more robust”

      Is Hif1/2a expression affected in the mutant? Is it expressed normally but then degraded faster due to the absence of mitochondrial ROS or is it less Hif1/2a expressed overall?

      We thank reviewer 3 for this question, which is similar to a point made by reviewer 1. To clarify, we do not expect hif1/2α expression to be affected in this mutant. Mitochondrial ROS has been shown to stabilise Hif1/2α at the protein level, not the mRNA level. Our data, and that of others (Harris et al, Blood, 2013), suggest that in the absence of mitochondrial ROS, prolyl hydroxylases are not inhibited by mitochondrial ROS, and they target Hif1/2α for ubiquitination and subsequent destruction in a Vhl dependent manner (as shown in Fig. 4D).

      To clarify this in the manuscript, we have adjusted the text in three places (including in the abstract) to clarify that Hif1/2α is stabilised at the protein level, as shown below. We believe these changes have made this important point more understandable for the reader:

      1. “… Mitochondrial-derived reactive oxygen species (ROS) have been shown to stabilise the hypoxia-inducible factor 1/2a (Hif1/2a) proteins, allowing them..”
      2. “Recent research has demonstrated that hypoxia and mitochondrial ROS are required for the stabilisation of the transcription factors Hif1/2a at the protein level”
      3. “… as mitochondrial ROS generation may eventually reach the threshold required to sufficiently stabilise the Hif1/2a proteins for downstream”

        Does MO-mediated knockdown of vhl in the wildtype and mutant (page 7and Fig. ) result in more HSPCs, following the increase in gata2b expression from WT baseline? Does that high expression persist, or does it drop?

      This is an important question. We had already clarified this in the case of cx41.8tq/tq, since we showed that the vhl MO results in more HSPCs (as determined by runx1 expression) at 28 hpf (Supp. Fig. 8A) but we have now added data for the same marker at the same timepoint for WT embryos (Supp. Fig. 8B).

      Although the vhl MO results in an increase in runx1 signal in WT embryos, the majority of WT embryos injected with the control MO already have “high” runx1 WISH signal at 28 hpf. As can be expected, the difference between vhl MO-injected and control MO-injected WT embryos is therefore not significant (Supp. Fig. 8B). This is now explained in the manuscript following the relevant data addition.

      4. Description of analyses that authors prefer not to carry out

      Reviewer #1

      One major missing component is experimental data that distinguish the gap junction/plasma membrane- related and the mitochondrial membrane-related functions of Cx41.8. This is critical, as the role of Connexins in the mitochondria remains poorly understood (and Connexin 43 is the best understood one). Thus, it is a big claim by the authors that Cx41.8 primarily acts through the mitochondria and not the gap junctions. Suggested experiment: The authors should generate a fluorophore-tagged Cx41.8 - under a ubiquitous (ubb or actin) or HSPC-/hemogenic endothelium-specific (gata2b) promoter to monitor the protein localization of Cx41.8. Providing data on whether Cx41.8 protein indeed localizes to the mitochondria is important to support their claim.

      We thank the reviewer for this suggestion, which we agree would be a nice experimental approach to try to investigate whether Cx41.8 does indeed localise to the mitochondria in zebrafish endothelial cells.

      However, EGFP fused full-length cx41.8 has previously been generated and was reported to be nonfunctional, and it was suggested that the amount of localised Cx41.8 is also too small to detect using this approach (Watanabe et al, Pigment Cell Melanoma Res, 2012; Usui et al, BBA Advances, 2021). An EGFP tagged CT-truncated Cx41.8 construct has also been generated and shown to rescue the cx41.8t1/t1 mutant (Usui et al, BBA Advances, 2021), but EGFP expression again could not be detected using this construct in zebrafish.

      As such, since efforts to carry out such an approach have failed in previous attempts and since it has already been demonstrated that CX40 (orthologous to cx41.8) localises to the mitochondria of endothelial cells (Guo et al, Am J Physiol Cell Physiol, 2017), we believe that confirmation of Cx41.8 localisation to the mitochondria in vivo in zebrafish endothelial cells will be very difficult and too time-consuming in the context of this manuscript.

      Related to the above point, the authors should test whether the gap junction function of Cx41.8 is intact in the cx41.8tq/tq mutants by assessing calcium waves in the GCamp transgenic line.

      We agree with the reviewer that this would be a very elegant approach to analyse whether Cx41.8 channel function is affected in cx41.8tq/tq mutants. However, we feel that this experiment is beyond the scope of this manuscript. Furthermore, carrying out this experiment would require the acquisition of the GCamp line as well as multiple crosses with the cx41.8tq/tq line which, together, we envisage would take at least 9 months before the experiments can be performed, and so this experiment would also be too time-consuming for this manuscript. Finally, we believe there is already strong published evidence that the cx41.8tq/tq mutation results in disrupted channel function (Watanabe et al, EMBO Rep, 2006), as already cited in our manuscript. Since then, we have also found additional published in vivo evidence that cx41.8tq/tq channel function is reduced, which is now also cited in the new version of the manuscript.

      The authors might also want to consider performing transcriptomic analysis (bulk RNA sequencing) from purified HSCs in wild types and cx41.8 mutants and assess the downstream pathways affected by the loss of this gene.

      Although this is an interesting proposition, we consider this suggestion to be out of the scope of this manuscript, especially since our model involves changes in gene expression upstream of HSPC induction, and expression of the key genes thought to be affected (notch1a/b and gata2b) can be checked by qPCR, a much more cost- and time-efficient approach, which we will do, as discussed above.

      Are the authors sure of their statement on budding HSPCs when the GFP signal pointed by arrows could in majority be hemogenic cells? (which would be in favor of their hypothesis on Cx41.8 being involved rather in hemogenic endothelium/HSPC specification).

      cmyb is a marker of HSPCs and not of the haemogenic endothelium, as demonstrated in numerous publications (North et al, Nature, 2007; Bertrand et al, Development, 2008; Bertrand et al, Nature, 2010 and others). Hence, we are confident that this transgene is marking nascent HSPCs and not the haemogenic endothelium.

      As mentioned by the authors in the Discussion, the other connexin Cx43 (Ref 36, Jiang et al 2010) is playing a significant role in HSPC specification in the zebrafish and is expressed in zebrafish arterial cells at 24 hpf. Hence there may be some functional redundancy between Cx43 and Cx48.1, as supported by previous work from the authors showing that a null mutant of Cx48.1 does not exhibit any phenotype in HSPC specification (Ref12, Cacialli et al 2021). This may be problematic for the experiments using drug treatments in the present work, because they are not selective for the different connexins (ex: anti-oxydants (NAC), connexin blockers (heptanol, CBX)), thus blurring interpretations on the specific function of Cx48.1 versus the ones exerted by Cx43 (this should be also valid for the vhl MO treatments).

      This comment is strengthened by the fact that the authors do not systematically address, for both WT and mutant embryos (Fig3 E, F; FigS6; FigS8), if expression levels with drugs/H2O2/MO are different for the 2 conditions (if relatively equal, it would indeed indicate that these drugs/conditions possibly act on another connexin, which would help the authors in their analyses and interpretations).

      We thank the reviewer for these comments and we agree with their concerns regarding the possibility of other connexins being affected by our drug treatments. However, we do not rule this out in our manuscript; we discuss it as a very realistic prospect in the Discussion section.

      Sadly, to the best of our knowledge, no selective Cx41.8 inhibitors have been described for use in zebrafish; otherwise we would of course have used them. This was the reason for our choice of compounds, many of which we also used in our previous publication (Cacialli et al, Nature Commun, 2021).

      The haemogenic endothelium/HSPC phenotype in cx41.8tq/tq embryos confirms that this connexin plays a role in HSPC specification. Disentangling which other connexins are also involved in this process will be interesting to address in future studies but is beyond the scope of this one. We believe that the data presented in our manuscript, together with the revisions we plan to carry out, will convincingly demonstrate a role for Cx41.8 in the mechanism we describe.

      The authors may try to rescue the wt phenotype by expressing, in the Cx48.1tq/tq mutants, the mRNA encoding for the wt protein.

      Although we appreciate this suggestion, we do not believe this experiment would add substantially to the conclusions of our manuscript, and we therefore consider it beyond what is required here.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We have thoroughly revised the manuscript, taking into account all comments from all four reviewers. We have added new data (Supplemental Figure 2 and Supplemental Figure 4) in response to these comments.

      Reviewer 1

      The assessment of data reproducibility is currently uncertain due to the absence of replication and statistical analysis in the dataset. It is essential to provide explicit information regarding sample sizes or replicates for all data and figures, data should be presented as mean +/- SD/SEM, and the interpretation of results should be grounded in rigorous statistical analysis. The lack of experimental replicates and statistical analysis in most of the figures presented raises major concerns regarding the validity of the result.

      We have now added error bars for the graphs in Figure 3D, E, F, G, H; Figure 4D, F, G, H, I, J; Figure 5B, C, D, E, F, G; and Figure 6B, C, D. All GTPase assays have been repeated three times. The mean ± S.D. (n = 3) is plotted for each condition. All high-speed pelleting assays have been conducted three times, and a representative assay is shown.
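
      As a minimal sketch of the summary statistic named above (mean ± S.D. over n = 3 replicates), the calculation could look like the following. The function name `summarize` and the three rate values are hypothetical and chosen only for illustration; they are not data from the manuscript.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample standard deviation, n) for replicate measurements."""
    n = len(replicates)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate S.D.")
    return mean(replicates), stdev(replicates), n


# Hypothetical GTPase rates from three independent assays (illustrative only).
rates = [1.8, 2.0, 2.2]
m, sd, n = summarize(rates)
print(f"{m:.2f} ± {sd:.2f} (n = {n})")
```

      `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is the usual choice for a small number of replicates.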

      Why was only one of the MiD proteins, specifically MiD49, studied, while MiD51 was not included in the investigation?

      This is an excellent point. In our previous work (doi:10.1101/2023.07.31.551267), we found that MiD49 and MiD51 were strikingly similar in their abilities to activate Drp1 after their own activation with fatty acyl-CoA. We feel that the demonstration here with MiD49 suggests that a similar effect would occur with MiD51. Due to time constraints for the lead author, preparing more MiD51 protein was out of the scope of what could be done. We now add a line in the Discussion that results for MiD51 may be different.

      The authors' inference of Drp1 phosphorylation, based on the mobility of the protein observed in SDS-PAGE gels (Fig. 4A, 5A, 6A), is not a sufficiently valid assessment. While western blot analysis is a valid method to assess Drp1 phosphorylation, it is essential to include replicates for semi-quantitation and demonstrate the reproducibility of the results. Moreover, it is recommended to incorporate western blot analyses to provide additional support for the findings presented in Figures 5 and 6.

      • We agree with the reviewer that additional information on the phosphorylation state of these proteins should be provided. We now include phospho-proteomic analysis for Erk2 phosphorylation of WT Drp1 and Drp1-S600D (Supplemental Table 1), showing that S579 is by far the predominant phosphorylation site. For WT Drp1, three lines of evidence now suggest efficient Erk2 phosphorylation of S579:
      • Western blot using anti-phosphoS579
      • Phosphoproteomic analysis
      • Gel shift

      For the Drp1-S600D phosphorylation, we have phosphoproteomic and gel shift analysis. For isoform 6, we regrettably only have gel shift. However, given the fact that the effect of Erk2 treatment on actin-stimulated GTPase activity mimics what we found for WT-Drp1 and for Drp1-phosphoS579/S600D, we think it is highly likely that the equivalent phosphorylation (S629 in this case) has been affected.

      Data on phosphorylated peptides with replicates experiments should be presented.

      We now present these data, which have been significantly expanded since the initial submission (new Supplemental Table 1). While non-phosphorylated S579 is still detected in both the WT and S600D phosphorylation reactions, the phosphorylated peptide is 2.2- and 2.3-fold more abundant, respectively. Our conclusion is that Erk2 efficiently phosphorylates S579, although stoichiometric phosphorylation was not obtained here. We have added statements in the relevant sections of the Results and in the Methods. We have also added Supplemental Table 1 to show the spectral counts obtained from phospho-proteomic analysis, and have deposited the raw data files with the PRIDE consortium (access information in the Methods).
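
      To make the "efficient but not stoichiometric" statement concrete, the fold ratios quoted above can be converted into an approximate phosphorylated fraction. This is only a sketch under two assumptions that the response does not state: that spectral counts scale with peptide abundance, and that the phosphorylated and non-phosphorylated peptides are detected with equal efficiency. The function name is hypothetical.

```python
def fraction_phosphorylated(fold_ratio):
    """Convert a phospho : non-phospho abundance ratio into the phosphorylated fraction.

    Assumes both peptide forms are detected with equal efficiency (an assumption,
    not something established in the response).
    """
    if fold_ratio < 0:
        raise ValueError("fold ratio must be non-negative")
    return fold_ratio / (1.0 + fold_ratio)


# Fold ratios quoted in the response for the WT and S600D reactions.
for label, ratio in [("WT", 2.2), ("S600D", 2.3)]:
    print(f"{label}: {fraction_phosphorylated(ratio):.0%} of S579 peptide phosphorylated")
```

      Under these assumptions, both reactions come out at roughly two thirds phosphorylated, which is in line with the authors' statement that phosphorylation was efficient but sub-stoichiometric.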

      Please provide additional context or specific details about the GFP-tagged Drp1 protein, such as the protein site where GFP was attached, as well as whether this tag could potentially impact the Drp1 GTPase activity and oligomerization. Figure 7C and D appear to suggest an increase in the GTPase activity of the GFP-Drp1 protein.

      We have now added these details to the Methods section, and have also added the complete amino acid sequence for the final purified construct in Supplemental Figure 4. We have also added that a previous study (PMID: 32901052) found that inclusion of GFP strongly inhibited Drp1 GTPase activity. We do not observe this effect here or in a previous study (PMID: 27559132), and provide possible reasons for this difference in the Methods. The reviewer points out that the activity of GFP-Drp1 appears higher than that of untagged Drp1 (comparing 7C with 7D). We find that the GTPase activity of Drp1 alone varies between 1 and 2 µM/min/µM protein depending on the preparation. This variation occurs for both untagged and GFP-tagged Drp1. This difference in basal activity from prep to prep might relate to differences between protein preparations, or to the exact amount of time required to freeze the aliquots of purified protein (we freeze small aliquots (

      An optional experiment that would significantly enhance the biological relevance of the findings presented in the current study is to assess the morphology of mitochondria in cells expressing the phospho-mimetic mutant Drp1 proteins. This experiment would provide valuable insights into the functional consequences of Drp1 S579 and S600 phosphorylation on mitochondrial structure and dynamics.

      We fully agree that these would be valuable experiments. The issue is that a large number of experiments using phospho-mimetic mutants in cells have already been conducted, with varying results (Taguchi et al., 2007; Qi et al., 2011; Yu et al., 2011; Strack et al., 2013; Kashatus et al., 2015; Serasinghe et al., 2015; Xu et al., 2016; Brand et al., 2018; Han et al., 2020; Chang and Blackstone, 2007; Cribbs and Strack, 2007; Cereghetti et al., 2008; Wikstrom et al., 2013; Han et al., 2008; Wang et al., 2012; Jhun BS, Sheu, 2018, J Physiol). To conduct more targeted tests examining specific forms of Drp1 activation in cells (for example, through Mff, MiD proteins, actin, or cardiolipin) will require extensive work that is outside the scope here. Our feeling is that S579 phosphorylation is likely to recruit another molecule (probably a protein) that has an activating effect. We tried to test one possibility (NME3, mentioned in the Discussion) but failed to produce useable NME3 protein for these tests and, given time constraints for the lead author, could not address this further.

      Provide reference for method on actin polymerization.

      We have now added a reference in the ‘Actin preparation for biochemical assays’ section of the Methods (PMID 16472659).

      Rectify the error in referencing figure 3 panels within the figure legends of Supplemental Fig S1.

      Thank you, we have changed this.

      The inclusion of full-length isoform 6 is commendable. However, there is no mention of isoform 6 in the Methods section.

      Thank you for pointing this out. We have added description of the construct and referenced our previous paper that used it.

      Since papers deposited in bioRxiv have not undergone peer review, reference #7 should not be cited as a reference in scholarly work.

      Reference 7 has now been reviewed by a peer-reviewed journal. We are addressing the reviewers’ concerns and will re-submit soon. We do not know how to rectify the issue of referencing this work, because it describes an extensive amount of groundwork for the MiD proteins. Our hope is that this work will be in press by the time the work reviewed here is ready for publication.

      Please provide details about the calculation of GTPase activity and the distinctions between the specific GTPase activity and total GTPase activity shown in figure 8D-F.

      We now describe these calculations in the “GTPase assay” section of the Methods.

      Reviewer 2

      Overall, the experiments described here are carried out with rigor and the conclusions drawn are of significance to understanding how phosphorylation regulates Drp1 functions.

      Thank you for these kind comments!

      Phosphorylation of both the serine residues appears to elicit a common effect in that they inhibit Drp1's stimulated GTPase activity. This would suggest that phosphorylation affects Drp1's self-assembly as tightly packed helical scaffolds. Instead of sedimentation analysis, an EM analysis of helical scaffolds on cardiolipin-containing membrane nanotubes or in the presence of soluble adaptors causing Drp1 to form filaments would provide a direct readout for defects in self-assembly.

      This is an excellent point, and we would love to conduct this work. Given our current EM infrastructure and expertise, these experiments would take extensive time for us to do. We do have a collaborator who could carry these out, but feel that the time it would take even for them to do this correctly is beyond that which we have (the lead author is transitioning to their next career phase). We have added the point that further EM studies of this type are necessary to test the effect on Drp1 assembly more directly.

      I am not sure of the rationale for experiments reported in Fig. 7 and 8. If the idea was to test if hetero oligomerization with WT Drp1 rescues defects associated with phosphorylated Drp1 then this could be stated explicitly in the manuscript. GFP-Drp1 is used as a WT mimic but a previous report (PMID: 30531964) indicates that this construct is severely defective in stimulated GTPase assays, much like the K38A mutant. But the rationale of using these constructs is not quite apparent. Is the intention to test if defects seen in the phospho-mimetic mutants of Drp1 can be rescued by the presence of a 'seed' of WT Drp1. If so, then this could be stated explicitly in the manuscript. But regardless, I am not quite sure what this data set achieves in terms of addressing mechanism.

      We apologize for not being clearer in our explanation of these experiments. Our goal was to test the effects of partial Drp1 phosphorylation on overall Drp1 activity, which likely mimics more accurately the cellular situation (wherein only a portion of the Drp1 population is likely to be phosphorylated even upon kinase activation). We now discuss these experiments in a clearer manner. For the GFP-Drp1, we do not observe the effect on GTPase activity shown in that previous manuscript by another laboratory, either here or in previous studies (eg, PMID: 27559132). In the Methods, we now provide a discussion of these differences and possible reasons for them, as well as providing the complete amino acid sequence of our GFP-fusion construct in Supplemental Figure 4.

      Finally, it would have been nice to see if the phospho-mimetic mutants of Drp1 produce the same effects on mitochondrial structure as those reported earlier. Reanalyzing their effects in a cellular assay becomes important because it would consolidate this work for the readers to evaluate the 'true' effects of phosphorylation on Drp1 functions. If the phospho-mimetic mutants fare in a manner like those previously reported, then it signifies that stimulation in GTPase activity is not a readout that directly correlates with Drp1 functions. If not, then the results presented here would establish a comprehensive analysis of in vitro biochemical activities and in vivo functions of the phospho-mimetic mutants.

      We fully agree that these would be valuable experiments. The issue is that a large number of experiments using phospho-mimetic mutants in cells have already been conducted, with varying results (Taguchi et al., 2007; Qi et al., 2011; Yu et al., 2011; Strack et al., 2013; Kashatus et al., 2015; Serasinghe et al., 2015; Xu et al., 2016; Brand et al., 2018; Han et al., 2020; Chang and Blackstone, 2007; Cribbs and Strack, 2007; Cereghetti et al., 2008; Wikstrom et al., 2013; Han et al., 2008; Wang et al., 2012; Jhun BS, Sheu, 2018, J Physiol). To conduct more targeted tests examining specific forms of Drp1 activation in cells (for example, through Mff, MiD proteins, actin, or cardiolipin) will require extensive work that is outside the scope here. Our feeling is that S579 phosphorylation is likely to recruit another molecule (probably a protein) that has an activating effect. We tried to test one possibility (NME3, mentioned in the Discussion) but failed to produce useable NME3 protein for these tests and, given time constraints for the lead author, could not address this further.

      Previous work reports that the effect of actin on the GTPase activity of Drp1 is biphasic but the binding to actin is not. This is quite confounding, and the authors could perhaps explain why this is the case.

      The reviewer makes an excellent point, which we now explain further in the manuscript. We have also discussed this in doi:10.1101/2023.07.31.551267 (see Figure 2D in that work). Our interpretation is that it is the density of Drp1 bound to the actin that provides the activation, by positioning the GTPase domains in close proximity. As the amount of actin increases, the Drp1 becomes more dispersed on the filaments, and activation decreases. We observe the same effect for MiD49 and MiD51 oligomers (see the above-mentioned reference).
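
      The density argument above can be illustrated with a toy calculation in which binding rises monotonically with actin while stimulated activity rises and then falls. This is a hypothetical sketch, not the authors' model: the hyperbolic binding curve, the saturating density term, and all parameter values (`kd`, `half_density`, the actin series) are assumptions chosen only to show how the two readouts can diverge.

```python
def bound_drp1(actin, drp1_total=1.0, kd=1.0):
    """Drp1 bound to actin, assuming simple hyperbolic (non-biphasic) binding."""
    return drp1_total * actin / (kd + actin)


def stimulated_activity(actin, drp1_total=1.0, kd=1.0, half_density=0.2):
    """Toy activity: bound Drp1 weighted by a saturating function of its density on actin."""
    if actin <= 0:
        return 0.0
    bound = bound_drp1(actin, drp1_total, kd)
    density = bound / actin  # Drp1 per unit actin; falls as actin increases
    return bound * density / (half_density + density)


# Arbitrary actin concentrations in the same (arbitrary) units as kd.
actin_series = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
binding = [bound_drp1(a) for a in actin_series]
activity = [stimulated_activity(a) for a in actin_series]

for a, b, act in zip(actin_series, binding, activity):
    print(f"actin={a:>5}: bound={b:.3f}  activity={act:.3f}")
```

      In this sketch, more actin always binds more Drp1, but it also spreads the bound Drp1 more thinly, so the density-dependent term eventually dominates and activity declines.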

      The manuscript cites PMID: 23798729 for expression analysis of splice variants but PMID: 29853636 provides a more comprehensive analysis. The authors could cite this work.

      Thank you for this reference. We were unaware of it, but are very glad to know of it now. We now include this reference. In particular, in the legend to Figure 1C (table of splice variants), we now state that this table is for human Drp1, and that additional splice variants have been identified for murine Drp1 (PMID 29853636).

      Reviewer 3

      The splendid results of the manuscript will be interesting to the researchers in the related fields.

      Thank you for this nice comment!

      The manuscript provided well-organized biochemistry results for comparisons between phosphorylation of Drp1 S579 and S600. It is the reviewer's comments that the authors may include experiments that manipulate Drp1 phosphorylation at different amino acids in cells. Such experiments will provide strong support for this manuscript.

      • We fully agree that these would be valuable experiments. The issue is that a large number of experiments using phospho-mimetic mutants in cells have already been conducted (Taguchi et al., 2007; Qi et al., 2011; Yu et al., 2011; Strack et al., 2013; Kashatus et al., 2015; Serasinghe et al., 2015; Xu et al., 2016; Brand et al., 2018; Han et al., 2020; Chang and Blackstone, 2007; Cribbs and Strack, 2007; Cereghetti et al., 2008; Wikstrom et al., 2013; Han et al., 2008; Wang et al., 2012; Jhun BS, Sheu, 2018, J Physiol). To conduct more targeted tests examining specific forms of Drp1 activation in cells (for example, through Mff, MiD proteins, actin, or cardiolipin) will require extensive work that is outside the scope here. Our feeling is that S579 phosphorylation is likely to recruit another molecule (probably a protein) that has an activating effect. We tried to test one possibility (NME3, mentioned in the Discussion) but failed to produce useable NME3 protein for these tests and, given time constraints for the lead author, could not address this further.

        The authors discussed the known factors that involved in Drp1 activation, such as its receptors, actin and cardiolipin. Recent JCB paper (J. Cell Biol. 2023 Vol. 222 No. 10 e202303147) indicates that intermembrane space protein Mdi1/Atg44 may play a role in coordinating mitochondria fission with Dnm1 (Drp1 in yeast cells). It will be valuable if the manuscript could also discuss the potential factor.

      • Thank you for this comment. We now include Mdi1/Atg44 as a possible factor that might be influenced by Drp1 phosphorylation. Two points we would like to make here are: there doesn’t seem to be an Mdi1 homologue in mammals, so the equivalent factor must be identified before testing; and Mdi1 is an inter-membrane space protein, so any effect of Drp1 phosphorylation on coordinated functioning with Mdi1 would either require an intermediary factor or exposure of the IMS in some way.

        The keywords do not represent the manuscript. It is recommended that the authors choose other keywords for the current manuscript.

      We have removed K38A from this list. The other key words are not mentioned in the Abstract.

      Reviewer 4

      The authors showed that the binding of Drp1 to actin depends on salt concentrations (Fig. 2B and C). In the presence of 65 mM NaCl, the phosphomimetic mutants showed decreased binding to actin. The GTPase assay is performed with 65 mM KCl, in which actin did not stimulate GTP hydrolysis of the phosphomimetic mutants. In contrast, with 140 mM NaCl, the S579D Drp1 exhibits slightly enhanced actin binding compared to WT Drp1. Could the authors assess the actin-activated GTPase activity in the 140 mM salt condition to test if actin activates GTP hydrolysis of S579D Drp1 more potently than WT?

      This is a good point by the reviewer. However, with limited time for the first author, we chose to focus on the reviewer’s other comments (see below).

      Both phosphomimetic mutants show reduced activation for GTP hydrolysis in the presence of cardiolipin, Mff, and MiD49. Is this because the mutants have a lower affinity for these interactors? Or do they bind with the same affinity but experience diminished activation? The data suggest the latter scenario, potentially resulting from decreased oligomerization properties. Can the authors provide more insights on this, for example, by measuring their interaction in the presence of GMP-PCP, which fully induces oligomerization in all three forms of Drp1?

      • These are interesting ideas, and we conducted experiments similar to what the reviewer described: co-sedimentation experiments with combinations of Drp1 and Mff under three nucleotide states: no nucleotide, GMP-PCP, and GTP. We used Mff for these experiments because we have this protein in abundance, and have previously characterized this construct as a trimer in PMID 34347505. We use a high concentration of Mff (50 µM) versus Drp1 (1.3 µM) because of the relatively low affinity between the two proteins (shown in PMID 34347505). We find the following:
      • In the absence of nucleotide, Mff does not cause an increase in pelletable Drp1 for any of the Drp1 constructs.
      • In the GTP state, the presence of Mff greatly increases the amount of Drp1 in the pellet, suggestive of increased Drp1 oligomerization. This effect occurs for all Drp1 constructs (WT, S579D and S600D mutants), but the amounts of both Drp1 and Mff in the pellets are about 50% less for both mutants than for the WT construct. This result suggests a decrease in oligomerization for the mutants, but not necessarily a decrease in Mff binding.

      I'm curious what happens to oligomerization if GTP is added instead of nonhydrolyzable GMP-PCP (Fig. 1D). Does this lead to higher oligomerization in the mutants compared to WT since the mutants seem to have lower GTPase activity? This might explain why phosphorylation increases mitochondrial localization of Drp1 in cells seen in some studies.

      This is another interesting thought, and we describe the new experiments we conducted in the response to the previous comment. Essentially, while GTP does cause a slight increase in pelletable Drp1, the increase is somewhat similar for all constructs. As described in the last comment, the addition of Mff causes a substantial increase in pelletable Drp1 for both WT and the mutants. This result suggests that, while the basal oligomeric state of Drp1 (in the absence of nucleotide) is reduced for the mutants (our original analytical ultracentrifugation data), the mutants appear to be capable of responding to GTP and Mff in a similar manner to WT. We acknowledge that the assay used here (pelleting) lacks the precision required to draw detailed conclusions on oligomerization or interaction with Mff, and we try to reflect this in our discussion of the data. We do feel, however, that these data are useful to report, in guiding future study.

      Please include the number of experimental repeats and error bars where applicable.

      We have now added number of experimental repeats and error bars for the graphs in Figure 3D, E, F, G, H; Figure 4 D, F, G, H, I, J; Figure 5 B, C, D, E, F, G; and Figure 6B, C, D.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Thank you for inviting us to submit our revised manuscript titled “Diffusive mediator feedbacks control the health-to-disease transition of skin inflammation.” We appreciate the time and effort the editor and each of the reviewers have dedicated to providing insightful feedback on ways to strengthen our manuscript. The revisions in the main text in response to the detailed comments are highlighted in red and were proofread by professional English editors. We hope that our revision and responses address all the concerns raised by the reviewers, and we look forward to hearing from you regarding this submission.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript provides a model of interacting populations of pro- and anti-inflammatory mediators to explain spatial patterns associated with various inflammatory conditions. The work is robust and articulated well, and is certainly scientifically relevant.

      Authors: Thank you for your positive evaluation and many insightful comments on our manuscript. We have incorporated your feedback, and hope that our revisions satisfy all the comments.

      Minor amendments:

      Personally, I feel that the model should be reported prior to the results, as the choice of model is likely to have great significance on the observations. It would be preferable for the reader to have a clear picture of the governing equations in their mind as they digest the results.

      Au: Following this reviewer's suggestion, we have moved the Methods section, including the model description, ahead of the Results section (p.9-14 lines 152-232; revised manuscript).

      The literature review is largely relatively thorough; however, I think it is important that the previous works of Joanne Dunster (University of Reading) and collaborators are included, as these are very closely related to this work. In particular, the authors should note the following two papers, which take a spatial approach:

      • Bayani, A., Dunster, J.L., Crofts, J.J. et al. Mechanisms and Points of Control in the Spread of Inflammation: A Mathematical Investigation. Bull Math Biol 82, 45 (2020). https://doi.org/10.1007/s11538-020-00709-y

      • Bayani A, Dunster JL, Crofts JJ, Nelson MR (2020) Spatial considerations in the resolution of inflammation: Elucidating leukocyte interactions via an experimentally-calibrated agent-based model. PLoS Comput Biol 16(11): e1008413. https://doi.org/10.1371/journal.pcbi.1008413

      Au: We have incorporated this comment by adding the two suggested papers to the relevant sentences in the literature review (p.6 line 118-119; revised manuscript) as follows: “Previous reaction-diffusion models, including chemotactic cells, have reproduced the resolution of inflammation in the lung [Bayani et al. 2020a, Bayani et al. 2020b]”

      One key point that should be mentioned in the discussion is that the model neglects any immune cells (e.g. neutrophils, macrophages) which contribute greatly to the inflammatory condition. Since these cells are motile, and also can contribute both pro- and anti-inflammatory effects, they are likely to influence spatial patterns significantly. It is not necessarily a problem that these aren't included in the model, but I feel that it is important that their omission be discussed in the manuscript.

      Au: We have now discussed the immune cells in the “Future implications” as the reviewer suggested (p.29 line 477-483; revised manuscript) as follows: “This is probably because the present model focuses on the non-chemotactic cells (e.g., including keratinocytes), whereas chemotactic cells (e.g., macrophages and neutrophils) also contribute to skin inflammation [Zhang and An 2007, Coondoo 2011]. Moreover, the present model focuses on the innate immune response, whereas the skin initiates an acquired immune response in the persistence of the innate immune response. Therefore, incorporating the chemotactic cells and acquired immune response into the model will reproduce the end of the expansion.”

      Reviewer #1 (Significance (Required)):

      The manuscript advances our current understanding of spatially spreading inflammation and corresponding patterns, but needs to be contextualized against existing literature as described above.

      This manuscript will appeal to theoreticians (Mathematicians) and clinicians/experimentalists alike.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors propose a minimal mechanistic mathematical model able to reproduce qualitatively different spatial patterns observed in healthy and disease epidermis. The starting point is a systematic review of medical images of different dermatological conditions, which they classify and successfully capture according to the spatial patterns. It is an interesting piece of work, but I consider that it will gain significance if the theoretical results are compared again with the clinical data. Specifically, the authors show a very interesting map between parameter regions and different spatial patterns; this result should be compared back to clinical data, to confirm that specific changes in spatial patterns indeed result from predicted changes in a specific parameter (e.g., due to a genetic condition that affects a feedback strength).

      Authors: We thank you for providing your valuable comments on our manuscript.

      Following your suggestion about the comparison of theoretical results with the clinical data, we have predicted which specific parameters including the feedback strength cause specific transitions of spatial patterns in the respective diseases. The discussion was added on p.26 lines 415-438 in the revised manuscript as follows: “The parameter-to-patterning correspondence (Fig. 4A, B, S2 Fig., and S3 Fig.) allows us to infer the pathogenesis mechanism in various diseases exhibiting each of diverse expanding patterns (seen in Table 2). For instance, psoriasis exhibits all five expanding patterns (Table 2) and increased levels of pro-inflammatory mediator (TNF-α) [Ringham et al. 2019], which is consistent with our theoretical results. The elevated pro-inflammatory mediator in psoriatic skin has been suggested to be caused by genetic mutations affecting regulatory feedback [Valeyev et al. 2010]. Considering these previous studies, our model predicts a psoriasis progression where fading pattern transits to arcuate, polycyclic, gyrate, annular, and circular pattern where increase in the TNF-α level is possibly due to mutation-induced alteration in the feedback parameters, e.g., increase of the production of pro-inflammatory mediator qa (Fig. 4A). Alternatively, Lyme disease exhibits circular, annular, and polycyclic patterns (Table 2). A clinical report showed that patients in Missouri predominantly exhibit an annular pattern without prognostic symptoms, while those in New York tend to exhibit a circular pattern with prognostic symptoms following the same treatment [Wormser et al. 2005]. 
      Considering our theoretical result that the overproduction of pro-inflammatory mediators and the depletion of anti-inflammatory mediators leads to the annular and circular pattern, respectively (Fig 4, 5A, and B), altered levels of pro-inflammatory and anti-inflammatory mediators may significantly impact the development and prognosis of Lyme disease in Missouri and New York patients, respectively.

      These qualitative parameter estimations will be verified in the future through parameter quantification in each diseased skin exhibiting any expanding patterns. By incorporating this quantitative correspondence between patterns and parameters measured in each disease into the present model, we would develop each disease-specific model with a quantitative predictability of how much change of the skin parameters transit from healthy to diseased pattern or vice versa. Therefore, this study provides the first step to controlling the healthy-to-diseased transition of skin inflammation via diffusive mediator feedback.”

      Another shortcoming of this work is that some of the conclusions are rushed: the parameter-to-spatial-pattern analysis would strongly benefit from adding a quantitative description to the qualitative one, e.g., mapping how changes in a given parameter value result in gradual changes in fading speed. Along the same line, the stability analysis for the different fading patterns was performed only for selected parameter values; it is not clear how variations in parameter values affect the sizes of the basins of attraction of the different steady states, and we want to make sure that the parameter values were not cherry-picked. Further, given that the authors show bistability for some parameter values, the dependency of the final spatial pattern on initial conditions should be more extensively investigated.

      Au: We have incorporated these comments by adding a quantitative description including new results and future research strategies following each of the three constructive suggestions raised by the reviewer.

      First, regarding “the fading speed” the reviewer suggested, fading speed is affected by changes in parameters involved in mediator production. In particular, the speed is reduced by an increase in the production parameters of pro-inflammatory mediators (pa, qa) and a decrease in those of anti-inflammatory mediators (pi, qi) (Fig.2. C and D). Moreover, “the size of the basins” the reviewer pointed out corresponds to the distance between ST (Threshold) and SH (Healthy state) in the cases with excitability. The distance between ST and SH becomes closer indicating the health state being less stable when pro-inflammatory mediators (pa, qa) increase or anti-inflammatory mediators (pi, qi) decrease from the healthy fading pattern. The imbalance of the mediator production transits the fast fading pattern with a small trajectory into a slow fading pattern with a larger trajectory. As imbalance goes on, the expanding pattern appears in the order of arcuate, polycyclic, and gyrate (Fig. 5). In cases with bistability, the size of basins corresponds to the relative distance ST to SH and ST to SI (Inflamed state). The circular and annular patterns appear when the distance between ST and SH is closer. On the other hand, when the distance between ST and SI was closer, the inflamed area shrank rather than expanded. The shrinking pattern appeared by reducing the production of pro-inflammatory mediators (pa, qa) or increasing the production of anti-inflammatory mediators (pi, qi) under conditions of stability. We have added a new figure and described this finding in Results (p.24 lines 384-388; revised manuscript) as follows: “As a result, we found that the distance between the healthy state (SH) and the threshold state (ST, a closer unstable steady state to SH) was the smallest in the gyrate pattern and increased in the order of polycyclic, arcuate, slow fading pattern, and fast fading pattern (Fig. 5C–F, S4 Fig. B and C). 
The fast fading pattern showed a smaller trajectory (green curve in S4 Fig. B and C) of change in the mediator concentration than the slow fading pattern.”

      Second, regarding “the dependency on initial conditions”, we have further added a new result (p.24 line 374-382; revised manuscript) as follows: “The number of stable states determines the pattern regardless of the initial condition in the spatial distribution of mediator concentration. Similar to the fading pattern (Fig. 2), the arcuate, polycyclic, and gyrate patterns with the excitability appeared reproducibly, independently of the initial conditions due to a single stable state SH (Fig. 5C-F). Even in circular and annular patterns with bistability where the threshold ST was closer to the inflamed state SI than the healthy state SH (Fig. 5A-B), the final spatial pattern was dominated by the SI independently of the initial condition. On the contrary, when ST was closer to the SH than the SI, the inflamed area shrank rather than fading (S4 Fig. A). These results are general outcomes of the traveling wave of bistable systems [Murray 2002], and consistent with the previous theoretical studies on inflammations [Sudo and Fujimoto 2022, Volpert 2009]. ”
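      The shrink-versus-expand behaviour described above can be illustrated with the textbook cubic bistable equation, which is a generic stand-in and not the model in the manuscript. The sketch below assumes a single variable u with the healthy state at u = 0, the inflamed state at u = 1 and the threshold at u = a, and uses the standard closed-form front speed for u_t = D u_xx + u(1 - u)(u - a). The function names and the example threshold values are hypothetical.

```python
import math


def bistable_front_speed(threshold, diffusion=1.0):
    """Front speed for u_t = D*u_xx + u*(1 - u)*(u - threshold).

    Stable states are u = 0 (stand-in for healthy) and u = 1 (stand-in for
    inflamed); `threshold` is the unstable state between them.
    A positive speed means the u = 1 region invades the u = 0 region.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return math.sqrt(diffusion / 2.0) * (1.0 - 2.0 * threshold)


def front_behaviour(threshold, diffusion=1.0):
    """Classify whether the u = 1 (inflamed stand-in) region grows or shrinks."""
    speed = bistable_front_speed(threshold, diffusion)
    if speed > 0:
        return "expands"
    if speed < 0:
        return "shrinks"
    return "stationary"


if __name__ == "__main__":
    # Threshold close to the healthy state: the inflamed region expands.
    print(front_behaviour(0.2))
    # Threshold close to the inflamed state: the inflamed region shrinks.
    print(front_behaviour(0.8))
```

      In this toy setting the sign of the speed depends only on whether the threshold sits nearer to one stable state or the other, which is the qualitative dependence the response describes for ST, SH and SI.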

      Finally, we have added “a quantitative to the qualitative description” as a future research strategy (p.27 lines 432-438; revised manuscript) as follows: “These qualitative parameter estimations will be verified in the future through parameter quantification in each diseased skin exhibiting any expanding patterns. By incorporating this quantitative correspondence between patterns and parameters measured in each disease into the present model, we would develop each disease-specific model with a quantitative predictability of how much change of the skin parameters transit from healthy to diseased pattern or vice versa. Therefore, this study provides the first step to controlling the healthy-to-diseased transition of skin inflammation via diffusive mediator feedback.”

      For reproducibility it is essential that the authors add a much more detailed description of the methods, including the software tools / numerical analysis tools used. Making the code publicly available would also be very beneficial to ensure the reproducibility of the results.

      Au: Following your suggestion, we have added a description of the methods, including the simulation code, to the “Methods” (p.13 lines 231-232; revised manuscript) as follows: “A simulation code written in C language is available from GitHub: https://github.com/MakiSudo/Erythema-Patterns/blob/main/AInondim.c.”

      In conclusion, the work is very interesting and worth publishing, but requires (a) to come back to the clinical data for validation of model predictions, (b) a more thorough and quantitative investigation of the effects of parameter variations on model behaviors, (c) a more rigorous and systematic presentation of the methods, (d) carefully explaining how the proposed model is similar / differs to the classical activator-inhibitor model proposed by Turing, and (e) discussing / showing if the fading patterns result from a Turing instability.

      Au: For (a) “validation of model predictions,” (b) “model behaviors,” and (c) “a more rigorous and systematic presentation of the methods,” we have reflected your suggestions in the revised manuscript as described above.

      Regarding (d) and (e), we have added an explanation of “how the proposed model is similar/differs to the classical activator–inhibitor model” and “if the fading patterns result from Turing instability” after the model construction in Methods (p.11-12 line 210-216; revised manuscript) as follows: “Reaction terms of this model are similar to the classical activator-inhibitor model proposed by Turing [Turing 1952], which includes the negative feedback of the activator through the inhibitor and the positive feedback of the activator. These reaction terms potentially result in Turing instability. However, the present model setting does not show Turing instability. The reason is that Turing instability requires a large difference between the diffusion coefficients of the activator and inhibitor [Murray 2002], whereas these coefficients in the present model were set to be equal based on molecular findings that these molecular weights are close in proximity [Coondoo 2011]. ”
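      The argument about equal diffusion coefficients can be checked numerically with a linear stability sketch. The code below is not the authors' simulation code: it assumes a generic two-variable reaction-diffusion system linearised around a steady state, and the Jacobian entries and diffusion values are hypothetical numbers chosen only so that the steady state is stable without diffusion. It evaluates the largest growth rate of a perturbation with wavenumber k from the 2x2 matrix J - k^2 diag(D_act, D_inh).

```python
import cmath


def max_growth_rate(jac, d_act, d_inh, k):
    """Largest real part of the eigenvalues of J - k^2 * diag(d_act, d_inh)."""
    (a, b), (c, d) = jac
    a_k = a - d_act * k * k
    d_k = d - d_inh * k * k
    trace = a_k + d_k
    det = a_k * d_k - b * c
    # cmath.sqrt handles a negative discriminant (complex eigenvalue pair).
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + disc) / 2.0).real, ((trace - disc) / 2.0).real)


def has_turing_instability(jac, d_act, d_inh, k_values):
    """True if the state is stable at k = 0 but unstable at some k > 0."""
    stable_without_diffusion = max_growth_rate(jac, d_act, d_inh, 0.0) < 0
    grows_at_some_k = any(
        max_growth_rate(jac, d_act, d_inh, k) > 0 for k in k_values if k > 0
    )
    return stable_without_diffusion and grows_at_some_k


# Hypothetical activator-inhibitor Jacobian (self-activation, cross-inhibition).
JAC = ((1.0, -2.0), (3.0, -4.0))
K_VALUES = [0.05 * i for i in range(1, 201)]

if __name__ == "__main__":
    print(has_turing_instability(JAC, 1.0, 1.0, K_VALUES))   # equal diffusion
    print(has_turing_instability(JAC, 1.0, 20.0, K_VALUES))  # fast inhibitor
```

      With equal diffusion coefficients the diffusion term only shifts both eigenvalues by the same amount, -D k^2, so a state that is stable at k = 0 stays stable at every wavenumber. An unequal pair such as the hypothetical 1 and 20 above is needed before any band of wavenumbers can grow.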

      **Referees cross-commenting**

      I agree with the comments from Reviewer #1.

      Reviewer #2 (Significance (Required)):

      The work aims to bridge mathematical modelling to dermatological practice, which is much needed to enable the use of theoretical and computational tools to clinical decision-making. While some mathematical models of skin inflammation have been proposed in the past (refer to papers from the RJ Tanaka group in systems dermatology), most of these do not consider explicitly the spatial component, which is crucial for modelling the clinically visible spatial patterns. Potentially interested audience includes biomathematicians, systems biologists, systems dermatologists, and, if the validation of the model predictions is achieved (as suggested above), also dermatologists.

      I am a systems biologist working on multi-scale mechanistic mathematical modelling of epithelial tissue diseases. The work I just reviewed falls exactly within my area of expertise.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their time and for their thoughtful and constructive comments. Their specific points are addressed below, but a general point that we would like to comment on is that in the original version it appears we did not make our model clear enough. The dogma in the field is that Rab7 is recruited to endosomes from a cytosolic pool via exchange with Rab5 (mediated by Mon1/Ccz1). Our work instead indicates that the majority of Rab7 is delivered to Dictyostelium phagosomes by fusion with other endocytic compartments. It was not our intention to imply there was no canonical recruitment of Rab7 from a cytosolic pool, and indeed we provide data to show this happens at a low level and discuss this in the manuscript. Nonetheless, we clearly over-stated the exclusivity of Rab7 recruitment to phagosomes via fusion at several points and in our original model cartoon, and have tried to better explain our more nuanced model with multiple routes for Rab7 acquisition in this revision, including a completely redrawn model figure (Fig. 7).

      2. Description of the planned revisions

      Reviewer 1:

      1. The observation that macropinosomes undergo retrograde fusion with newly formed phagosomes to facilitate phagosome maturation is an interesting notion that challenges the traditional model. However, not all phagocytes exhibit a high level of macropinocytosis, and axenic Dictyostelium cells used in the study may be an exception. Thus, it remains unclear whether fusion with macropinosomes is universally required for phagosome maturation. WT Dictyostelium cells or axenic cells cultured under SorMC/Ka condition (Paschke et al., PLoS One, 2018) exhibit significantly reduced macropinocytosis. The authors could examine whether the accumulation of Rab7 and V-ATPase on large-sized phagosomes is delayed in these cells. These experiments may help broaden the applicability of the authors’ finding.

      As our previous work (Buckley et al. PloS pathogens 2019) demonstrated that bacterially-grown PIKfyve mutants are also defective in bacterial killing and growth, it is highly likely that these cells are also defective in V-ATPase and Rab7 acquisition. However, we agree that formally testing this will further support our conclusions and improve the paper, and it should be quite straightforward.

      We will therefore co-express GFP-V-ATPase and RFP-Rab7 in both Ax2 and non-axenic cells grown on bacteria and repeat our analysis of recruitment to phagosomes – with the caveat that non-axenic cells do not phagocytose large particles such as yeast (Bloomfield et al. eLife 2015), so the imaging and quantification will be more challenging in this case.

      PIKfyve seems to play a specific role in the maturation of phagosomes but not macropinosomes. The differences may be driven by signaling from phagocytic receptors, as the author suggested. Alternatively, the large size of the yeast-containing phagosomes may require additional steps for efficient lysosomal delivery. The authors should consider examining whether PIKfyve is needed for the delivery of Rab7 and V-ATPase to phagosomes of comparable size to regular macropinosomes, such as those containing K. aerogenes or small beads. In addition, whether the process also involves fusion between phagosomes and macropinosomes should be verified.

      Whilst it is possible that large size of yeast-containing phagosomes requires additional mechanisms to process them, our previous data demonstrate that PIKfyve is also required to kill much smaller bacteria such as Klebsiella and Legionella (Buckley et al. PloS pathogens 2019). Furthermore, in this paper we also showed that loss of PIKfyve disrupts phagosomal proteolysis using 3um beads, and showed that V-ATPase recruitment was reduced on purified phagosomes containing 1um beads. We therefore find consistent defects on phagosomes of different size, with different cargos. Nonetheless, the experiments above, observing V-ATPase and Rab7 in cells grown on bacteria should directly address this point.

      As suggested, we will also perform a dextran pulse-chase prior to addition of bacteria to test if we can observe macropinocytic delivery to bacteria-containing phagosomes - perhaps using E. coli as their elongated shape may help phagosome visualisation.

      In the previous study from the authors' group (Buckley et al., PLoS Pathog, 2019), it was shown that the accumulation of V-ATPase on phagosomes begins immediately after internalization in both PIKfyve mutant and WT, although V-ATPase accumulation reaches only half of the levels seen in WT. This partial accumulation of V-ATPase differs from the almost complete absence of Rab7 recruitment found in this study, which raises the question of whether there exists yet another population of fusogenic vesicles that are positive for V-ATPase but negative for Rab7. This could be checked by simultaneously examining the dynamics of V-ATPase and Rab7 during yeast phagocytosis in the PIKfyve mutant.

      We agree with the referee that there are multiple pools of V-ATPase, and we show that there is both a very early PIKfyve-independent recruitment of both V-ATPase and Rab7 as well as a later and more substantial pool delivered in a PIKfyve-dependent manner. It is clear that V-ATPase and Rab7 do not always co-localise however - the clearest example being on the contractile vacuole, which has lots of V-ATPase but no Rab7 (the large bright magenta structure in Fig 2G.).

      We suspect that the dramatically reduced, but not completely absent, levels of both V-ATPase and Rab7 recruitment in the absence of PIKfyve are similar, but the challenges of imaging these very low levels mean we cannot formally exclude a pool of V-ATPase vesicles that lack Rab7 and fuse with very early phagosomes. Nonetheless, as we will already be looking at V-ATPase and Rab7 in PIKfyve KOs in the experiments above, we will also attempt to unequivocally differentiate a pool of V-ATPase positive/Rab7 negative vesicles fusing with phagosomes.

      Reviewer 2:

      (1) The authors show that deletion of PIKfyve results "in an almost complete block in Rab7 delivery to phagosomes" (page 17) indicating that the delivery of Rab7 depends on fusion with Rab7-positive structures. This would suggest that the Rab7-GEF Mon1-Ccz1 is not localized to the membrane of the phagosomes. Could the authors test for the presence of Mon1-Ccz1 in either fluorescence microscopy experiments or on purified phagosomes to exclude the possibility of a "canonical" Rab7 recruitment by its GEF? If the GEF is found on phagosomal membranes it would indicate that a Rab-transition from Rab5 to Rab7 occurs on the phagosome during maturation, but on a low level. The later fusion event might be a homotypic fusion of two Rab7-positive compartments. The observed fusion events could still deliver the bulk of Rab7 and other endolysosomal proteins to the phagosome. If the Rab7-GEF is not found on phagosomes how do the authors envision that the organelle keeps its identity? Is it solely dependent on PI(3,5)P2? What is the fate of the Rab7-negative phagosome in ∆PIKfyve cells if Rab7 is not delivered to the membrane, is there degradation happening over longer periods of time?

      This is an excellent suggestion, for which we thank the reviewer. Mon1 and Ccz1 are highly conserved, with clear Dictyostelium orthologues that have never been studied. Our model is that there is a small proportion of Rab7 driven by this canonical pathway so would expect Ccz1/Mon1 to coincide with loss of Rab5 and be unaffected by loss of PIKfyve - although subsequent Rab7 delivery would be lost. This is easy to test by cloning and expressing GFP-fusions of both Ccz1 and Mon1 and would be highly informative. Note we do not exclude canonical Rab7 recruitment in our model (see discussion), our data just indicate this has a minor contribution.

      Reviewer 3:

      The focus of their manuscript is the loading of Rab7 on phagosomes, but there is no indication about Rab7 activation (GTP-loading). Would the RILP-C33 probe work in Dictyostelium? If this is not possible, the activation state of Rab7 should still be discussed. Given that Rab7 is present on other organelles in PIKfyve-inhibited cells, is it active or not?

      The GTP-loading status of Rab7 is a good question, although the general dogma is that membrane-localised Rabs are active. We will try the RILP-C33 probe in Dictyostelium as suggested, but as these cells lack an endogenous RILP orthologue there is a high chance it will not work. Sadly, the lack of reliable tools to assess active Rab status is a general limitation for the field, so if the RILP-C33 probe does not work we will add this caveat to the discussion.

      The authors need to better address the confusing kinetics of early Rab7 recruitment, followed by SnxA (Fig. 4G, same for VatM - Fig. 4I) - which is counterintuitive if PIKfyve activity is required to recruit Rab7. How do the authors explain this? Are phagosomes prevented from acquiring Rab7 in PIKfyve deficient cells because of a defect on phagosomes or on the endo-lysosomes loaded with Rab7 (but not active)?

      We believe this again relates to the over-simplification of our model. Our data indicate both PIKfyve dependent and independent Rab7 recruitment. In contrast to the abrupt recruitment of SnxA at ~120 seconds (Vines et al. JCB 2023), both Rab7 and VatM accumulate gradually over time starting from almost immediately following engulfment (Buckley et al. 2019, and Figure 2F). Our data indicate that the first stage of this is PIKfyve independent, and is responsible for ~10% of the total Rab7/V-ATPase accumulation by both the imaging in this paper, and Western blot for V-ATPase on purified phagosomes in Buckley et al. PLoS pathogens 2019. The arrival of some Rab7/V-ATPase prior to PI(3,5)P2 therefore supports our model where there are multiple sources of Rab7.

      As the reviewer quite rightly points out, interpretation of the defects observed in the absence of PIKfyve becomes complex and we cannot completely differentiate between a defect on the phagosome, or the Rab7 compartments that fuse with them (or indeed both). In fact, we already note that small Rab7 compartments that we observe in wild-type cells are much more sparse in PIKfyve mutants. Therefore whilst the requirement for PI(3,5)P2 in the clustering and fusion of macropinosomes with phagosomes is clear, additional effects on the PI(3,5)P2-independent Rab7 compartments cannot be excluded.

      The experiments above using the RILP-C33 active Rab7 biosensor, as well as observation of the Mon1/Ccz1 complex, should further clarify this, but we will also add further discussion of these points.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer 1:

      Minor comments.

      1. It is unclear how the experiment in Figure 3G was conducted. If microscopic analysis was involved, the corresponding images should be included.

      We apologise that we overlooked this and have now added a full description in the materials and methods (P8 L16-21). Fluorescence measurements were performed using a plate reader, so there are no images.

      Page 11-Line 2, the sentence "there was no obvious clustering around the nascent phagosome (Figure 2D)." It is Figure 2E, not Figure 2D.

      Corrected.

      There is an inconsistency regarding the description of fluorescent fusion proteins. For example, both GFP (RFP)-2xFyve and 2xFyve-GFP (RFP), as well as GFP-Rab5 and Rab5-GFP, were used. Typically, placing GFP (or RFP) before a gene suggests N-terminal tagging, while placing it after the gene implies C-terminal tagging. The authors should clarify the position of the fluorescent tag and ensure consistency in their descriptions.

      We apologise for this oversight, and have been through and corrected all fusion protein references accordingly.

      One of the videos was not referred in the manuscript or described in the Video legends. This video seems to correspond to Figure 5A, albeit with a different pseudo-color scheme.

      This has been corrected. Video 7 does correspond to Fig 5A, and we have corrected the colour scheme to match and added references to the video in the text and figure legend.

      Reviewer 2:

      (2) In their abstract, the authors state that they "...delineate multiple subpopulations of Rab7-positive endosomes that fuse sequentially with phagosomes" (page 2, line 14,15). However, the data provides only evidence for V-ATPase or PI(3,5) P2-containing structures and the authors conclude to my understanding that macropinosomes are the main source for vesicular structures fusing with phagosomes. I would ask the authors to please be clear on the identity of the "Rab7-donor"-structures throughout the manuscript. Saying that they delineate multiple subpopulations of endosomes seems to be overstated.

      We identify that macropinosomes are one source (subpopulation) of Rab7/PI(3,5)P2 vesicles but our data clearly show that they are not the only source of Rab7 - there is clearly an additional early Rab7-positive/PI(3,5)P2-negative subpopulation of vesicles that also cluster and fuse at earlier stages. For example, in Figure 4F we co-express Rab7a/SnxA and show that whilst all the SnxA vesicles also contain Rab7 (and dextran), there is a clear separate population of small, early-fusing Rab7-containing vesicles that do not possess PI(3,5)P2. This is further validated in Figure 5B and C. To our mind this clearly demonstrates and defines different Rab7 endosomal populations, although we do not yet know the origins of the initial Rab7-positive/PI(3,5)P2-negative population - as discussed in our response to their point (3) below.

      Minor points:

      (1) The sentence "...which both deactivates and dissociates Rab5, and recruits and activates Rab7 on endosomes" is at least problematic as it suggests that Mon1-Ccz1 directly drives GTP-hydrolysis of Rab5 and dissociates it from the membrane. Indeed, Mon1-Ccz1 is shown to interfere with the positive feedback loop of the Rab5-GEF by interacting with Rabex (Poteryaev et al., 2010), so a rather indirect effect of Mon1-Ccz1. A GAP and the GDI are needed for Rab5 deactivation and dissociation from the membrane. How both are involved in the endosomal Rab-conversion is not clarified.

      We have changed the text to better represent this complexity (P4 L4-6).

      (2) Signals of RFP-labeled proteins are difficult to interpret throughout the experiments. What are the structures that show a strong accumulation of red signal in Fig. 1A,B, Fig. 2G and Fig. 4A (20 sec)? If these are fluorescently labeled proteins it would suggest that most of the proteins cluster/accumulate in the cell. Can the authors provide better images?

      We appreciate that some of these reporters with multiple localisations can be difficult to interpret. This is a major challenge for this sort of study and the main reason we use the large and easily-identified yeast-containing phagosomes for quantification. In Fig. 1 the large structure is the large peri-nuclear cluster of Rab5 previously reported (Tu et al. JCB 2022). In Fig. 2G the bright structure is the recruitment of V-ATPase on the contractile vacuole (CV). Both these large structures are easily distinguished from the phagosomal pool we are interested in. Whilst we would love to provide better images, this is simply not possible - both these other structures are unavoidable and we are already using some of the best microscopy methods available. We have however clarified the additional localisations seen in these images in the revised figure legends.

      (3) On page 11 the authors state "...macropinosomes in ∆PIKfyve cells still appeared much larger. Quantification of their size and fluorescence intensity demonstrated that although macropinosomes started off the same size,...". This statement is not reflected in the data depicted in Fig. 3A,B. The size of the single labeled macropinosome appears to be larger in wildtype than in ∆PIKfyve cells from the beginning on. However, the quantification in Fig 3F is clear. So, are these bad examples in 3A,B, are they swapped or is this due to the additional expression of GFP-Rab7A? Could you please comment on the effect that the (over-)expression of GFP-tagged Rab-GTPases might have on the observations described in this paper in the discussion part?

      As you can see from the error bars in Figure 3F, macropinosomes are extremely variable in size - ranging from ~0.2-5 microns in axenic Dictyostelium. The image in Figure 3B is therefore indicative of this heterogeneity, rather than being a "bad example". This is why we designed the experiment to quantify several hundred vesicles before drawing any conclusions - as well as doing it in the absence of any GFP-fusion expression.

      Although we have not noticed any issues (enlarged vesicles are also clear in GFP-Rab7 expressing cells in Figure 1B), we do of course accept that GFP-Rab7 expression itself may have some detrimental effects on maturation and this is why we quantified macropinosome size in untransformed cells. We have clarified this in the results section (P12 L28).

      (4) In Fig. 6E it is hard to distinguish if the dextran is accumulating inside the phagosome. I would suggest conducting a 3D reconstruction of these images to allow judging if macropinosomes fused with the phagosomes or if they cluster around the neck of the phagosome.

      This would be nice, but not possible as these images are from single confocal sections, rather than a complete high-resolution Z-stack. We have however added an enlargement of both Figure 6D and E which we feel now more clearly shows the presence of dextran within the bounding PI(3)P membrane of the phagosome.

      (5) In the discussion, the authors state that the small pool of "PIKfyve-independent Rab7" is "insufficient to for subsequent fusion with other Rab7A-positive compartments, further Rab7 enrichment, and lysosomal fusion." What is the rationale for this conclusion? Is it shown how many Rabs are necessary to induce a tethering and fusion event? It would be good to revise this part of the discussion also in respect of the first major point of my comments above.

      Our data show that in the absence of PIKfyve, phagosomes still remove Rab5 and gain a small pool of Rab7 but progress no further. This is consistent with some block in the HOPS-mediated homotypic fusion of Rab7 compartments. However, we accept that this is not necessarily due to simply not having enough Rabs, so have rephrased the discussion accordingly.

      (6) The intention of the paragraph about phagosomal ion channels is for this reviewer somehow out of context. It is not clear to me how the authors relate this to their findings. It would be good to bring this into a broader context.

      We mention ion channels in the background as they represent the main class of PI(3,5)P2 effectors known so far. We feel this is important background context, even if our studies do not directly relate to this.

      Reviewer 3:

      Their disclosure and use of statistics is incomplete and/or inconsistent, and potentially wrong in some cases. For example, the authors disclose the number of biological repeats in a few experiments (Fig. 3C, F) but not in the majority. Instead, they state the number of phagosomes without indicating biological repeats (e.g. Fig. 2 and others). So, it is not possible to know if their data are reproducible. Despite not indicating independent experiments in some cases, they speak of SEM, which applies to the mean of means from biological repeats. In other cases, none of this is disclosed (e.g. Fig. 3G). Often there is no indication of what statistical test was done OR if a statistical test was done (e.g. Fig. 3G, Fig. 4, etc). I would recommend the authors review the excellent resource paper published in JCB on SuperPlots to better follow statistical expectations. This is essential to improve reproducibility and confidence in their observations.

      We apologise if this was unclear for the referee, but we have tried to be clear in each case. The confusion likely lies in the definition of a biological repeat, which depends on the type of experiment. For quantification of phagocytic events over time, we feel it reasonable to take each individual event (each from an individual organism) as a biological repeat. This is because events are relatively rare and taken from multiple different movies, and it is not technically possible to film both mutants and controls simultaneously. In all experiments of this sort (e.g. Figure 2) we have shown standard deviation, which indicates the reproducibility between phagocytic events. We have clarified in the methods that these events are from movies obtained on at least 3 independent days.

      In other cases, such as Figure 3C and F and Figures 5-6, we are able to take measurements across multiple cells simultaneously at each timepoint. It is therefore appropriate to average over multiple independent experimental repeats rather than individual cells. We have therefore used SEM in our analysis, and both the number of individual cells and independent repeats are stated on the graphs and legend. This was incomplete in a few cases but has now been clarified in all cases.

      Regarding statistical tests, which ones were used has now been clarified in each figure legend. Note that in Fig 3G, we do not apply any test as both lines essentially overlap and it is clear there would not be any convincing differences. In Figure 4, the graphs all compare co-expression of different reporters rather than different mutants or conditions and are from single events. We therefore feel statistical tests are unnecessary and inappropriate. Comparison of the same reporters between strains averaged across multiple events, with statistical analysis, is shown in Fig 2 instead. All these points have now been added to the statistics section of the methods (P9 L1-6).
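      The distinction drawn above between standard deviation across individual events and SEM across independent repeats can be made concrete with a short sketch. This is an illustration only, not the analysis pipeline used for the figures: the function names are hypothetical and the numbers in the usage example are made up.

```python
import math
import statistics


def sd_across_events(event_values):
    """Sample standard deviation when each phagocytic event is one repeat."""
    return statistics.stdev(event_values)


def sem_across_repeats(repeats):
    """SEM of the mean of means: one mean per independent experiment.

    `repeats` is a list of lists, one inner list of per-cell measurements
    for each independent experimental repeat.
    """
    repeat_means = [statistics.fmean(values) for values in repeats]
    return statistics.stdev(repeat_means) / math.sqrt(len(repeat_means))


if __name__ == "__main__":
    # Made-up intensities from three independent days, three cells each.
    days = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    print(sem_across_repeats(days))
    # Made-up intensities from five individual events pooled across movies.
    print(sd_across_events([1.0, 2.0, 3.0, 4.0, 5.0]))
```

      The design point is that the SEM is computed from the number of independent repeats, not from the number of cells, so adding more cells per repeat does not shrink the error bar on its own.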

      Minor Comments

      It is interesting that 2FYVE-GFP stays on phagosomes for 50 min or more - this is distinct from macrophages. Please comment. Have the authors tried other PI(3)P probes (e.g. PX-GFP) to see if they behave the same?

      We have not used other probes, but we have no reason to believe 2xFYVE does not behave as predicted, as it is the same probe used for most macrophage studies (FYVE domain from human Hrs) and gets removed from macropinosomes exactly as expected. We did not originally comment in this manuscript, but PI3P dynamics are even more interesting as our previous data indicate that latex-bead containing phagosomes lose PI3P after 10 minutes (Buckley et al 2019, Figure 4F-G). This indicates phagosome maturation can be regulated by the cargo (under further investigation). Importantly however, both bead and yeast-containing phagosomes have comparable defects in the absence of PIKfyve. This is more fully discussed in our previous paper (Vines et al. JCB 2023) where we characterise PI(3)P and PI(3,5)P2 dynamics in more detail.

      Fig. 7 model: the macropinosome in the diagram seems like a dead end as depicted - is there any arrow or change that could be added to show that it doesn't just sit there in the middle? Also, the light green on yellow hurts the eyes!

      We apologise, there was actually supposed to be an arrow there but it was lost somewhere in the drafting process. The whole figure has now been updated to more clearly describe our full and more complex model.

      Fig. 3F, could be converted to volume assuming macropinosomes are spheres.

      This is true, however as these images are taken from single planes we cannot know where in the sphere the slices are and therefore what the maximum diameter would be. We therefore prefer to keep it as area so as not to confuse and over-interpret the data.
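      The size of the bias that the response is concerned about can be sketched with simple sphere geometry. The example below is illustrative only: it assumes an ideal sphere cut by a plane at a known offset from its centre, which is exactly the information a single confocal section does not provide, and the function names are hypothetical.

```python
import math


def section_area(radius, offset):
    """Area of the circular cross-section of a sphere cut at `offset` from its centre."""
    if abs(offset) > radius:
        raise ValueError("the plane does not intersect the sphere")
    return math.pi * (radius ** 2 - offset ** 2)


def volume_assuming_equatorial(area):
    """Sphere volume inferred from a section area, assuming the cut is through the centre."""
    inferred_radius = math.sqrt(area / math.pi)
    return 4.0 / 3.0 * math.pi * inferred_radius ** 3


def true_volume(radius):
    """Volume of a sphere of the given radius."""
    return 4.0 / 3.0 * math.pi * radius ** 3


if __name__ == "__main__":
    # A unit sphere imaged 0.6 radii away from its centre.
    area = section_area(1.0, 0.6)
    print(volume_assuming_equatorial(area) / true_volume(1.0))
```

      Because the inferred radius enters the volume as a cube, an off-centre section understates volume much more strongly than it understates area, which supports reporting area when the position of the plane is unknown.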

      Pg. 10, line 10 - Vps34 is Class III PI3K, not Class II.

      Corrected.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.


      Reviewer 2:

      (3) ("OPTIONAL") Optionally, the authors could also try to clarify these structures' identity by including further colocalization studies with additional early and late endosomal marker proteins. Are they for example positive for early or late endosomal markers like EEA1, ESCRT or Retromer? How about organelle-specific SNAREs? This would give further insights into the character of the "Rab7-donor" structures and would allow to clarify if multiple subpopulations are contributing to phagosome maturation in a sequential order as stated in the abstract. As I am not an expert on Dictyostelium I can't estimate the effort that would go into such an experimental setup. However, since the time scale of the events in the cell is nicely worked out in this study, these colocalization studies would not need to be conducted as live-cell microscopy experiments.

      This is a sensible suggestion that would in theory help define these populations. However, many of these markers are poorly defined with respect to phagosomes and/or Dictyostelium. Dictyostelium does not possess an EEA1 orthologue, and our data also indicate that these vesicles do not possess PI3P, so they cannot be canonical early endosomes. We have previously characterised WASH/retromer: whilst it is recruited to phagosomes at around the time of the Rab5/7 transition, retromer appears to be recruited from the cytosol and to drive recycling rather than being delivered on endosomes that fuse (see King et al. PNAS 2016). We have also previously looked at ESCRT (Lopez-Jimenez et al. PLoS Pathogens 2018), which also does not appear to show any recruitment to early phagosomes that would be consistent with a Rab7-positive sub-population. The SNAREs are yet to be studied in any detail, as they are often too divergent to assign a direct mammalian orthologue.

      Therefore, whilst this is a sensible suggestion, and something we would like to follow up in the future, it is not straightforward and we feel it is outside the scope of the current study. We have, however, included additional discussion of this in the revised manuscript (P20 L21-26).

      Reviewer 3:

      Major Comments:

      1. Based on the current data, I am not entirely convinced that Rab7 is delivered mostly by fusion with other compartments. At least the data as provided cannot exclude other models. For example, Rab7-containing organelles that cluster with phagosomes may form contact sites that provide a local environment to load cytosolic Rab7. There's also a possibility that some of their Rab7 clusters are membrane sub-domains and not vesicles. Or perhaps, there is a first wave of cytosolic Rab7 recruitment, which then initiates fusion with Rab7 compartments, i.e., there is a two-phase Rab7 recruitment. While this last possibility is consistent with recruitment of Rab7 by fusion (the second phase), the authors present a model that is too simplistic and conclusive based on the data. The authors may be right, but they need to strengthen their evidence towards their claim. Maybe EM could help determine some of these issues. Perhaps better would be the use of FRAP, photo-activation, or optogenetics of Rab7. For example, if Rab7 is acquired on phagosomes after photobleaching clusters of Rab7, this would suggest a cytosolic Rab7 contribution, and if not, this would support their model. I recognize that these experiments are not necessarily trivial, but either the authors augment their data (as suggested or with other approaches) or significantly pare down their conclusions.

      We agree with the referee that we cannot completely exclude other models and, as we say in the discussion, we do not wish to do so. We apologise if the role of fusion was overstated, but the model we propose is as the referee suggests: there is likely an early first wave of canonical Rab7 recruitment from the cytosol that is independent of PIKfyve, before the majority of Rab7 is subsequently delivered by fusion in a PIKfyve-dependent manner. Our data indicate that the second wave is both quantitatively and functionally more significant (see functional data in Buckley et al. 2019).

      We also agree with the referee that we cannot formally exclude alternatives to fusion, such as contact-site-mediated recruitment from the cytosol or membrane sub-domains; however, there are no data to support these either. In contrast, the hypothetical clustered Rab7 contacts/subdomains often (but not always) contain the transmembrane V-ATPase complex (Figure 2G), which must be delivered by fusion.

      However, we do not wish to over-simplify our conclusions and, as we state in the discussion, we do think there is probably a small amount of Rab7 recruited from the cytosol by the canonical pathway. We accept that our cartoon in Figure 7 is over-focussed on fusion, so we have substantially revised this, as well as the discussion, to give a more balanced and complex view.

      Regarding the proposed experiments, unfortunately, the imaging required to acquire these movies is already at the very limit of what is possible so we do not believe it would be technically feasible to employ methods such as FRAP and optogenetics on these relatively fast-moving phagosomes with the temporal resolution required. Furthermore, to differentiate recruitment from a cytosolic pool, every GFP-Rab7 cluster would need to be photobleached, which could not be reliably achieved.

      However, this point will be largely addressed by the suggestion of Reviewer 2 to look at the Mon1/Ccz complex. The presence or absence of this will give strong evidence for canonical Rab5/7 transition and Rab7 recruitment from the cytosol which would significantly clarify our model and define the two different mechanisms of Rab7 recruitment to phagosomes.

      Early macropinosomes fuse with early phagosomes more readily than 10-min old macropinosomes. Do 10-min old macropinosomes not fuse with older phagosomes? Is this not an issue of mismatched age?

      This is an interesting point that we have clarified in the text. We agree with the reviewer that the ages of the macropinosomes and phagosomes appear to need to match, but our data indicate that fusion only occurs when both parties possess PI(3,5)P2, as macropinosome fusion appears to happen in a single burst at about 240 seconds (Figure 6F) rather than as a continuous process. We also do not see any fusion of these older macropinosomes once the phagosomes get past the first 10 minutes of maturation (Figure 6G).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1) List of the detailed experiments we plan to perform (including aforementioned experiments):

      • Careful analysis of the daughter cell size by measuring the real volume.

      • Quantifications of PCM (pericentrin and γ-tubulin) proteins and Plk1 with respect to centrosome age in G2 and metaphase (for Plk1) cells.

      • Analysis of the amount of Plk1 of metaphase cells when cenexin protein is absent (siControl vs siCenexin), and measurements of Plk1 in WT-cenexin vs. cenexinS796A mutant to test if Cenexin controls a subpool of Plk1 at centrosomes.

      • Careful analysis of Ctrl and TPX2 depletion experiment data in 1:1 cells. We plan to repeat the experiment to confirm or refute the contribution of TPX2 to spindle asymmetry.

      • Measurement of the PCM volume/intensity in 2:2 and 1:1 metaphase cells, to highlight the contribution of the daughter centrioles in recruiting PCM proteins.

      • Live cell imaging of 2:2 cells and measurements of different parameters; cortex-to-centrosome and spindle pole to metaphase plate (half-spindle (a)symmetry) distances.

      • Long-term live cell imaging of 2:2 cells to investigate whether the asymmetry in centrosome-age dependent daughter cell size also affects the duration of the ensuing cell cycle. While we have carried out such long-term movies in the past, we are aware that they can be challenging due to high cell mobility over longer time courses.

      • Investigation of the microtubule nucleation capacity under different conditions of PCM protein depletion (depletion of Cdk5rap2 and/or pericentrin).

      • Analysis of the effect of the over-expression of a PCM protein (Cdk5Rap2) on the (a)symmetry of the mitotic spindle size.


      2) detailed answers (in green) to the reviewers’ comments:


      Reviewer #1:

      (Major points)

      1. The discovery of differences in half-spindle size during symmetric division is intriguing. However, the methodology for quantification of the data remains unclear. Key questions, such as how the center of the metaphase plate is determined from the image data, the definition of exact pole position when centrioles are located at spindle poles, the objective determination of daughter cell diameter and width from the image data, and the referential position of the cortex, need more detailed explanation in the manuscript. Additionally, it's crucial to elucidate the specific index used to quantify differences from the image data, especially when dealing with data that only varies by a few percent. Providing clarity on these aspects and, in some cases, re-quantifying the data should be necessary.

      We have already included clearer explanations in the method parts and results part about our methodology and will include a supplementary figure on how precisely we defined and measured the half-spindle sizes, as well as the index used for the asymmetry (using a methodology that we previously used in Dudka et al., Nature Comm., 2018). In addition, we will use a second method to measure the real daughter cell volume.

      The mechanism behind the difference in half-spindle size, related to the subdistal appendage (SDA), raises questions, especially considering that SDA is believed to disassemble during mitosis. Exploring whether differences in the localization of PCM components and half-spindle size result from disparities in Plk1 and PCM loading during G2/early mitosis, prior to SDA disassembly, necessitates experimental verification.

      As suggested by the reviewer, we will quantify the amounts of PCM proteins on the old and young centrosomes in G2 cells (and therefore prior to SDA reorganization). This will also allow us to test whether the asymmetry depends on the SDA themselves, or on the corresponding SDA proteins, which still accumulate specifically on the oldest centrosomes during mitosis.

      For investigating the mechanism of half-spindle size asymmetry, many perturbation experiments employ knock-down techniques. To directly address the cause of asymmetry, it might be valuable to artificially localize Plk1 and PCM factors to one spindle pole using optogenetic tools or similar approaches and then quantify half-spindle and daughter cell sizes.

      We thank the reviewers for this suggestion, as it could indeed be of great interest and provide a direct proof of principle. Unfortunately, based on our experience in establishing such a cell line, we know that just generating a light-manipulated stable cell line that contains markers for centrosomes and chromosomes or kinetochores takes 6-9 months in the best-case scenario. This experiment is therefore not possible within a normal revision round (even if extended to 6 months).

      The asymmetry in Plk1 sub-population recruitment by SDA triggers the observed effects, but the evidence for this is relatively weak, given the small difference in spindle asymmetry. Quantifying the amount of Plk1 in its activated form, particularly in the context of SDA dismantling during metaphase, could strengthen this aspect of the study.

      While the commercial antibodies against the activated form of Plk1 (phospho-T210) work very well by immunoblotting, we have not been able to get them to work by immunofluorescence. We will nevertheless test whether variation in the fixation methods can solve this issue. Alternatively, we will test to what extent depletion of Cenexin, or the presence of Cenexin WT vs the non-phosphorylatable Cenexin mutant, affects the overall population of Plk1 on both spindle poles.

      While the focus on half-spindle size asymmetry during symmetric division is intriguing, it's important to address the broader physiological significance. The primary outcome of this asymmetry is differences in daughter cell size, which limits the broader significance of the study. Furthermore, the quantification method for daughter cell size warrants scrutiny and clarification.

      As mentioned above, we will use a different method to measure and investigate daughter cell size (a)symmetry. Moreover, we will attempt, with long-term live cell movies, to test whether the centrosome-age-dependent variation in daughter cell size also affects the duration of the ensuing cell cycle.

      (Minor points)

      1. Table 1 lists factors with asymmetric localization not analyzed in detail in this paper. It would be beneficial to discuss whether these factors play a role in spindle asymmetry, and the authors should address the completeness of the data in Table 1 in terms of selecting factors for analysis.

      We agree with this comment that other factors may participate in the regulation of spindle asymmetry. However, we performed this screening to identify key drivers of spindle (a)symmetry based on the Pearson’s correlation coefficient and the value of the slope.

      In addition, some of these proteins are known to control spindle size by acting in the same pathway (TPX2/Kif2A/Katanin and Pericentrin/CDK5RAP2/γ-tubulin). We will incorporate these points and the reasons for our selection in the discussion.

      In Figure 1H, the impact of centriolin knock-out on the distribution of unaligned polar chromosomes is different from the effect of cenexin S796A in Figure 6H. This difference should be explained to provide clarity on the observed discrepancies.

      We will better explain this difference.

      In Figure 2A, there is no correlation data presented between daughter cell asymmetry and the presence or absence of cenexin signal. This relationship should be elucidated for a more comprehensive understanding.

      We will clarify this point. Specifically, we plotted the daughter cell symmetry index for 2:2 and 1:1 cells with respect to centrosome age. All the daughter cells display a cenexin signal at both grandmother and mother centrioles, with a difference in fluorescence intensity that enables us to assign them to “old” vs “young” centrosomes. We found a significant result indicating that there is a relationship between centrosome age and the formation of daughter cells of different sizes.

      In Figure 4G and H, the mean value of spindle asymmetry increases with siRNA treatment of Cdk5Rap2 or PCNT compared to the control. The possible interpretation of this finding should be discussed.

      This is an interesting observation that needs to be discussed in our revision.

      Figure 4K shows that the asymmetry of PCNT distribution is not eliminated by centriolin knock-down. This observation requires clarification and discussion.

      It has been shown that pericentrin is directly recruited by Plk1 to the centriole (Soung et al., 2009). In addition, pericentrin has a PACT domain that directly targets it to the centriole (Gillingham and Munro, 2000). Moreover, it has been demonstrated that the grandmother centriole is slightly longer than the mother one (Kong et al., 2020). Altogether, this suggests that the old and young centrosomes, based on this intrinsic property, may recruit different amounts of pericentrin.

      We will add this explanation in the discussion.

      It appears that the difference in spindle asymmetry of the control group in Figure 5A is smaller than in other data. This discrepancy should be addressed. Additionally, the influence of TPX2 depletion on spindle formation, and any corresponding spindle staining data, should be included.

      This point will be discussed in the revised version of the manuscript.

      Claiming that the daughter centriole recruits PCM based on Figure 6A data alone may require additional supporting evidence. It is essential to investigate whether there is a clear PCM signal when the daughter centriole disengages in late mitosis and maintain consistency in the interpretation.

      As suggested by reviewer 2, we will measure PCM volume/intensity in both 2:2 and 1:1 cells to demonstrate that daughter centrioles directly recruit PCM proteins.

      The lack of difference in TPX2 distribution in Figure 7E should be explained, along with a discussion of how this observation aligns with the spindle asymmetry data and any inconsistencies.

      We will discuss this point in the revised manuscript.

      The differing N numbers between samples in all the figures may affect the validity of comparisons. The authors should discuss whether it is necessary to have consistent N numbers in each experiment for more robust conclusions.

      Indeed, this is an important point that must be discussed.

      Reviewer #2:

      Major comments:

      1) It is not completely clear how the authors determined whether a spindle was asymmetric or not. In the methods, they say that statistical tests are described in the legends. In Figure 1 legend they say: "Each condition was compared to a theoretical distribution centered at 0 (dashed line)". How did they generate this theoretical distribution?

      As explained under point 1 of reviewer 1, we will provide a more thorough explanation of our methodology and how we decide whether a spindle is symmetric or not. In brief, a perfectly symmetric spindle would yield an asymmetry index of 0, as there is no difference between the two half-spindle sizes.
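
      As a hedged illustration of the idea that a perfectly symmetric spindle yields an index of 0, the sketch below uses a hypothetical definition of the asymmetry index (the percent difference between the two half-spindle lengths, normalised to the full spindle length) and a hand-computed one-sample t statistic against 0. The exact index used by the authors (Dudka et al., Nature Comm., 2018) and their statistical test are not specified in this response and may differ; all names and values below are assumptions for illustration.

```python
import math
import statistics


def asymmetry_index(half_a, half_b):
    """Hypothetical asymmetry index in percent: difference between the two
    half-spindle lengths divided by the full spindle length.
    Equal half-spindles give exactly 0."""
    return 100.0 * (half_a - half_b) / (half_a + half_b)


def one_sample_t(values, mu=0.0):
    """t statistic for testing whether the mean of `values` differs from `mu`
    (here, the value expected for perfectly symmetric spindles)."""
    n = len(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / sem


# Made-up half-spindle lengths (arbitrary units), one pair per cell.
cells = [(5.2, 4.8), (5.1, 4.9), (5.0, 5.0), (5.3, 4.7)]
indices = [asymmetry_index(a, b) for a, b in cells]
print([round(i, 2) for i in indices], round(one_sample_t(indices), 2))
```

      The t statistic would then be compared with a t distribution with n - 1 degrees of freedom; a non-parametric alternative could be substituted if the indices are not approximately normal.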

      2) The authors claim that TPX2 depletion results in loss of spindle asymmetry in 1:1 cells, but the difference is very small (1.7% in control vs 1.3% in TPX2 depletion, Fig 5B) and the data is more variable in TPX2 depletion, which makes it less likely that a statistically significant difference from 0 would be found. Firstly, perhaps the authors could check the standard error of the mean, which provides a measure of how accurate the mean is with regard to N and variation. If a dataset is more spread (such as in TPX2 depletion) a higher N is required to attain the same accuracy in the mean value. This is normally not so important when directly comparing two datasets, but in this case the authors are comparing each dataset to 0. So, are the authors measuring enough cells in the TPX2 depletion to be sure that a 1.3% value is not significantly different from 0? Secondly, I don't understand why the control cells have such a low asymmetry index (1.7%), when previous data in the paper shows an asymmetry index of 4.1% (Fig 1D) and 3.4% (Fig 4E) in control 1:1 cells. This suggests that something about the way this experiment was carried out dampens the asymmetry, which could therefore lead the authors to conclude that TPX2 is more important than it really is.

      We agree with this comment: the mean of the control condition is smaller than in the other controls. As mentioned above, we will look carefully at the data (SD vs SEM) and, if necessary, add a new replicate to confirm or refute the involvement of TPX2 in the formation of asymmetric spindles.
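
      The reviewer's point about SD, SEM and sample size can be made concrete with a short sketch. It relies only on the standard relation SEM = SD / sqrt(N); the numerical values are hypothetical and are not taken from the manuscript.

```python
import math


def sem(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)


def n_for_target_sem(sd, target_sem):
    """Smallest sample size whose SEM does not exceed `target_sem`."""
    return math.ceil((sd / target_sem) ** 2)


# Hypothetical example: a control dataset with SD = 5 (% asymmetry) and N = 100,
# versus a more variable dataset with SD = 10.
control_sem = sem(5.0, 100)
print(control_sem)                           # SEM of the control
print(n_for_target_sem(10.0, control_sem))   # N needed to match it when SD doubles
```

      Because the SEM scales with 1/sqrt(N), doubling the spread requires roughly four times as many cells to estimate the mean with the same accuracy, which matters when each dataset is compared with 0 rather than with another dataset.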

      3) The authors claim that daughter centrioles are associated with some Pericentrin and suggest that this may be why 2:2 centrosomes have less of an asymmetry than 1:1 centrosomes (Fig 6A). It is unclear whether the authors consider these daughter centrioles as being prematurely disengaged (they make reference to the fact that they previously showed how disengaged daughters recruit γ-tubulin, but it's unclear if this is related to their current observations). In Figure 6A, the Centrin spots look too far apart for engaged centrioles (~750nm). I appreciate that this may be the only way to detect Pericentrin around the daughter at this resolution, but it may also force the authors to select cells where the centrioles have prematurely disengaged. For the asymmetry measurements, the authors presumably did not select cells where they could distinguish mother and daughter centrioles. One way to address this issue would be to compare PCM size at centrosomes in 2:2 cells with centrosomes in 1:1 cells. The expectation would be that centrosomes in 2:2 cells would have more PCM, due to the contribution of the daughter centrioles.

      We agree that on those high-resolution images the daughter centrioles seem to be far from the mother ones. The metaphase cells presented in this figure are wild-type, untreated cells in which the daughter centrioles are engaged. Indeed, our own investigation of the centriole engagement status by expansion microscopy indicates that over 98% of centriole pairs in metaphase RPE1 cells are engaged.

      Nevertheless, as suggested by the reviewer and to validate that daughter centrioles participate in this process, we will compare PCM size in 2:2 and 1:1 metaphase cells.

      4) The authors show that Plk1 recruitment by Cenexin (via S796 phosphorylation), which happens only at mother centrosomes, is important for asymmetry. Nevertheless, they show that Plk1 is symmetrically distributed between mother and daughter centrosomes (Table 1). This does not really fit, unless daughter centrosomes recruit more cenexin-independent Plk1 than mother centrosomes or if the cenexin-bound pool of Plk1 is only a minor fraction of total Plk1. If so, do the authors think that the Cenexin-bound pool of Plk1 is more potent than the rest of centrosomal Plk1?

      As indicated in point 4 of reviewer 1, we will test what proportion of the Plk1 pool at spindle poles depends on the presence of Cenexin, as we suspect that the Cenexin-dependent Plk1 is only a subpopulation.

      5) The circles drawn to measure cell size in Figures 2A,E and 7C do not look like a good representation of cell area (as the cells are not perfectly round). The authors use a formula for circle area with an approximation of the radius (based on mean length/width of an oval). It would be much better to use ImageJ to draw a freehand line around the perimeter of the cell and use the in-built tool to measure the area.

      As mentioned in point 1 of reviewer 1 we will use another method to measure daughter cell size.
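
      To illustrate why a circle approximation can misrepresent the area of a non-round cell, the sketch below compares three estimates: a circle whose radius is the mean of the half-length and half-width (one possible reading of the approach described by the reviewer, assumed here), an ellipse with the same length and width, and the area enclosed by a traced outline computed with the shoelace formula. The outline coordinates and dimensions are hypothetical; this is not the ImageJ implementation.

```python
import math


def circle_area_from_mean_radius(length, width):
    """Circle area using the mean of the half-length and half-width as radius."""
    radius = (length + width) / 4.0
    return math.pi * radius ** 2


def ellipse_area(length, width):
    """Area of an ellipse with the given full length and width."""
    return math.pi * (length / 2.0) * (width / 2.0)


def polygon_area(points):
    """Area enclosed by a closed outline given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical elongated cell, 20 x 10 (arbitrary units).
print(round(circle_area_from_mean_radius(20.0, 10.0), 1))
print(round(ellipse_area(20.0, 10.0), 1))

# Hypothetical traced outline: a 4 x 3 rectangle.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))
```

      Under this reading, the circle estimate is never smaller than the ellipse estimate and the gap grows as the cell becomes more elongated, so a traced outline is the safer measure for cells that are not round.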

      Minor comments:

      1) Asymmetry in centrosome size that correlates with centrosome age in apparently symmetrically dividing "cells" has been observed previously in Drosophila syncytial embryos (Conduit et al., 2010a, Curr. Bio.). I think this should be mentioned somewhere given the topic of the study.

      We thank the reviewer for this information. This paper will be discussed in the revised version.

      2) A full description of statistical tests and n numbers for each experiment should be provided in the methods, even if this duplicates information in the Figure legends.

      We will add this information to the Methods.

      OPTIONAL EXPERIMENTS:

      3) Given that chTOG is very important for microtubule nucleation, it seems strange that this protein was not analysed for a potential asymmetry.

      As suggested by the reviewers we will test for a potential chTOG asymmetry and its impact on spindle size asymmetry.

      4) Cooling-warming experiments could be done using higher concentration of formaldehyde, as it's likely that microtubule nucleation is not immediately halted when using 4% formaldehyde.

      The fixation solution was chilled at 4°C, which should halt any further depolymerization. We will specify this point in the Materials and Methods section.

      Reviewer #3:

      Major points:

      1) The evaluation of spindle and cell size asymmetry related to centrosome age only relies on fixed sample preparation. Cells should be followed by time-lapse microscopy as the metaphase plate position relative to the spindle poles and/or the cell cortex may fluctuate over time and as the observed differences remain in a very subtle range. This is an important possibility to consider for 1:1, 1:0 or 0:0 spindle pole configurations where centrosome integrity is impaired.

      We agree with the reviewer that this is a drawback of our approach, but the experiments the reviewer suggests are not possible for 1:0 or 0:0 cells, or only in an approximate manner. Indeed, we do not have a centriole-independent spindle pole marker that would allow us to mark the position of the spindle pole precisely. In the past we used SiR-tubulin, which gave us an approximate position of the spindle poles and allowed us to monitor the spindle asymmetry of 1:0 cells over time (see Dudka et al., 2019), a point that we will discuss. Nevertheless, as suggested by the reviewer, we will attempt to monitor these asymmetries in 2:2 and/or 1:1 cells expressing GFP-Centrin1 and GFP-CENPA (kinetochore marker) in WT conditions. However, we cannot extend this approach to all the conditions, as the calculation of the spindle asymmetry index is based on a very high number of cells, and the monitoring of spindle asymmetry can only be achieved by selecting mitotic cells one by one and then monitoring them over a short period of time (Tan et al., eLife, 2015), which makes such an approach extremely time-consuming.

      2) Cell size asymmetry was evaluated based on cell area at the equator. Volumes will be a better indicator as daughter cell shapes can be different in telophase if they do not re-adhere at the same speed. This evaluation should also be confirmed with another readout, like the position of the cleavage furrow relative to the spindle poles in late anaphase, as again the observed differences are in a very subtle range.

      As indicated in our responses to the similar points of reviewers 1 and 2, we will improve our methodology to take this comment into account.

      3) The authors propose that differential microtubule nucleation at the spindle poles underlies spindle size symmetry breaking without providing direct evidence. If the observed spindle symmetry in the 1:1 configuration after pericentrin, CDK5RAP2 or g-tubulin siRNA fuels this interpretation (Fig4C), the differential microtubule nucleation capacity at the spindle poles after microtubule-depolymerisation-repolymerisation assays was not evaluated in these conditions, as compared to the control situation.

      As suggested by the reviewer, we will analyze the microtubule nucleation capacity after the downregulation of PCM proteins.

      4) If differential microtubule nucleation at the spindle poles is responsible for spindle asymmetry, overexpression of PCM proteins or g-tubulin should be sufficient for re-establishment of symmetric protein distribution, spindle and cell size symmetry in 2:2 or 1:1 configuration. The authors should evaluate whether this is the case or not.

      This is an interesting suggestion, which we will test, although overexpression of these proteins might also lead to other defects in the spindle, such as multipolar spindles.

      5) The authors describe that the cortex-centrosome distance is not changed according to centrosome age (Fig2C), but centrosome-metaphase plate distance is (Fig1D). These observations are difficult to reconcile if differential microtubule-nucleation capacity is at play. Again, time-lapse microscopy would enable to detect over time whether only metaphase plate position relative to spindle poles is changing or if spindle pole position relative to the cell cortex is also fluctuating.

      We plan to attempt to image WT 2:2 cells by time-lapse microscopy and to measure several parameters over time, such as half-spindle size, spindle (a)symmetry and the cortex-to-centrosome distance.

      Minor points:

      6) Main PCM and MT nucleation protein "depletion" do not appear to impact spindle assembly, but only spindle symmetry in 1:1 and 1:0 configurations (Fig4A and 4F-H). Can it be explained by the fact that their depletion is not always total (for pericentrin, Fig5F versus FigS2A or Fig7G)? Can they comment on this point?

      Cells with an abnormal centriole number at spindle poles (1:1 and 1:0) can still assemble a bipolar spindle in the absence of the main PCM proteins (Chinen et al., JCB, 2021, and Watanabe et al., JCB, 2020).

      In our study, the depletion of PCM proteins is almost total (97% for pericentrin, 98% for Cdk5Rap2).

      7) If centrosome age dictates spindle and cell size asymmetry through differential MT-nucleation capacity at the spindle poles, how can this process be modulated? Indeed, centrosome age is common to all cell types, but cell size asymmetry is more or less pronounced. The authors should further discuss this point based on the literature.

      We will address this point in the discussion.

      Description of the revisions that we have already carried out in the revised manuscript


      1. The discovery of differences in half-spindle size during symmetric division is intriguing. However, the methodology for quantification of the data remains unclear. Key questions, such as how the center of the metaphase plate is determined from the image data, the definition of exact pole position when centrioles are located at spindle poles, the objective determination of daughter cell diameter and width from the image data, and the referential position of the cortex, need more detailed explanation in the manuscript. Additionally, it's crucial to elucidate the specific index used to quantify differences from the image data, especially when dealing with data that only varies by a few percent. Providing clarity on these aspects and, in some cases, re-quantifying the data should be necessary.

      We have already included clearer explanations in the method parts and results part about our methodology and will include a supplementary figure on how precisely we defined and measured the half-spindle sizes, as well as the index used for the asymmetry (using a methodology that we previously used in Dudka et al., Nature Comm., 2018). In addition, we will use a second method to measure the real daughter cell volume.


      Description of the experiments that we prefer not to carry out:


      Point 3 of reviewer 1: For investigating the mechanism of half-spindle size asymmetry, many perturbation experiments employ knock-down techniques. To directly address the cause of asymmetry, it might be valuable to artificially localize Plk1 and PCM factors to one spindle pole using optogenetic tools or similar approaches and then quantify half-spindle and daughter cell sizes.

      We thank the reviewers for this suggestion, as it could indeed be of great interest and provide a direct proof of principle. Unfortunately, based on our experience in establishing such a cell line, we know that just generating a light-manipulated stable cell line that contains markers for centrosomes and chromosomes or kinetochores takes 6-9 months in the best-case scenario. This experiment is therefore not possible within a normal revision round (even if extended to 6 months).


    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2023-02172

      Corresponding author(s): Philip Elks

      [The “revision plan” should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.


      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.


      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      This section is optional. Insert here any general statements you wish to make about the goal of the study or about the reviews.


      In this paper we report the discovery that a member of the Tribbles pseudokinase family, TRIB1, is expressed in human monocytes and is upregulated after stimulation with mycobacterial antigen in a human patient challenge model. This is the first direct link between immune cell Tribbles expression and the innate immune response to infection. We then interrogated the mechanisms of Tribbles roles in TB using a human disease-relevant, whole-organism in vivo zebrafish model of TB. We show that modulation of TRIB1 specifically can tip the battle between host and pathogen, enhancing the innate immune response and reducing bacterial burden. We then uncover the molecular mechanisms responsible for the host-protective effect of TRIB1, with enhanced antimicrobial reactive nitrogen species and il-1beta, via cooperation with Cop1 E3 ubiquitin ligase. Our findings demonstrate, for the first time, TRIB1 as a host moderator of antimicrobial mechanisms, whose manipulation is of benefit to the host during mycobacterial infection and which is, as such, a potential novel therapeutic target against TB infection.

      We thank the reviewers for their positive appraisal of our work and for their helpful suggestions that will improve our manuscript. In particular we would like to highlight the reviewer’s comments on the gap/need for a new zebrafish in vivo model to understand the roles of tribbles in infection that can “be extrapolated into the human system”, and how they feel these findings will be of broad interest and “significance to cross section of the research community” attracting “interest from readers in the fields of infection, immunity, hematology and animal models” alongside “researchers studying all aspects of Tribbles pseudokinase function, especially researchers seeking models to test small molecule agonists and antagonists.”

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.


      Reviewer 1

      The major weakness of the manuscript is that the authors do not evaluate C/EBP transcription factors at all. It is rather surprising as they emphasize cooperation between Trib1 and Cop1 in the main title. C/EBP family proteins are key factors of Trib1-mediated modulation of granulocytes and monocytes. Also, slbo, a drosophila homolog of C/EBP, is a target of tribbles, indicating that the pathway is evolutionary conserved. I would request the following experiments and discussions.

      RESPONSE: We agree that possible C/EBP roles should be discussed in detail, and we will add a new discussion section on this.

      We stand by our data showing that the host-protective mechanism of Trib1 requires Cop1, but we are not able to directly show a C/EBP mechanism within the scope of the current project due to a lack of tools/knowledge in the zebrafish on this (further points/comments below on this). It is important to note that we have not claimed a C/EBP mechanism in our manuscript, and we think it may be unlikely given that monocyte and granulocyte numbers are not altered after TRIB1 manipulation. Indeed, there are many candidates other than C/EBP that COP1 could be acting through. Some examples include MAPK (Niespolo et al., Front Immunol, 2020), serine threonine kinases (Durzynska et al., Structure, 2017) and beta-Catenin (Zahid et al., Proteins, 2022).

      In response to this comment, we have modified the title from “Tribbles1 and Cop1 cooperate to protect the host during in vivo mycobacterial infection” to “Tribbles1 is host protective during in vivo mycobacterial infection”. We believe our data do show that the protective effect of Tribbles requires Cop1, but changing the title in this way removes any suggestion that they directly cooperate in the potential C/EBP-dependent manner suggested by the reviewer.

      Although the authors found the number of neutrophils and monocytes unchanged by Trib1 overexpression nor knockdown, they did not demonstrate the differentiation status of both cell types. This is quite an important issue, given that Trib1 knockout promotes granulocytic differentiation via C/EBPa accumulation in mice. Also, the analysis of granulocytic/monocytic differentiation will provide the crucial information how Trib1 protects the host from mycobacterial infection regulating hematopoietic cell functions. The authors should perform morphological analysis and examine cell surface marker expression to examine whether Trib1 and Cop1 modulates granulocytic and monocytic differentiation with and without Mm infection.

      RESPONSE: Unfortunately, we have neither the same level of immunology knowledge nor the antibodies to look at cell surface markers in zebrafish larvae (it is noted that the reviewer states that they do “not have sufficient expertise in zebrafish models”). We agree with the reviewer that this would be an obvious and informative experiment to do in mouse models, but it is not currently possible in zebrafish larval models. The transgenic promoters used (mpx for neutrophils and mpeg1 for macrophages) are robust and widely published for assessing total neutrophil and macrophage numbers (Renshaw et al., Blood 2006; Ellett et al., Blood 2011). Mpx, encoding myeloperoxidase, is expressed late in neutrophil differentiation. It is also worth noting that the zebrafish larval model is still a developing organism, and neutrophil/macrophage numbers rise every day between 1 and 5 days post fertilisation; therefore any effect on, or delay in, leukocyte differentiation would likely be captured at the 2dpf timepoint we have already quantified. We cannot reliably perform leukocyte counts during Mm infection, as neutrophils/macrophages cluster around infected areas, making counting challenging.

      However, in response to this comment we will:

      1. Use a new Tribbles 1 stable CRISPR-Cas9 knockout mutant we have generated and assess neutrophil differentiation using Sudan Black (SD). SD stains neutrophil granules, which develop during a late phase of neutrophil differentiation.
      2. Interestingly, it has been shown that a zebrafish myeloid-specific C/EBP (c/ebp1) is not required for initial macrophage or granulocyte development, but its knockdown does result in a loss of the secondary granule gene LysC (Su et al., Zebrafish, 2007). Therefore, our findings are not inconsistent with existing literature, even if C/EBPs are regulated by Tribbles. However, to test this further we will use a LysC:mCherry transgenic line (Buchan et al., PLoS One 2019) to assess expression in developing neutrophils after trib1 manipulation.

      It is interesting that Cop1 knockdown zebrafish is viable, given its ubiquitous expression and multiple important targets of protein degradation. The authors should provide the details of phenotype of Cop1 KO larva and discuss on this issue.

      RESPONSE: Zebrafish mutants are much less often embryonic lethal than mice, as maternally contributed protein stores allow basic metabolic functions to occur throughout the short period of embryonic development (Rossant and Hopkins, Genes and Development 1992). However, the Cop1 CRISPant is a knockdown rather than a knockout, so there may be sufficient remaining Cop1 for development if it is indeed required for larval viability. Although Cop1 knockout mice are non-viable, hypomorphs are viable and develop relatively normally (similar to our knockdown zebrafish) but are tumour prone, as Cop1 is required for effective tumour suppression (Migliorini et al., JCI, 2011).

      We had not commented on the phenotype of the Cop1 larvae as they appear to develop normally (e.g. normal body axis and development). However, we agree that this is a relevant point to incorporate into the manuscript and thus will add a comment on this in the Results section. Furthermore, we will add whole-body neutrophil counts, which we have already performed, to the supplementary information; these show no change with cop1 knockdown, suggesting no difference in granulopoiesis.

      [Optional] To obtain the more solid evidence for the Cop1 dependent function of Trib1 on mycobacteria infection, it is better to use the Trib1 mutant that loses the Cop1 binding activity. This experiment will strength the authors' conclusion of the Trib1 and Cop1 cooperation.

      RESPONSE: We will address this comment by using a newly generated stable zebrafish CRISPR-Cas9 Tribbles 1 knockout line with a 14 base pair deletion that is predicted to lead to a premature stop at 94aa in the middle of the pseudokinase domain, lacking the catalytic loop. The predicted protein also lacks the predicted COP1 binding area at the C-terminus. We will assess bacterial burden in this model.

      1. Previous studies have shown multiple defects in hematopoietic lineages such as M2-like macrophages and eosinophils in Trib1 KO mice, suggesting that Trib1 affects cellular functions of macrophages upon mycobacteria infection. I would request the authors to mention some ideas on this point in discussion.

      RESPONSE: We will add a section in the discussion to address this.


      Reviewer 2

      Structural comparisons are relatively descriptive of identity etc. Nowadays it should be relatively straightforward to comment on structural conservation based on Alphafold models. Specific details may not be accurate but gross folds will be, and comparing those may be more informative.

      RESPONSE: We have taken an initial look at Alphafold models and there are indeed structural similarities between zebrafish and human Tribbles. We will incorporate Alphafold structural models and comment on similarities/differences.

      Some discussion of the mechanisms regulating TRIB1/2/3 transcriptionally is probably relevant given the differential upregulation observed during infection. There is quite a bit of characterisation of different Tribble promoter regions in humans; how does this translate to Zebrafish?

      RESPONSE: We will add a discussion point on what is known about Tribbles promoter regions in humans. We will assess whether anything is known about the promoter regions in zebrafish Tribbles (we have not identified literature on this currently). If nothing is known on this in zebrafish we will attempt to search for regulatory regions found in humans in the zebrafish promoters.

      In terms of Crispr use-can it be confirmed that Crispr modified cell lines have effects at the protein level? This is not my specific expertise, but the supplementary evidence shown seems to show some genomic editing is occurring, but not necessarily how it effects protein levels.

      RESPONSE: We do not have antibodies that work on zebrafish Tribbles proteins to assess this directly. However, we will address this comment by using a newly generated stable zebrafish CRISPR Tribbles 1 knockout line with a 14 base pair deletion that is predicted to lead to a premature stop at 94aa in the middle of the pseudokinase domain, lacking the catalytic loop. Unlike the “CRISPant” knockdown work in the peer-reviewed version, this represents a full knockout of Tribbles 1. We will examine the trib1 cDNA of the full knockout line to confirm the knockout at the transcript level.

      A major conclusion of the paper seems to be that TRIB1 works with COP1 in Zebrafish to mediate response to infection. However the discussion does not particularly tie this with the other discussed mechanisms. E.g. JAK/STS, and EBP-linked responses are discussed separately from COP1, where they could well be linked?

      RESPONSE: We agree, and this comment fits in with some comments from reviewer 1. We will rework areas of the discussion to address this and bring possible mechanisms together into a new discussion section.


      Reviewer 3

      All comments addressed in new revision (see below).

      It is noted that this reviewer has “expertise from genetic studies of model organisms to assess all aspects of the tools and approaches used in the paper.”

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.


      Reviewer 1

      Figure 1D, E, F is mislabeled in lines 268-271.

      RESPONSE: Apologies for this typo, this has now been changed.

      Typo in line 399.

      RESPONSE: We have changed “suggesting” to “suggest”.

      Figure 6A-B is mislabeled in line 415

      RESPONSE: Apologies for this typo. We have changed this from “Figure 5A-B” to “Figure 6A-B”.


      Reviewer 2


      While the protective effect is stated as an effect size 'close to that of HIF-1a', is there additional rationale suggesting that the two may be linked?

      RESPONSE: Yes, there have been a number of studies that link Tribbles and Hif-1alpha. The best characterised link is in different cancer cells, where Tribbles 3 has been linked to HIF-1alpha or hypoxia in breast cancer (Wennemers et al., Breast Cancer Research 2011), renal cell carcinoma cells (Hong et al., Int J Biol Sci, 2019) and adenocarcinoma (Xing et al., Cancer Management Research, 2020). In Drosophila, Hif-1alpha induces TRIB in fat body tissue (Noguchi et al., Genes Cells 2022). We have now added references to these studies to the relevant section in the results.

      Reviewer 3


      Minor issues: small problems with clarity and figure panel correlation as detailed below:

      Mycobacterium marinum Lines 363-365 Refers to Fig 2C-D should be 3C-D

      RESPONSE: Apologies for this typo. This has now been changed to 3C-D.

      negative controls DN Hif-1alpha and PR (Figure 4A-B). Similarly, trib1 overexpression increased the levels of anti-nitrotyrosine staining, a proxy for immune cell antimicrobial nitric oxide production (Forlenza et al. 2008), to similar levels of DA Hif-1alpha (Elks et al. 2014; Elks et al) Not seeing this for Trib1

      RESPONSE: We are not completely sure what the reviewer is referring to here. We think the possible confusion stems from the fact that the increase in nitric oxide with trib1 overexpression is relative to the phenol red control, so we have now clarified that in the text.

      As previously observed, overexpression of trib1 significantly reduced bacterial burden compared to phenol red controls when co-injected with tyrosinase guide (Figure 5A-B).

      The Fig 3 A-B is correct, although 6A-B appear to be novel panels showing this result

      RESPONSE: Yes, we agree, 6A-B has new results showing similar results to 3A-B, as it is necessary to include siblings from the same clutch in each graph to make direct comparisons. To avoid unnecessary confusion, we have removed the “as previously observed” for figure 6 as we had not previously had the tyrosinase co-injection so these are indeed new data.

      444 no comma 446 no comma 457 no comma after "activation"

      RESPONSE: We have removed these commas.

      472-475 confusing - better structure in particular in 474 what does "this" refer to?

      RESPONSE: We agree, and have clarified this in the following new sentences:

      “Lipid droplets form in macrophages during Mtb infection that are potentially used as source of lipids by Mtb to allow for intracellular growth (Daniel et al. 2011). However, more recent findings suggest that lipid droplets are formed during the immune activation process after macrophage Mtb infection (Knight et al. 2018), that can subsequently influence the dynamics response of macrophage host defence (Menon et al. 2019). This macrophage lipid metabolism and handling could potentially be influenced by Tribbles.”

      525-526 confusing - better structure perhaps begin with 'Because...'

      RESPONSE: We have changed this confusing sentence to:

      “Here, we demonstrate il-1b and NO control by Trib1, suggesting that Trib1 controls multiple immune pathways and that therapeutic Trib1 manipulation may be more effective than targeting individual immune pathways alone.”

      confusing 538 "this and 539 pave the way for further research into TRIB1 as a target for host-derived therapies" Perhaps "further research into TRIB1 as a target for host-derived therapies could potentially improve infection outcome of mycobacterial infection via pharmacological targeted delivery methods and transient manipulation through genetic approaches"

      RESPONSE: We have changed this sentence as suggested.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.


      Reviewer 1

      1. The authors should investigate the expression of the C/EBPa protein p42 isoform and/or other C/EBP family proteins such as C/EBPb, and confirm that the p42 is degraded by Trib1 overexpression and recovered by Trib1 and Cop1 knockout. It is also important to determine both p42 and p30 isoforms are preserved in zebrafish.

      RESPONSE: This is a complex point to unpick in zebrafish and we believe this to be out of the scope of the current project. We do not claim a link to C/EBP. As mentioned in the comments above, we think that a link to C/EBP may be unlikely given that monocyte and granulocyte numbers are not altered after TRIB1 manipulation. We will add more data to look at different markers of neutrophils (see above comments). There are many candidates other than C/EBP that COP1 could be acting through. Some examples include MAPK (Niespolo et al., Front Immunol, 2020), serine threonine kinases (Durzynska et al., Structure, 2017) and beta-Catenin (Zahid et al., Proteins, 2022). There is also evidence suggesting that COP1 and C/EBP have distinct binding sites on TRIB1, potentially unlinking their activity in some biological situations (Murphy et al., Structure, 2015).

      C/EBPa is found in zebrafish and is involved in myeloid differentiation and haematopoiesis (Yuan et al., Blood 2011). There is not a huge amount in the literature on this, but it has been shown in zebrafish models that the drug Tanshinone IIA reduces C/EBPa (Park et al., Int J Mol Sci, 2017), and we know from previous work in our department that Tanshinone IIA does not affect total neutrophil numbers in zebrafish larvae (Robertson et al., Sci Trans Medicine, 2014). The C/EBP most involved in zebrafish myelopoiesis appears to be a zebrafish-specific isoform called c/ebp1 that is myeloid expressed (Lyons et al., Blood 2001). This has a highly conserved carboxy-terminal bZIP domain, but the amino-terminal domains are unique. Interestingly, reduction of c/ebp1 does not ablate initial macrophage or granulocyte development, but does result in loss of expression of LysC, a secondary granule marker (we are checking expression of this gene after Trib1 modulation using a LysC:mCherry transgenic zebrafish line).

      We do not have antibodies or tools to detect p42 and p30 in zebrafish. As Tribbles1 regulation of C/EBPa appears to be post-translational (Bauer et al., J Clin Invest, 2015), this would be incredibly challenging to unpick in the zebrafish model due to lack of tools to do this. Due to this and the reasons above we believe this to be out of the scope of the current project.

      [Optional] The effect of enhanced ERK phosphorylation by Trib1 for the protective effect against mycobacterial infection is another interesting point. It would be better if the authors could provide the ERK phosphorylation status upon Trib1 overexpression.

      RESPONSE: Unfortunately, we have no method to answer this question to a conclusive level within the scope of this project. There are limited reports of phosphorylated ERK antibodies that work in wholemount zebrafish (eg, Maurer and Sagerström, BMC Developmental Biology, 2018, that use a rabbit antibody), but this is widely expressed in many tissues of the zebrafish and immune cells would be challenging to resolve.

      Reviewer 2

      We have addressed or propose to address all of reviewer 2’s comments.


      Reviewer 3

      We have addressed or propose to address all of reviewer 3’s comments.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The mechanisms that differentiate ER from the nuclear envelope (NE) remain to be fully elucidated but likely depend at least in part on junctions between the ER and NE. How such junctions are formed and maintained is the subject of this manuscript where extensive correlative light and electron microscopy is used to observe and characterize ER-nuclear envelope (ER-NE) junctions at distinct phases of the cell cycle. The authors make use of their own electron tomography data as well as publicly available focused-ion beam scanning electron microscopy (FIB-SEM) datasets to compare the morphology of these junctions in different human cell types as well as in budding yeast. The major finding is that ER-NE junctions in human cell lines are more constricted than ER-ER junctions, often to the point of excluding lumen. The examination of mitotic cells suggests that this constriction likely occurs at the end of mitosis as the NE is completing its maturation from ER to NE. The implications of these morphological changes are discussed but there are no mechanistic or functional studies. Overall, the data are well presented, are of high quality and are rigorously evaluated. The manuscript is well written and scholarly, and the speculations as to the function of the constrictions are reasonable. I only have minor comments.

      We thank the reviewer for the positive evaluation of our work and for the useful suggestions on how to further improve the manuscript.

      1. In Figure 2D, the authors present evidence to demonstrate that an hourglass-like constriction occurs at ER-NE junctions. From the side view, it is difficult to interpret this on the plot, particularly for the ER-NE junctions with a lumen. Perhaps, in the supplemental data, the authors could plot both with and without lumen data separately, and color-code individual traces? I believe this would convey the hourglass nature of these constrictions more clearly.

      To make it easier to see individual membrane profiles, we will plot the profiles with and without lumen separately and label each profile with a distinct colour, as the reviewer suggested.

      In the Methods section, the authors should describe how carbon-coating of sapphire discs was achieved. If these were provided from the manufacturer precoated, this should be specified.

      We carbon-coated the sapphire discs ourselves. We will specify how the carbon-coating was done in the revised manuscript.

      On page 10, Figure 5F callout 9 lines from the bottom likely should be 5E.

      We will correct this error.

      Reviewer #1 (Significance (Required)):

      Overall, this work provides an important new morphological perspective on the nature of ER-NE junctions in human cells. As the authors describe in their introduction, such junctions have been noted previously in the literature but not in a dedicated study using modern imaging techniques in human cell lines. In describing the morphology of these junctions, the authors lay the groundwork for future mechanistic, functional, and structural studies.

      We thank the reviewer for appreciating the significance and the impact of our work.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, Bragulat-Teixidor et al., use correlative live-cell imaging and electron tomography to study the structure of the endoplasmic reticulum-nuclear envelope (ER-NE) junction in HeLa cells (and also in S. cerevisiae). The authors also make use of publicly available whole-cell FIB-SEM datasets to study ER-NE junctions in mouse pancreatic islet, HeLa, and human macrophage cells to corroborate their findings in other cell types.

      The authors show that the structure of the ER-NE junction in interphase cells adopts an hourglass shape with a constricted neck. Comparing the ER-NE junction to the ER tubule-sheet junction, the authors show that these structures are different: the ER tubule-sheet junction is not constricted. Because the NE forms from the ER during postmitotic NE assembly, the authors compare the structure of the ER-NE junctions in anaphase, telophase, and interphase cells, and find that the junction becomes constricted in telophase. The number of ER-NE junctions increase going from telophase to interphase.

      While the authors do not provide any direct evidence for this, they propose a functional model where the ER-NE junction is constricted because it regulates the supply of certain lipids and proteins from the ER to the NE. One proposed example is that the constriction of the ER-NE junction might prevent the passage of large protein aggregates from entering the NE.

      The general question of how the structure of the ER-NE junction might regulate the passage of lipids and proteins from the ER to the NE is interesting and potentially important. However, the authors should address the following issues to improve the accuracy and completeness of this manuscript for it to be considered for publication.

      We thank the reviewer for the appreciation of our work and the thoughtful suggestions for further improvements.

      Major comments:

      1. The authors compare the structure of the ER-NE junction to the structure of the ER tubule-sheet junction in interphase cells. They should instead or in addition be comparing the ER-NE junction to ER sheet-sheet junctions. This is likely a better comparison for two reasons:

      i) The NE is similar to an ER sheet due to its flat and extended structure. The ER membranes surrounding the NE consists mostly of a dense network of sheet-like ER (Zheng et al., 2022, PMID: 34912111). Therefore, the ER-NE junction should be compared to these NE-adjacent ER sheet-sheet junctions and not ER tubule-sheet junctions which are likely to be found in the cell periphery.

      ii) In HeLa cells, the NE assembles from large ER sheets and not ER tubules (Zhao et al., 2023, PMID: 37098350; Otsuka et al., 2018, PMID: 29323269; Lu et al., 2011, PMID: 21825076). Therefore, the ER-ER junctions the authors are already studying in anaphase cells are likely to be ER sheet-sheet junctions, which should be kept the same in their analysis of the ER-ER junctions in interphase cells.

      Related to this point, comparing the side view panels in Figure 2D with 2H, it seems that the width of the ER membranes on either side of the neck region of the ER-NE junction is in fact getting wider (more sheet-like). This is in contrast to the ER-ER junction where the width stays constant for the ER tubule that is fusing onto the ER sheet. This suggests that indeed, the ER-NE junction is more similar to an ER sheet-sheet junction.

      It is a very interesting possibility that the ER-NE junction might be similar to the ER sheet-sheet junction. We will inspect whether the ER that forms the ER-NE junction consists of sheet or tubular ER in our EM tomograms, and describe the outcome in the revised manuscript.

      The authors claim that in late anaphase cells, the ER-ER/NE (written like this because the ER and NE cannot be distinguished like the authors also point out) junctions are not constricted and had a similar morphology to ER-ER junctions in interphase. However, this claim is only qualitative at the moment, as the authors do not provide any quantification of the width of the ER-ER/NE junctions in late anaphase cells. To make the current claim that the ER-NE junction only becomes constricted in telophase, the authors should report the width of the ER-ER/NE junctions in late anaphase cells.

      In late anaphase cells, large ER sheets initially wrap around chromatin at the periphery of the chromosome mass (Zhao et al., 2023, PMID: 37098350; Otsuka et al., 2018, PMID: 29323269; Lu et al., 2011, PMID: 21825076). Therefore, the authors might find it easier to identify ER-ER/NE junctions in the so-called "non-core" regions, instead of in the current regions shown in Figure 3A.

      As the reviewer pointed out, we did not provide quantification of the width of ER-ER/NE junctions in late anaphase cells. We will measure them and show the quantification in the revised manuscript.

      Minor comments:

      1. In the Supplementary Figures 1 A-D, make the scale bars white. Currently, the black scale bars are especially difficult to see in the top panels in Supplementary Figure 1C.

      We will change the colour of some scale bars to make them more visible in Supplementary Figure 1.

      In the Results section entitled "The number of ER-NE junctions per cell increases from telophase to interphase", the authors should tone down this claim because the number of telophase cells examined is low (only 2 telophase versus 9 interphase cells). It would be better to include the word "slightly" in the title to change it to "slightly increases".

      We will modify the text accordingly.

      In the Results section entitled "The number of ER-NE junctions per cell increases from telophase to interphase", the authors state "These densities were much lower than those of ER-ER junctions...". For sure this is true for ER tubule-tubule junctions in the periphery of the cell as ER tubules form an intricate network by constantly fusing to each other, but it's not clear if this is also the case for ER tubule-sheet or ER sheet-sheet junctions. For clarity, the authors should state that they mean ER tubule-tubule junctions.

      Same comment also for the statement "...although their abundance remains considerably lower than that of ER-ER junctions or nuclear pores at both cell cycle stages". The authors should state that they mean ER tubule-tubule junctions.

      We will clarify what we mean by ER-ER junctions in the revised manuscript.

      In the Results section entitled "The constricted morphology of ER-NE junctions is observed in different mammalian cells, but not in budding yeast", the authors state "...pancreatic islet cells (Figure 5A), HeLa (Figure 5B), and macrophage (Figure 5C) were significantly smaller than most ER-ER junctions (Figure 5F)". The last figure reference here is wrong and should be changed to Figures 5D-E.

      We will correct this error.

      In Discussion, the authors state "Proteins known to form and stabilize junctions in the ER, including Atlastins and Lunapark...". The authors should specify that they mean ER tubule-tubule three-way junctions. Also more generally throughout the manuscript, the authors should be more careful in specifying which ER-ER junctions they mean in each case.

      As pointed out in Major comment 3 above, we will clarify this point in the revised manuscript.

      1. In Discussion, the authors state "Thus, we favour a second scenario in which ER-NE junctions are generated from ER tubules that contact and eventually fuse with the ONM". Given that the ER membranes adjacent to the NE are mostly sheet-like (as pointed out in Major comment 1 above), the authors need to explain how they think an ER tubule (mostly found in the cell periphery) could access and fuse to the NE.

      As mentioned in the response to Major comment 1 above, we will examine whether the ER that forms ER-NE junctions is tubule or sheet in our EM tomograms. Depending on the outcome of the examination, we will rephrase the text.

      *

      * Reviewer #2 (Significance (Required)):

      Although the ER-NE junction has been studied in other organisms before, this study represents the first structural characterisation of the ER-NE junction in mammalian cells. Therefore, this study represents an advance for the field in gaining a better understanding of different ER structures and morphologies. How the ER is remodelled during the cell cycle is also an interesting question and an active field of research (Merta et al., 2021 PMID: 34853314; Zhao et al., 2023, PMID: 37098350) which this study further contributes to. This study would therefore be interesting for anyone interested in ER structure/morphology, ER-NE connections, and cell cycle regulation of such ER-NE connections.

      My field expertise is in ER and NE. I do not have sufficient expertise to evaluate the methodology for the EM tomography part of this paper. We thank the reviewer for appreciating the novelty and the impact of our work.


      * Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Bragulat-Teixidor et al. is a study of the connection of the ER with the nuclear envelope. It uses advanced ultrastructural techniques: high pressure freezing instead of chemical fixation and EM tomography instead of serial sectioning. Synchronized HeLa cell cultures were examined during interphase, late anaphase (4-6 min after anaphase onset) and early telophase (8-10 minutes after anaphase onset).

      The investigators find an unexpected, unusual structure - a constricted neck 7-20 nm wide and about 10 nm long where the ER connects to the nuclear envelope. The 7 nm connections had no apparent lumen. These are not seen in late anaphase when the NE has not yet formed, but they are seen a few minutes later during early telophase when there is a newly formed NE surrounding the chromosomes. Their abundance was quantified: more were found later during interphase, and with wider lumens.

      It is very nice to show the EM images as uncolored and segmented (colored). The images shown in the figures are presumably the best that were obtained during the study. Heavy metals do not stain membranes uniformly or exclusively, and identification of structures doesn't always seem unambiguous. The three dimensional information can certainly make this easier though this information is difficult or not possible to show in journal format. In the end, the reader must depend on the judgment of the person who did the analysis. Overall, the analysis seems trustworthy. *We thank the reviewer for the comment. To better present the three-dimensional structure of ER-NE junctions, we will provide movies of the EM sub-tomograms containing the junctions. In this way, the readers will be able to inspect the three-dimensional structure of six ER-NE junctions.

      * HeLa cells are very convenient for getting information on cell cycle dependence. However, they are cancer cells in culture, so it is important to look at other cell types as well. The same methodology was used on budding yeast and they saw a wide tentlike connection, which reproduces an earlier study. This seems more consistent with what is known or expected from ER membranes. It is not less interesting but perhaps less puzzling. To get evidence on other mammalian cells, the authors did an analysis of data from OpenOrganelle. These are high pressure frozen cells / tissue imaged by FIB-SEM. The voxels are 4 nm, which is significantly larger than those in EM tomography. Unfortunately, the difficulty of identifying structures is correspondingly more significant. The images shown do not contradict the HeLa results but by themselves (without the HeLa cell data), a convincing case for narrow connections probably couldn't be made. *The reviewer raises a very good point about a limitation of the FIB-SEM datasets in OpenOrganelle. We agree with the reviewer that, as we had mentioned in the manuscript (lines 6–11, page 10), the spatial resolution of the FIB-SEM datasets is not sufficient to gain insights into the exact morphology of the 7–20 nm wide ER-NE junctions because the voxel size is 4 nm. However, the resolution is good enough to examine whether ER-NE junctions are narrower than ER-ER junctions, as shown in Figure 5A–E. The fact that we rarely found non-constricted ER-NE junctions in FIB-SEM datasets confirms the small size of ER-NE junctions. To clarify this point, we will modify the text (lines 24–25 on page 10) as follows:

      Previous: This analysis of FIB-SEM images confirms the hourglass morphology that distinguishes ER–NE from ER–ER junctions as seen in our EM tomograms…

      Revised: This analysis of FIB-SEM images confirms that ER-NE junctions are narrower than ER-ER junctions as seen in our EM tomograms…

      * The work in this manuscript seems to have been done well. Assuming that this structure is confirmed in other mammalian cells, another kind of question comes to mind: is this the final word on ER to NE connections? The lumenless neck does not seem like it would be a stable structure, somehow it seems like a transient one. In the future, it would help if a new structural protein was identified or some theoretical analysis to help explain the shape. *Certainly, this will not be the final word on ER-NE junctions, which are crucial for the ER-to-NE transport of lipids and transmembrane proteins. In the future, it will be important to identify structural proteins regulating the junctions and reveal how their constricted morphology affects the ER-to-NE transport. We believe that, as you kindly mentioned in the last paragraph of your comments, our observations “serve as a starting point for further structural and functional work” on these unique yet fundamental junctions that connect the ER to the nucleus.

      * It is generally now assumed that high pressure freezing preserves structure perfectly. However, in this reviewer's mind, there is a possibility that some structures are not. The sample is brought to 2000 atmospheres within a few milliseconds, frozen, then the high pressure is released after a second. Although many intracellular structures do seem well preserved, could the junction be susceptible to high pressure? A second source of uncertainty is that in order to embed the samples in resin, the water was removed by freeze substitution. This is known to cause a small amount of tissue shrinkage and possibly could alter a delicate structure. Another way to look at this kind of structure is cryo-EM tomography on hydrated lamellae from plunge frozen cells. I don't recommend that the authors do another arduous, possibly too arduous set of experiments with a completely different technique, but perhaps another group has data which could support their findings. *We think it is very unlikely that ER-NE junctions were deformed due to the high-pressure freezing. In general, high-pressure freezing allows vitrification of specimens up to 0.5 mm in thickness and the vitrification works better for thinner specimens. Our specimens are only 0.02 mm thick monolayer cells frozen in a chamber with 0.03 mm depth. Thus, the vitrification is expected to occur fast and the ER-NE junctions must have been frozen in the same way as in other regions of the cell.

      However, as the reviewer pointed out, it is possible that the dehydration of the samples due to freeze substitution might cause deformation in ER-NE junctions. To verify the structural preservation of ER-NE junctions in our protocol, we will compare the morphology of the ER and NE in cryo-EM datasets that are available in public databases with ours. We will describe the outcome in the revised manuscript.

      We think that our conclusion from the EM analysis is solid, because we observed a significant structural difference between ER-NE junctions and ER-ER junctions in the same cells (Figure 2). In addition, we compared the morphology of ER-NE junctions in late-anaphase, early-telophase, and interphase cells that were high-pressure frozen and freeze-substituted on the same sapphire disc, and found that the ER-NE junctions became progressively constricted from telophase to interphase (Figure 3).

      * The following are suggestions for the Discussion:

      Yeast have many of the same biochemical processes as mammalian cells. Perhaps their lack of narrow connections can be used as a clue to the function of the narrow necks seen in HeLa cells. For instance, the authors speculate that the narrow connection serves to keep phosphatidylserine in the nuclear envelope low. If the yeast nucleus has the same concentration of phosphatidylserine as the ER, it would provide good evidence for this idea. *Yes, it is indeed the case. It was shown that the yeast outer nuclear membrane has the same concentration of phosphatidylserine as the ER (Tsuji et al., Proc. Natl. Acad. Sci. U. S. A., 2019). We had described this in the discussion on page 14 “this phosphatidylserine enrichment occurs in mammalian cells and not in budding yeast (Tsuji et al., 2019)”, which was probably overlooked by the reviewer. In the revised manuscript, we will rephrase the text to make this point clearer.

      * There might be other instances of lumenless neck structures. Dynamin mutants can cause a stable constricted tubule - are the dimensions of this tubule similar to that of the ER / NE connections? Or possibly some ESCRT related structure? *These are very interesting questions. As shown in Figure 2A-D and Supplementary Figure 1B, the inner diameter (the distance between inner leaflets) of the lumenless ER-NE junctions is below 1 nm. In contrast, the inner diameter of the most constricted membrane tubules that the dynamin mutant K44A Dynamin 1 generates is 3.7 nm (Antonny et al., EMBO J., 2016, doi: 10.15252/embj.201694613). The inner diameter of membrane tubules that ESCRT-III subunits CHMP1B and IST1 form is 4.4 nm (Nguyen et al., Nat. Struct. Mol. Biol., 2020, doi: 10.1038/s41594-020-0404-x). Thus, the lumenless ER-NE junctions are unique in their highly constricted nature and might be regulated by proteins other than dynamin or ESCRT proteins. We will discuss this point in the revised manuscript.

      * There do not seem to be any recent studies of the ER / nuclear membrane connection in fixed cells. However, there is serial section data online which can be inspected. There are connections in mouse brain cortex in the data of Kasthuri et al., 2015 (https://neurodata.io/project/ocp/). Instead of a tubule connection, there seems to be a narrow sheet of ER that connects to the nuclear envelope. But there is something odd about these too. The authors may like to mention something about this or similar work in their manuscript. This reviewer has looked at chemically fixed data from several cell types from his own unpublished data and connections are surprisingly hard to find. Possibly, the connection is particularly sensitive to chemical fixation.* We inspected the serial section data of mouse brain cortex that was chemically fixed. The nuclear envelope in this dataset is deformed and does not seem well preserved. We do not think that we can extract useful information on the ultrastructure of ER-NE junctions from this dataset, and thus will not mention this work in our manuscript.

      It is great to hear that the reviewer tried to look for ER-NE junctions in their own EM data. ER-NE junctions are rare (only 0.1 junction per square micrometer, Figure 4). Thus, we think that the junctions were hard to find in the reviewer’s data because of their low frequency and not because of the chemical fixation.


      * Reviewer #3 (Significance (Required)):

      This is a careful and thorough study of the connection between the ER and the nuclear envelope. The discovery of reticulons and similar proteins, along with biophysical modeling, made the form of the ER accessible to analysis. The factors that govern ER structure are now much better understood. This is particularly true of sheets versus tubules, the three way tubule junctions and to some extent, the junction of ER tubules coming out of the edge of a sheet. However, with all this activity, the subject of the connection of the ER to the nucleus has not been examined in detail. What makes it different is that the tubule is connected perpendicular to the plane of a sheet.*

      We thank the reviewer for appreciating the quality and novelty of our work.

      * The manuscript uses the best ultrastructural techniques and provides strong evidence for a narrow neck at this connection in HeLa cells. With the same methodology, yeast cells (S. cerevisiae) have a wider connection. OpenOrganelle data from other mammalian cell types was examined. This data has less resolution and although it does not contradict the HeLa cell data, it does not support it strongly. *As mentioned in the response to one of this reviewer’s comments above, the spatial resolution of FIB-SEM datasets is good enough to examine if ER-NE junctions are narrower than ER-ER junctions. We think that our observation of several mammalian cells in FIB-SEM datasets strongly supports the conclusion that ER-NE junctions are narrower than ER-ER junctions and extends our findings in HeLa cells to two other mammalian cell types.

      * This work is of interest to cell biologists specializing in membranous organelles or those interested in nuclear physiology. The connection of ER to nuclear envelope is an interesting problem that has not been studied recently. This manuscript could very well serve as a starting point for further structural or functional work by the authors or other groups. *We thank the reviewer for appreciating the significance and impact of our work.

      *

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary: Membrane bound ribosomes and ER exit sites are present on the cytosolic side of the nuclear envelope (NE), suggesting that NE shares protein translocation, folding and quality control functions with the endoplasmic reticulum (ER). Moreover, membrane continuity between the ER and outer NE membrane is evident, and, thus, NE is considered as a subdomain of the ER. To support this, during cell division, NE loses its identity, and participates to daughter cells as part of the ER. However, NE also has membrane proteins and luminal proteins that are enriched in the NE and absent from ER during interphase, and the segregation of NE specific proteins/lipids occurs concomitantly with NE formation during late anaphase/telophase. In this study, the ultrastructure of the ER-NE junctions is described using high resolution electron tomography. Results show convincingly a specific constriction at the ER-NE neck during interphase in several mammalian cell types. This structure is absent during metaphase, and also from the budding yeast. Authors present a model for the formation of ER-NE junctions in higher eukaryotes and speculate about their functional role. *We thank the reviewer for the appreciation of our work and the valuable suggestions for further improvements.

      * Major comments: The main conclusion of the paper is that although the ER and outer NE membranes are continuous, a specific hourglass shaped constriction at the neck is found in higher mammalian cells during interphase. The structure is specific to ER-NE necks, as it is absent during metaphase and ER-ER junctions. For the analysis, authors used high pressure freezing to ensure best structural preservation. Unfortunately, fixation is not the only potential source of artifacts; during tomography at ambient temperature, the thinning of the plastic sections under the beam can be up to 30%. In evaluation of the results, authors should consider how this thinning could affect the measurements of membrane distances and luminal width, and what type of distortions may happen as a consequence of asymmetric shrinkage.*

      In addition to analysis of own samples, authors took advantage of the publicly available whole-cell datasets in OpenOrganelle and used these datasets to expand the number of cell types analyzed. Moreover, the 3D-datasets were generated with different imaging technique, FIB-SEM. Although this technique provides lower resolution in general, it provides isotropic resolution, and the data could be used to eliminate the shortcomings of the tomography, thinning of the sections and the missing wedge. The authors could expand the comparison of the data from these different sources from this perspective, especially since HeLa cells were used in their own tomography studies and FIB-SEM datasets in OpenOrganelle. Similarly, it would be interesting to see if similar approach could be used to compare their results to those obtained by cryo-EM by utilizing the cryo-EM database. Have authors checked if any suitable datasets for analysis of ER-NE junctions could be found from public archives? For the analysis of mitotic cells, double thymidine block was used to synchronize the cell culture. It is not clear, why synchronization was necessary, as CLEM was used to select the cells, and their number was rather low. Do cells continue growing and synthesizing new proteins during thymidine blocks? As one way to control potential artifacts due to the synchronization treatment, authors could compare the average thickness of ER and NE in naturally occurring interphase and mitotic cells vs. synchronized cells. We agree with the reviewer that it is important to clarify the degree of shrinkage and deformation of the sample that our EM protocol might introduce. 
To assess the degree of sample shrinkage and deformation in the plastic sections, we will compare the ONM-INM distance measured in our plastic sections with the one in cryo-EM tomograms of rapidly-frozen and FIB-milled mammalian cells that are publicly available (EMPIAR, the Electron Microscopy Public Image Archive, https://www.ebi.ac.uk/pdbe/emdb/empiar/), and describe the outcome in the revised manuscript.

      The reason why we synchronized the cell cycle is to enrich cells in late anaphase and early telophase in the same plastic sections, so that we can compare their ultrastructure side-by-side. In the revised manuscript, we will examine if the double thymidine block affects the ER-NE junction morphology by comparing the morphology of the ER and NE between the synchronised and non-synchronised cells.

      As we described in the response to Reviewer 3, we think that our conclusion from the EM analysis is solid because of the following reasons. (i) We observed a significant structural difference between ER-NE junctions and ER-ER junctions in the same cells (Figure 2). (ii) We discovered a morphology change of ER-NE junctions in late-anaphase, early-telophase, and interphase cells that were freeze-substituted on the same sapphire disc; the ER-NE junctions became progressively constricted from telophase to interphase (Figure 3).

      Minor comments: On page 5, last chapter (+ Fig.1 legend and materials and methods): "the quick tomograms covered the entire NE" is misleading, as the imaging covered a thin layer of the entire NE only. - Authors could have analyzed the entire NE from the FIB-SEM datasets but chose to use stereological approach to minimize their work.

      We will modify the text to make it clear that the quick tomograms covered the NE in a section and not the entire NE of the cell in the revised manuscript.

      * To save time from the readers to follow the reference, authors could describe how the specimens used in OpenOrganelle datasets were fixed and processed, especially as they emphasize the importance of high pressure freezing in their own sample prep. Similarly, in Fig.4 legend, authors refer to measurements done in the previous study without explaining how and from what type of data. *We thank the reviewer for pointing these out. We will describe how the OpenOrganelle datasets were generated and how the nuclear surface area measurement was done.


      Is there a difference between mesh generation and segmentation, or is it just two different terms used for the same thing by different programs? We apologize for our brief description of these terms. We will clarify these terms in the revised manuscript.

      *

      Reviewer #4 (Significance (Required)):

      General assessment: ER-NE gates were described earlier in the literature for specific cell types using standard thin-section TEM imaging, and in this study, the analysis was done with modern technology in 3D. The text is fluent and clear, and the quality of the images was excellent. The analysis of the data was thorough, and materials and methods including image analysis part were presented accurately and clearly. Ultrastructural analysis was done systematically, and generated models are beautiful and informative. Much thought has been put into planning of the experiments and experimental approach. The shortcoming of the study is its limitation to ultrastructural analysis only without attempts to connect to any mechanism. The discussion part contains a lot of speculation about the factors that might be needed for the formation and maintenance of the constriction and presents several hypotheses for the function of the constriction. The paper would be much stronger if one or a few of the leads were followed, and if there were any explanation for the role of these structures, or factors affecting them. *We thank the reviewer for the appreciation of the clarity and quality of our work. The molecular mechanism that regulates the function, shape and biogenesis of ER-NE junctions will be the subject of future studies, for which our discovery of a highly constricted morphology of the ER-NE junctions lays the groundwork.

      * Advance: The paper provides a very nice example for the reuse of publicly archived imaging datasets to complement own experimental work. Hopefully this paper encourages others to follow the same path, as the large volumeEM datasets require significant investments and contain a wealth of potential for reuse. *We strongly agree with the reviewer. The volume EM datasets that are publicly available contain a wealth of potential for new discoveries. We also hope that our paper encourages other scientists to make good use of those datasets and also to deposit their own data in public databases. We will deposit our EM tomograms in EMPIAR, the Electron Microscopy Public Image Archive.

      * The paper strengthens the description of the ER-NE junction structure significantly and convincingly but does not further our understanding of the mechanisms behind the structure nor the function of them and raises more questions than provides answers. For structural analysis of this kind, the state-of-the-art technology is cryo-EM (e.g., preparation of lamella with cryo-FIB-SEM followed by cryo-tomography), and in this study, the technical limitations come from plastic embedding and ambient temperature imaging. The used techniques would be more adequate for cell biological study, where the described structure is somehow connected to the function in cell, or the factor(s) needed to the formation or maintenance are identified. *Indeed, a limitation of our current study is that we did not reveal the underlying molecular mechanism and the functions of the constricted morphology of ER-NE junctions. We do not think that cryo-EM is necessarily required because we have collected evidence that the ER-NE connections are distinct from the ER-ER junctions in not only our EM tomography data (Fig. 2) but also in the EM datasets deposited in public databases (Fig. 5).

      * Audience: This study will be of special interest to cell biology community. The study could be an opening to several lines of research, e.g., identification of the factors forming or maintaining the structure, the potential function of the structure, how the structure affects the dynamics of the NE/ER membrane and luminal proteins. *We thank the reviewer for appreciating the impact of our work.

      * Reviewer's expertise: The reviewer has long experience in electron microscopy, volumeEM techniques and image analysis, and operates mainly in the field of cell biology.*

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript entitled "The Drosophila Tumour Suppressor Lgl and Vap33 activate the Hippo pathway by a dual mechanism, involving RtGEF/Git/Arf79F and inhibition of the V-ATPase." by Portela et al. presents an interesting perspective of the molecular mechanism regulating Hippo pathway, revealing new proteins involved in this process. In this study, the authors try to show us that Lgl activates the Hippo pathway via Vap33 either by interacting with RtGEF/Git/Arf79F or by inhibiting V-ATPase, thus controlling epithelial tissue growth. The methodology used by the authors is adequate but could benefit from further experiments that would allow them to reach the conclusions stated in their research. Thus, based on the interpretation of the results presented by the authors some concerns were raised that should be addressed during the review process and that are explained in the major comments. Major comments: • It is not clear why in "The Hippo signaling pathway is negatively regulated by V-ATPase activity in Drosophila" section, the authors use Vha68-2 RNAi to reduce the activity of V-ATPase and later they use the overexpression of Vha44 to activate V-ATPase. The authors should explain why they used different proteins to regulate V-ATPase. The way the authors wrote their results sounds like different Vha proteins regulate V-ATPase, which means that cells may have different ways to activate V-ATPases, not being clear if regardless that the downstream effect of V-ATPase activation is always reflected in the Hippo pathway. Thus, the authors should state what other Vha proteins may have a similar effect, I would like to see evidence that Vha44 and Vha68 knockdown and overexpression leads to similar results.

      Response: Vha68-2 and Vha44 are both components of the V-ATPase. We have added further details to the results to make this clearer. We have previously shown that knocking down several components of the V-ATPase, which disrupts V-ATPase function, has a similar effect on the Notch pathway (Portela et al., 2018 Sci. Signal., PMID: 29871910). Vha44 overexpression had been documented to result in V-ATPase activation (Petzoldt et al., 2013, Dis Model Mech., PMID: 23335205), and no other Drosophila V-ATPase transgenes were available to conduct experiments with other lines.

      • In "Vap33 activates the Hippo pathway" section, the authors' conclusions represent a big statement considering the results obtained. Though Diap1 is a Hippo pathway target, it does not mean that this protein is solely regulated by this pathway. For example, there are studies that show that this gene can also be transcribed by STAT activity. Though in the following section the authors show how Vap33 activates this pathway, the results obtained in the section "Vap33 activates the Hippo pathway" are not enough to make this assumption. We suggest that the authors rephrase this section. (Optional: To maintain this statement, the authors should have performed, for example, a luciferase assay containing specifically Hippo pathway binding sites in the Diap1 gene, showing that the transcription factor of the Hippo pathway is somehow regulated by Vap33). Response: Whilst Jak-STAT signalling has been shown to induce Diap1 expression in the wing disc during development (PMID: 28045022), expression profiling after activation of Jak-STAT signalling in the eye epithelium did not identify Diap1 as a target (PMID: 19504457). Additionally, there are no reports that Lgl depletion in eye disc clones elevates Jak-STAT signalling (Stephens et al., J. Mol. Biol. 2018, PMID: 29409995); instead, loss of cell polarity in scrib mutant cells in the eye disc results in expression of the Jak-STAT pathway ligand, Upd, and non-cell autonomous induction of Jak-STAT signalling in the surrounding wild-type cells (PMID: 25719210, PMID: 23108407). We have previously shown that Lgl depletion leads to inactivation of the Hippo pathway and elevates expression of the canonical Yki targets, Ex and Diap1 (Grzeschik et al., 2010, Curr Biol., PMID: 20362447). In this current study we show that Vap33 overexpression leads to the downregulation of Diap1 and in lgl mutant tissue reduces the elevated Diap1 expression. 
Since there is no evidence that either Lgl or Vap33 (VAPB) perturbations affect the Jak-STAT signalling pathway, we conclude from our results that Vap33 acts by reducing Yki activity and thus activating the Hippo pathway. We have added additional explanation to this section of our manuscript.

      • The authors present a highly speculative discussion, raising different hypotheses. Though such hypotheses are well supported by the literature, the authors would enrich the quality of their research if indeed they could prove them. Particularly, testing for vesicle acidification, testing if V-ATPase indeed blocks the interaction of Lgl/Vap33/RtGEF/Git/Arf79F, and alters Hpo localization, testing if Git/RtGEF inhibits Arf79F and consequent Hpo localization. Response: Although it would extend the paper to conduct further experiments, my lab is now closed so this is not possible. We have already published that vesicle acidification is increased in lgl mutant tissue (Portela et al., 2018, Sci. Signal., PMID: 29871910) and that Hpo localization is altered in lgl mutant tissue (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447).

      • The authors should also apply more specific techniques to infer how the Hippo pathway is affected by such genetic manipulation since diap1 can be a target gene of different pathways. Response: We have shown that lgl mutant tissue also shows upregulation of the Hippo pathway target, Ex-LacZ, and affects the phosphorylation of Yki (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447), and RtGEF/Git mutant tissue shows upregulation of the Yki target, Ex-LacZ (Dent et al., 2015, Curr. Biol., PMID: 25484297). Since RtGEF/Git are positive regulators of Hippo, but there is no evidence that they are involved in the regulation of the Jak-STAT pathway, the effect of Vap33 overexpression on Diap1 levels in the context of a RtGEF knockdown (Fig 5) is most likely to be due to effects on the Hippo pathway. Similarly, since Lgl deficiency upregulates Yki targets, Ex-LacZ and Diap1 (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447), the reduction of the elevated Diap1 levels in lgl mutant clones by knocking down or reducing Arf79F activity (Fig 7), is most likely due to inhibition of Yki activity and therefore elevated Hippo pathway signalling.

      Minor comments: • The authors present a well-structured manuscript, that generally is easy to understand. However, at some points, the statements given by the authors seem highly speculative. • The figures presented in this manuscript and the statistical analysis seem adequate and are clearly described.

      Response: We thank the reviewer for their support of our study. We have added more explanation to support our conclusions.

      Reviewer #1 (Significance (Required)):

      The study presented by Portela et al. gives new insights into the regulation of the Hippo pathway with the discovery of new proteins involved in this mechanism, which can be interesting to those working on basic research and focused on studying signal transduction. However, this study lacks some novelty. Throughout the manuscript, the authors only observed the physiological consequences of manipulating this pathway based on the eye phenotypes, and in the discussion, many hypotheses were raised based on the already available literature, which shows that much is already known about the Hippo pathway. The advances shown in this study are limited to the description of the signaling pathway itself and to the eye morphology. As a suggestion, the authors should explore the knowledge of their findings in order to understand how we can use them to achieve advances in other fields and physiological conditions. For example, only at the end of the discussion, did the authors raise the questions that would really push their discoveries a step forward, namely how this mechanism acts during the response to tissue wounding and whether the mammalian orthologs of Lgl and Vap33 also act via these mechanisms to control tissue growth in mammals. It would be interesting if the authors could direct their research efforts to understand if the proteins identified can be targeted to improve wound healing or to delay aging for example. Altogether, the authors present an interesting study but, at this moment, it still lacks the significance and novelty needed for publication. We encourage the authors to keep up their good work to address these suggestions, which will definitely improve the quality of their study.

      Response: We respectfully disagree with the reviewer’s comments regarding the significance of our study. On the contrary, our study is significant since it has discovered a mechanism linking Lgl and Vap33-RtGEF/Git/Arf79F and the V-ATPase to the regulation of the Hippo pathway, an important tissue growth regulatory and tumour suppressor pathway. The Drosophila eye epithelium is a highly validated model for exploring mechanisms that are relevant to human epithelial biology and cancer. Whilst extending our studies of the mechanism by which Lgl controls the Hippo pathway to wound healing and mammalian systems would be the next step, this is beyond the scope of this discovery paper.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary This manuscript investigates potential mechanisms through which the lgl gene might affect the Hippo signaling pathway. The authors employ a combination of physical interaction studies and clonal analysis in Drosophila eye discs to investigate potential links between lgl and other genes. Some of the results are intriguing, but the analysis is rather preliminary, and there are technical concerns with some of the results presented.

      Main issues - The authors propose effects of genes involved in vesicle trafficking and acidification in Hippo signaling, but there is no clear cellular mechanism described by which these effects could be mediated. This deserves further consideration; for example, if they think there are effects on the localization of Hippo, this could be directly examined. In the Discussion, the authors suggest that "The V-ATPase might therefore act to inhibit Hippo pathway signalling by blocking the interaction of Lgl/Vap33/RtGEF/Git/Arf79F with Hpo in vesicles, thereby altering Hpo localization and inhibiting its activity." but Hippo is a cytoplasmic protein and has never been reported to be within vesicles.

      Response: Whilst Hpo is a cytoplasmic protein, there is evidence that it could also be associated with vesicles, since Hpo pathway components bind to several endocytic proteins by mass spectrometry analysis (Kwon et al., 2013, Science, PMID: 24114784; Verghese and Moberg, 2020, Front. Cell Dev. Biol., PMID: 32010696). We have previously published that Hippo localization is altered in lgl mutant tissue (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447). For greater precision, we have updated the wording to state that the proteins described in our manuscript may alter Hippo localization “on endosomes” as opposed to the previous “in vesicles”.

      • The Yki stains in Fig. 1 are confusing. The nature of the signal throughout the wing disc looks very different in 1A vs 1B vs 1C; this needs to be explained or re-examined. Fig 1C (wts RNAi) seems to show an elevated Yki signal in some cells, and lower in others. Prior studies have reported that wts affects the nuclear vs cytoplasmic localization of Yki, but not its levels, so this needs to be clarified.

      Response: There are some tissue folds in the eye disc tissues that might be confusing the reviewer, but Yki nuclear staining is lower in Vha68-2 mutant clones, and higher in wts knockdown and Vha44 over-expressing clones (arrowheads). When Yki is concentrated in the nucleus the staining appears more intense, as it does in the wts knockdown clones. Similar results on Yki staining upon Hippo pathway impairment in epithelial tissues have been obtained by other Hippo pathway researchers (e.g. PMID: 20362445, PMID: 19900439, PMID: 19913529, PMID: 26364751).

      • In Fig 1D the clones appear to have different effects in different regions of the eye disc; the authors should clarify. Also, the disc in 1D appears much younger than the discs in 1A-C, but similar age discs should be used for all comparisons.

      Response: All eye discs are from wandering 3rd instar larvae, but the mounting of the samples on the slide and the confocal Z-section could account for apparent different regions of the eye disc showing stronger upregulation of Ex-LacZ and Yki staining. The data has been statistically analysed from multiple eye discs and the effects observed are significantly different to the control (as plotted in Fig 1E).

      • The authors should clarify whether any of the manipulations they perform are associated with Jnk activation, as this could potentially provide an alternative explanation for downregulation of Hippo signaling.

      Response: Lgl mutant clones only upregulate the JNK target MMP1 in some cells at the border of the clones but show elevated Yki activity within the clones. Vha44 overexpressing clones do show upregulation of JNK signalling (Petzoldt et al., 2013, Dis Model Mech., PMID: 23335205), but since JNK signalling is known to inhibit Yki activity in the eye epithelium (PMID: 22190496), it is unlikely that the upregulation of Yki activity (downregulation of Hippo signalling) in Vha44 overexpressing clones is due to JNK activation.

      • The authors report in Fig 2C,E that over-expression of Vap33 reduced expression of Diap1, which they interpret as evidence of increased Hippo pathway activity, but this experiment is lacking essential controls, as the apparent reduction of Diap1 could simply reflect increased cell death or a change in focal plane, and indeed the difference in the label stain makes it look like these cells are undergoing apoptosis. Thus it's important to also have a stain for a neutral protein, or at least a DNA stain. Additionally, it is important to stain for at least one additional marker of Hippo pathway activity (e.g. ex-lacZ or Yki localization), as there are other pathways that regulate Diap1.

      Response: We have previously examined the effect of Vap33 overexpressing clones on the Notch signalling pathway and do not see a reduction in Notch target gene expression relative to the control (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig 3). Thus, although there might be some cell death in Vap33 overexpressing clones (possibly due to lower Diap1 levels), it is unlikely that cell death per se results in lower Diap1 levels. We are unable to conduct further experiments to examine other Hippo pathway activity markers since my lab is now closed.

      • In Fig. 4 the authors perform PLA experiments to examine potential association between various pairs of proteins, but they don't show us key controls. They report in the text using single antibodies as negative controls, but this doesn't control for non-specific localization of antibodies. The better negative control is to do the PLA experiments in parallel on tissues lacking the protein being detected (eg from animals not expressing the GFP- or RFP-tagged proteins they are examining). Also, there is a lot of variation in the apparent signals shown in different PLA experiments in fig 4, the authors should comment on this.

      Response: We have previously used the PLA assay to examine Lgl and Vap33 interactions (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig 2) and have conducted an experiment expressing Vap33 tagged with HA via the GMR driver in the posterior region of the eye disc and then detected Lgl-HA protein interactions, which only showed PLA foci in the posterior region where Vap33-HA is expressed but not in the anterior region where Vap33-HA is not expressed. This may be thought of as the best possible control since these differentially expressing regions were part of the same tissue sample. Furthermore, in our previous study (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig S2), we conducted a negative control PLA using the GFP and Vap33 antibodies in eye tissue not expressing GFP-Lgl and observed no PLA foci. We have edited the text to refer to these controls.

      The variation in PLA signal may be due to low levels of expression of certain proteins or lower levels of protein-protein interactions. We have edited the text to add this explanation.

      • The authors claim that RtGEF mutant cells increase Diap1 expression, and that Vap33 over-expression reverses this effect (Fig. 5). The effect of RtGEF looks very subtle and variable, it should be confirmed by examining additional reporters of Hippo pathway activity. It also seems like the disc in 5A is at a different stage &/or the quantitation is done from a different region as compared to the disc in 5C.

      Response: RtGEF mutant cells have also been shown to upregulate the Yki target, Ex-LacZ (Dent et al., 2015). Unfortunately, we were unable to construct an Ex-LacZ RtGEF mutant stock and there was no available Ex antibody.

      For Diap1 quantification, clones were chosen just posterior to the morphogenetic furrow of each eye disc and multiple clones were analysed relative to the adjacent wild-type clones in many samples and quantified and plotted in Fig 5E.

      • The analysis of the influence of Vha68-2 mutant clones, and their genetic interaction with Git, similarly suffers from missing controls and incomplete analysis. Additional Hippo reporters besides just Diap1 should be examined. The Diap1 analysis which shows reduced expression needs examination of neutral controls or nuclear markers to assess potential apoptosis within clones, or changes in focal plane.

      Response: We have also examined the effect of Vha68-2 clones on Ex-LacZ expression (Figure 1) and show that it is also reduced relative to the surrounding wild-type clones.

      We have previously examined Vha68-2 mutant clones for the expression of a Notch pathway target (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig S1) and show with DAPI staining that cells are in the same plane and are retained in pupal retina, so are not dying. We now refer to our previous study in the text.

      Similarly, the analysis of Arf79F mutant clones in Fig 7E,G is compromised by lack of controls for viability and tissue layer, and analysis of an additional Hippo reporter is once again essential.

      Response: We don’t believe DAPI stains are necessary as the GFP membrane/cytoplasmic staining clearly shows the outline of the cells and where the nucleus is in the mutant clones and shows that the cells are intact and not dying.

      Reviewer #2 (Significance (Required)):

      The strength of the study is the potential dissection of novel connections between the lgl tumor suppressor and the Hippo pathway. However, there are significant limitations due to the preliminary nature of the study, which is incomplete and missing essential controls. If these limitations are overcome, the work will be of interest to specialists in the field.

      Response: We are hoping that our explanations and responses to the main issues above alleviate concerns regarding controls.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Portela and colleagues identified new regulators of the Hippo pathway downstream of the core apico-basal polarity protein Lgl. While the impact of Lgl depletion on Yki activation was already characterised both in Drosophila and vertebrates, the mechanism connecting these two pathways was still unclear. Using the Drosophila eye, mosaic analysis, epistatic analysis and mass-spectrometry, they identified two routes through which Lgl depletion can lead to Hippo pathway downregulation and eye overgrowth. This regulation required the previously characterised Lgl interactor Vap33, which on the one hand activates Hippo by inhibiting the V-ATPase, and on the other hand activates Hippo through its interactions with the actin regulators Git and RtGEF (two previously characterised regulators of Hippo, https://pubmed.ncbi.nlm.nih.gov/25484297/). They also identified another regulator of Hippo downstream of Lgl, Arf79F, whose ortholog interacts with Git in mammals and is also in close proximity with Hippo, Git and RtGEF in Drosophila, and whose depletion abolishes Hippo downregulation and eye overgrowth in Lgl mutants. This is a well performed study which identified new links between Lgl and regulation of the Hippo pathway. Many of them are conserved in mammals and may be relevant in pathological conditions associated with Lgl loss of function and Yap misregulation. The experiments are well conducted, with a quite thorough epistatic analysis combined with many assays to characterize protein interactions. Admittedly, the molecular mechanism remains uncharacterised and some experiments may help to indicate putative mechanisms, but the characterisation of these new regulators and the clear genetic interaction results already constitute solid and interesting data. I have some suggestions, though, that could help to reinforce the conclusions.

      Main suggestions :

      1. While the precise molecular mechanism is not absolutely necessary, it would be interesting to document the subcellular localisation of these new Hippo regulators in WT and Lgl mutant contexts (Git, RtGEF, Vap33 and Arf79F), either with antibodies if they exist, or with fusion proteins (which for a good part were already generated for the PLA results). This may reveal obvious mislocalisation, which would support the role of Lgl as a scaffolding protein that maintains proper subcellular localisation of these factors.

      Response: Whilst this experiment would extend the study, we are unable to do this since my lab has now closed.

      Most of the epistatic experiments focus on factors that rescue the overgrowth and increase of diap1 expression in Lgl mutants. Did the authors test whether any of these core regulators are sufficient to recapitulate the Lgl mutant eye phenotype, for instance Vap33 KD in the eye, or Arf79F overexpression? Negative results would still be informative, as they would point to the existence of other downstream regulators of the eye phenotype.

      Response: Vap33 knockdown by RNAi in clones does not phenocopy the lgl mutant mosaic adult eye phenotype (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig 2), presumably due to other functions of Vap33. We have added further details regarding this point in the Discussion.

      We have not examined Arf79F overexpressing clones.

      It is at the moment hard to interpret the relevance of the results obtained by PLA. While there are some negative controls based on the absence of secondary antibody, what is the number of particles obtained for two non-interacting cortical proteins? Since this is based on proximity, I would expect that some positive particles would still appear by chance, but far fewer than for two physically interacting proteins or subunits of a complex. Could the authors provide such a negative control by testing, for instance, Git/RtGEF with another non-interacting cortical protein? That would help to assess the relevance of the conclusions based on PLA.

      Response: The PLA is a robust assay to assess protein-protein interactions of proteins that are

      Some of the epistatic links are a bit hard to interpret at the moment, and additional epistatic tests may be relevant. For instance, the increase of diap1 upon Git depletion in the Vha68 mutant (Figure 6) is used to conclude that Git is required for the Hippo upregulation upon reduced V-ATPase activity. However, this could be compatible with two independent branches regulating Hippo (in an opposite manner), which is more or less what is suggested by the authors in their conclusion and the model of Figure 8. I would suggest reformulating this conclusion in the Results section. Similarly, there is currently no experimental exploration of the epistatic link between Arf79F, Git and RtGEF (which is based on results in mammals). It would be interesting to check whether the Git and RtGEF mutant phenotype (Hippo downregulation) can also be suppressed by downregulation of Arf79F.

      Response: We have now added further explanation to the result section regarding Fig 6.

      Unfortunately, we are unable to do further experiments since my lab is now closed.

      Apart from the very obvious phenotype (eye in Lgl mutant mosaic), it is a bit hard to interpret the pictures of adult eyes provided in this study (especially for mild phenotypes). Could the authors provide more explanation in the legends, and if possible some quantitative evaluation of the phenotype when relevant? Otherwise, apart from the obvious rescue of the Lgl mutant, it is a bit hard to interpret the other genotypes (e.g. Vap33OE, RtGEF mutant, Vha68 mutant).

      Response: We have added more explanation of the adult eye phenotypes in the text/fig legends.

      Other minor points :

      1. I would recommend, when possible, clearly indicating in Figure 8 which parts of the pathway are clearly documented in this study, and which parts are still hypothetical (e.g. the link with PAK).

      Response: We have re-drawn the model figure to highlight what we have found in this study by adding orange arrows between Lgl-Vap33-RtGEF/Git-Arf79F-Hpo and Lgl-Vap33-V-ATPase and V-ATPase-Hpo.

      2. Page 4, the sentence "as aPKC's association with the Hpo orthologs, MST1/2, and uncoupling MST from the downstream kinase, LATS (Wts), thereby leading to increased nuclear YAP (Yki) activity [17], consistent with what we observe in Drosophila [5]." may need to be reformulated (at least I had trouble understanding it).

      Response: We have edited the sentence to "In mammalian systems, deregulation of Lgl/aPKC impairs Hippo signalling and induces cell transformation, which mechanistically involves the association of aPKC with the Hpo orthologs, MST1/2, thereby uncoupling MST from the downstream kinase, LATS (Wts) and leading to increased nuclear YAP (Yki) activity [17], consistent with what we observe in Drosophila [5]."

      3. Page 11: "a decrease in Diap1 expression was observed and clones were smaller than wild-type clones (Fig 7E), suggesting that the Arf79F knockdown clones were being out-competed". I am not sure one can conclude from this that the clones are "outcompeted" (which would suggest a context-dependent disappearance of clones, while here the data could be totally compatible with a cell-autonomous decrease of growth and survival). This statement would only make sense if global eye depletion of Arf79F had no adult eye phenotype.

      Response: We have edited the sentence to "a decrease in Diap1 expression was observed and clones were smaller than wild-type clones (Fig 7E), suggesting that the Arf79F knockdown clones have reduced tissue growth ----."

      Reviewer #3 (Significance (Required)):

      This study identifies regulators of Hippo which, through their interactions with Vap33, explain for the first time how Lgl depletion leads to Hippo misregulation (without impairing apico-basal polarity). This is an interesting study based on epistatic analysis and mass-spectrometry, and it identifies several regulators conserved in mammals. While the molecular mechanism remains to be explored, it clarifies for the first time how depletion of Lgl (a core regulator of apico-basal polarity) leads to Hippo downregulation and tissue overgrowth, a phenotype also observed in mammals and characterised several years ago in Drosophila. The authors previously characterised the interaction between Vap33 and Lgl and its role in the regulation of Notch signaling through the V-ATPase. This study nicely complements these previous results and now connects Vap33 with Hippo and Lgl, while answering a long unresolved question (how Lgl depletion affects the Hippo pathway).

      These results will be interesting for the large community studying the Hippo pathway, apico-basal polarity and tissue growth. The study also outlines interesting factors that could be relevant for tumour neoplasia and hyperplasia.

      I have expertise in epithelial biology, apoptosis, cell competition, Drosophila, cell extrusion, mechanobiology, morphogenesis and growth regulation.

      Response: We thank the reviewer for recognizing the significance of our study.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this article, the authors delve into an intriguing topic, aiming to enhance our understanding of the organization of the mitochondrial genome of T. gondii, a parasite of significant importance in both human and animal health contexts.

      In essence, their approach involves enriching mitochondrial material, followed by genome sequencing and the analysis of mitochondrial short RNAs. They achieve a remarkable depth of mitochondrial sequencing and generate valuable RNA data. Furthermore, their efforts lead to the discovery and annotation of new short RNAs.

      Overall, the article is well-crafted and presents compelling results. However, it's worth noting that, at times, the authors appear somewhat self-congratulatory, and certain results might be perceived as overly ambitious. Nevertheless, the discussion is aptly constructed.

      Major comment:

      They assert certain discoveries that had already been reported. Notably, they adapt an existing protocol for mitochondrial enrichment and describe it as 'We developed a protocol to enrich T. gondii mitochondria.' It's worth noting that they neither reference a more recently described protocol (PMC6851545) nor compare the performance of their modified protocol with the original.

      The protocol they employ does not seem to yield exceptionally high success rates, as mitochondrial DNA constitutes less than 10% of the total sequenced DNA.

      Additionally, they frequently mention the identification of specific combinations of sequence blocks previously identified by Namasivayam et al. (PMC8092004), which was also discussed in Namasivayam et al. 2021.

      Missing in the supplementary material are basic details on the sequences performed. Distribution of mitochondrial reads length, depth, etc.

      Further clarification is needed for Figure 2. Specifically, the frequency units or combinations of frequency (A, B, and C) are not clearly explained. While the matrix's asymmetry suggests a 5'- 3' orientation difference, this orientation difference is not explicitly specified (B). Additionally, the fragment Mp does not appear in the block combination figure (C).

      Some points to improve the introduction:

      Provide an evolutionary context for the following phrase: 'An idiosyncratic feature of Apicomplexa is a highly derived mitochondrial genome.' Specify what you intend to emphasize.

      Line 56: The sentence must begin with a capital letter.

      In line 58 "Nuclear genes encoding proteins with functions in mitochondria contribute strongly to P. falciparum and T. gondii cell fitness" Although it is mentioned later, it would be more effective to introduce the fact that all but three genes are encoded in the nucleus.

      Line 68: "Apicomplexan mitogenomes usually code only for three proteins" It seems to me that 'usually' should not be included.

      Line 65-67: The sentence should include that the mitochondrial genome is composed of a total of 20 blocks of repeating sequences organized in multiple DNA molecules of varying length and non-random combinations

      At the end of the introduction, the authors state that they have developed a protocol for mitochondrial enrichment. The text should be modified taking into account that: 1- The new protocol is an adaptation of another existing protocol. In fact, in the Methods the authors say the protocol was "slightly" modified. 2- There is already an existing mitochondrial enrichment protocol available [Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6851545/#mmi14357-bib-0074]. In any case, they should consider performing a comparative analysis between the proposed protocol and existing ones to determine its relative effectiveness. It should be noted that the proposed protocol enriches in organelles (including the nucleus and apicoplast), but when sequencing DNA, mitochondrial DNA accounts for only 5% of the total reads, which may raise doubts about its overall efficacy.

      Some points related to Results section:

      Lines 113-115: 'To distinguish between NUMTs (nuclear DNA sequences that originated from mitochondria) and true mitochondrial sequences, it is necessary to enrich mitochondrial DNA.' I disagree with this sentence. NUMTs, in general, consist of very short sequences. With long reads, it is relatively straightforward to differentiate mitochondrial sequences from those nuclear sequences that have small mitochondrial fractions. In my opinion, even many Illumina reads can be confidently identified as belonging solely to the mitochondria. I found this article that supports this argument, indicating that the majority of NUMTs are less than 100 nucleotides in length [Reference: https://pubmed.ncbi.nlm.nih.gov/37293002/].

      Lines 166-168: 'A previous sequencing study used Oxford Nanopore sequencing technology (ONT) to identify combinations of sequence blocks in T. gondii mitochondria (Namasivayam et al. 2021).' However, it's important to note that Namasivayam's group did not merely use ONT to identify combinations of blocks; rather, they discovered, identified, and defined these combinations based on sequencing with long reads.

      Line 177: "The length of mitochondrial reads ranged from 87 nt to 17,424 nt" It would be beneficial to include a histogram depicting the length distribution of the obtained reads. It's worth noting that nanopore reads tend to be shorter than Illumina reads

      Line 194-195 "we found that only a small fraction of all possible block combinations are prevalent within the genome" this has been previously described (PMC8092004)

      Line 201. "This indicates that the genome's flexibility is limited and that not all block combinations are realized". This is consistent with the findings published by Namasivayam et al. in 2021, which have already established that the combination of the 21 blocks is non-random.

      Line 205: "All combinations are well covered in our ONT results and helped to refine block borders relative to previous annotations (Fig. S2)" In the supplementary materials the authors say: "However, the blocks Fp, Kp, and Mp frequently occur separately in the mitochondrial genome We therefore treated Fp, Kp and Mp as separate blocks and have shortened the blocks F, K and M accordingly". As far as I understand, for this very reason, Namasivayam and collaborators annotate them as partial fragments, which may appear in other regions but are, in turn, parts of larger F, K, and M fragments. To redefine the segments F, K, and M without the sequences corresponding to Fp, Kp, and Mp, as shown in Figure S2, these fragments should be distinct from the 'partials.' In other words, segments of the type (F minus Fp), (K minus Kp), and (M minus Mp) should appear in the reads, and should be distinguishable from Fp, Kp, and Mp. If this distinction is made, I am satisfied with the new definition. However, if such a separation is not evident, it seems important to clarify it in the text or to reconsider this new definition.

      Lines 221-223: "This suggests that there is no need to postulate mechanisms of genomic or posttranscriptional block shuffling to arrive at full-length open reading frames." The authors argue that invoking mechanisms of genomic or post-transcriptional block shuffling is unnecessary to explain the presence of full-length open reading frames, given that genes represent 2-3% of mitochondrial sequences. However, there is a missing estimate regarding the probability of encountering all three genes within a single molecule or mitochondrial genome, as well as the total number of sequenced mitochondria. Consequently, the statement appears overly assertive. In the absence of alternative mechanisms for generating complete genes, this would mean that at most only 1646 mitochondrial genomes would have been sequenced. To comprehensively address this issue, the authors should consider discussing this scenario further. They should also provide information about how many reads they found containing all three genes and how many contained two of the genes.

      Lines 249-250: "using the block combinations identified here by ONT sequencing" What is the difference between the blocks identified here and those in Namasivayam et al.? The division of the M, K and F fragments?

      Line 287: "The six remaining small RNA fragments are specific to T. gondii" I would suggest being more cautious in this sentence by stating that they were not found in other organisms. Given the similarity of the mitochondrial genome between T. gondii, N. caninum, and other coccidians, it would be expected to find them in these organisms as well.

      Line 300 "Among the novel small RNAs identified, there is also a class that was only detectable due to our insights into genome block combinations." A valid strategy is to map the small RNAs to the generated nanopore reads or to an assembly made with these reads, rather than solely relying on the single blocks or combinations of blocks, as this approach would yield the same result.

      Line 444: "Upon closer scrutiny, however, the reshuffling appears limited to specific block borders and is not random" This was already established by Namasivayam et al 2021.

      I would like to highlight the potential for a more comprehensive examination of the mitochondrial genome in the discussion. While the proposed explanations for the presence of sRNAs at the 'block borders' appear plausible, it's worth noting that the definition of these blocks is artificial rather than biological. I think it is interesting to discuss without the concept of block sequences, but of sequences existing in the mitochondrial genome. Therefore, it's important to discuss whether these sequences (the block borders) are consistently present in all mitochondrial genomes. The total cumulative length of the blocks is 5.9 Kb, which is relatively small and comparable to one of the smallest mitochondrial genomes on record. It is conceivable that recombination and the generation of new sequences play a role in expanding genomic space for encoding, such as RNAs.

      Line 535-536 "We developed a protocol to enrich T. gondii mitochondria and used Nanopore sequencing to comprehensively map the genome with its repeated sequence blocks." I find this sentence to be somewhat assertive, especially considering that they modified an existing protocol and obtained results that may not be optimal. Additionally, they have not compared their protocol with other available methods for mitochondrial enrichment.

      Some points related to the Methods section: In none of the experiments is it specified how many parasites were initially used as a starting point.

      "Masking NUMTs in the T. gondii nuclear genome" it's unclear whether the authors utilize all hits or filter the results of BLASTN. It would be helpful if they specify the criteria for filtering, such as identity percentage or query coverage. Additionally, it's not clear how they generate the GFF3 file from the BLAST results, or whether they instead create a BED file. Providing clarification on this process would enhance the reproducibility of their methods. Moreover, it would be beneficial if the authors include information regarding the number of sequences they intend to mask, the average length of the NUMTs, and the total percentage of the genome these masked sequences represent.

      Line 657: "Mapping results were filtered using SAMtools" The text does not specify the filtering criteria or the parameters used for this process.

      Line 673 states "No matching reads were found" in the "Sequence comparisons of ONT reads found here with published ONT reads for the T. gondii mitochondrial genome", but in the Results the authors say: "While smaller reads of our dataset are found in full within longer reads in the published datasets, we do not find any examples for reads that would be full matches between the dataset." Could you provide a more detailed explanation? Specifically, I would like to know how many reads from the dataset (including their length) are also present in other datasets, and at what minimum length they cease to coincide.

      Line 689: The text does not specify the filtering criteria or the parameters used for the SAMtools filtering process.

      Lines 689-693: Please describe the methodology used in more detail.

      Line 696: the program is fastp not fastq (Chen et al. 2018)

      Line 697: What do you mean by "only the ends of the reads were mapped"? How many bases? Or do you mean that the forward and reverse reads were mapped?

      Significance

      In this article, the authors delve into an intriguing topic, aiming to enhance our understanding of the organization of the mitochondrial genome of T. gondii, a parasite of significant importance in both human and animal health contexts.

      In essence, their approach involves enriching mitochondrial material, followed by genome sequencing and the analysis of mitochondrial short RNAs. They achieve a remarkable depth of mitochondrial sequencing and generate valuable RNA data. Furthermore, their efforts lead to the discovery and annotation of new short RNAs.

      Overall, the article is well-crafted and presents compelling results. However, it's worth noting that, at times, the authors appear somewhat self-congratulatory, and certain results might be perceived as overly ambitious. Nevertheless, the discussion is aptly constructed.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2023-02012

      Corresponding author(s): Frederic, Berger


      1. General Statements [optional]


      We thank the reviewers for their useful suggestions and comments on our manuscript, which helped us to improve it and strengthen our conclusions. Our point-by-point answers are below. We have answered most of the points raised by the reviewers and added new experimental data, including detailed structural and biochemical analyses. These further support that BCP4 (and not BCP3) is the plant functional counterpart of MDC1 because, in response to DNA damage, it binds phosphorylated H2A.X and recruits the MRN complex. In addition, we provide further support for the phylogenetic analysis and evidence for the plant counterpart of PAXIP1.

      We believe that our revised manuscript, which includes a set of new experimental data, strongly supports our main conclusion that BCP4 is a functional counterpart of metazoan MDC1.

      2. Point-by-point description of the revisions



      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      MDC1 is a key regulator of DNA damage responses (DDR) in animals. MDC1 has multiple protein domains, in which the BRCT domain binds γH2A.X. However, plants lack the homolog of MDC1. In this study, the authors found that BCP4 binds γH2A.X and proposed that BCP4 is a functional counterpart of MDC1, which will greatly enhance our understanding of plant DDR pathway. I have the following concerns.

      1. The relationship between BCP3 and BCP4 needs to be clarified. In line 255, the authors state that "we conclude that BCP3 and BCP4 have functional properties as human MDC1". In the Abstract, the authors state that "we identified BCP4 as a candidate ortholog of human MDC1". I am confused about the conclusion. Are both BCP3 and BCP4 counterparts of MDC1, or only BCP4? In addition, in BCP3 and BCP4, only the BRCT domains share homology with MDC1. They lack the other domains of MDC1. Therefore, "ortholog" may not be an appropriate term. I think "functional counterpart" may be a better term.

      Response: Our analysis emphasizes the fact that human MDC1 is highly derived from an ancestral form of MDC1 that did not share most of the domains found in mammalian MDC1. Because it is still difficult to establish with certainty what the ancestral MDC1 was, we agree that "functional counterpart" is a more correct term, and we changed this accordingly throughout the manuscript.

      BCP1-4 all contain tandem BRCT domains. I am wondering whether it is possible to figure out why only BCP3 and BCP4 bind γH2A.X through sequence analysis. Are there any key residues essential for γH2A.X binding?

      Response: We used AlphaFold models of the tBRCT domains of BCP1, BCP2, BCP3, and BCP4. While in the AlphaFold models the tBRCT domain of each BCP protein largely overlaps with the structure of the human MDC1 tBRCT domain, only the tBRCT domains of BCP3 and BCP4 are predicted to make contacts with γH2A.X similar to those of human MDC1. Although the residues involved are not fully conserved between BCP3/4 and human MDC1, we obtained in vitro data supporting that the interaction of BCP4 is mediated by a comparable pocket of three key residues that contact the phosphate group of γH2A.X. See also the answers to the comments of Referee #2, the new Figures 3 and 4, the corresponding description on pages 8-9, and Supporting Figure 3.

      Line 183, "On an unrooted phylogenetic tree, these two proteins clustered with MDC1 and PAXIP1 (Figure 1B).". In Figure 1B, MDC1 is closer to BCP3 and BCP4 than PAXIP1 and PAXIP1 is closer to BCP2 than MDC1. If the authors want to include PAXIP1 in Figure 1C, the authors should include BCP2 as well. In the γH2A.X binding assays, I do not understand why the authors tested BCP1 instead of BCP2. In Figure 2D, why was bcp2 not included?

      Response: We created a new alignment for Figure 1C that includes the BCP2 tBRCT domain. The tree that includes all BCP BRCT domains (Figure 1D) supports a close relationship between MDC1 and BCP3/BCP4, and between PAXIP1 and BCP1. As we stated on pages 5-6, lines 175-178, BCP2 also contains an acetyltransferase domain, which is unique to the plant BCP2 protein. Based on its domain organization, BCP2 was not considered a candidate MDC1 homolog, and we did not perform mutant complementation. This is why, after our initial analysis of bcp mutants (DNA damage sensitivity, formation of γH2A.X, and phylogeny), and based on similarities with MDC1 and PAXIP1, we focused on the bcp1, bcp3, and bcp4 mutants and the corresponding proteins. The function of BCP2 remains to be investigated, but this is beyond the scope of this manuscript, which is primarily dedicated to finding the functional counterparts of MDC1 and PAXIP1.

      The expression level of BCP1-4 in the mutants needs to be examined using qRT-PCR, especially for the bcp3 mutant, which is a weak allele.

      Response: We did not perform this experiment because it was done in Vladejic et al., 2022, and expression data are available from various genomic datasets on TAIR.

      The authors used "bleomycin" or "zeocin" in different parts. Please be consistent.

      Response: We consistently use Bleomycin for treatment of seedlings followed by western blotting, and Zeocin for the true leaf assay. These two agents produce DNA double-strand breaks in similar ways, and we previously showed that levels of γH2A.X and γH2A.W.7 are similar when using these two agents (Rosa M, Mittelsten Scheid O, Bio. Protoc. 4:e1093, doi: 10.21769/BioProtoc.1093; Lorkovic et al., Curr Biol. 2017, doi: 10.1016/j.cub.2017.03.002). Zeocin was chosen for true leaf assays because we observe lower variation between batches and biological repeats than with Bleomycin.

      Figure 3E and 3F, please indicate the treatments of the upper and lower panels.

      Response: Thank you for pointing this out. This has been indicated in the corresponding legend of the new Figure 3 A - C.

      Line 338, "bcp1 mutants show reduced homologous recombination rates (Fan et al., 2022; Vladejić et al., 2022; Yu et al., 2023)". The bcp1 mutant was not reported in Fan et al. paper.

      Response: This sentence has been changed to accurately describe data in each of the mentioned papers.

      Line 40, please add a comma after "In animal". Line 331, please add a comma after "In mammals".

      Response: This has been corrected.

      Line 123, "only BRCA1 and BARD1 were described in plant lineage". Additional BRCT proteins were described in plants, including XIP1 (Nat. Commun. 13:7942), BCP1/DDRM2 (New Phytol. 238:1073-1084; Front. Plant Sci. 13:1023358), and DDRM1 (PNAS, 119: e2202970119).

      Response: This sentence refers to known BRCT domain mediator/effector proteins. From the published data on XIP1, BCP1/DDRM2, and DDRM1, it is not possible to assign these functions to the proteins in question. Nevertheless, we changed this sentence to avoid ambiguous interpretation, and we introduce the XIP1, BCP1/DDRM2, and DDRM1 proteins later in the text as needed.

      Reviewer #1 (Significance (Required)):

      This study identified BCP4 as a functional counterpart of MDC1, which filled the gap of plant DDR signaling.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Frédéric Berger and colleagues identified BCP4 in Arabidopsis thaliana as a potential plant orthologue of vertebrate MDC1. The conclusions are based on both in silico analysis (phylogenetic analysis) and in vitro biochemical and cell biological experiments. BCP4 loss causes sensitivity to DNA damage. Moreover, BCP4 binds to a phosphopeptide derived from the C-terminus of H2AX via its C-terminal BRCT domains, and forms foci in cells exposed to DNA damage, which co-localize with gammaH2AX foci.

      Major comments: The conclusions are generally supported by the data, but the evidence presented is still quite limited. For example, it is still possible that BCP4 recruitment to sites of DNA damage is mediated by another protein and not by direct interaction with gammaH2AX. To firmly conclude that BCP4 is an MDC1 orthologue, it is in my opinion essential to perform a (limited) mutagenesis analysis. The key amino acids in the BRCT domains that recognize gammaH2AX need to be mutated, and it has to be shown that these mutants are defective for H2AX phosphopeptide binding and are not recruited to sites of DNA damage. Such residues may be tricky to identify, but one obvious candidate would be the Ser residue in beta1 (VLFS motif). In vertebrates, this is a Thr that directly interacts with the phosphate in gammaH2AX. Another possible critical site may be shortly before alpha2 (RTRN motif). In vertebrates, it is RTVK, and the K makes direct contacts with the phosphate in gammaH2AX. This function is perhaps carried out by an R. Structure prediction with AlphaFold may help to identify the most critical residues.

      Response: We thank the reviewer for these suggestions. We used AlphaFold to predict the structures of the tBRCT domains of all BCP proteins and compared them with the structure of human MDC1 in complex with the γH2A.X peptide. Based on these analyses, we mutated critical amino acids in BCP4, selected for their predicted interaction and their conservation. We showed that mutations of critical residues reduced or almost completely abolished binding of BCP4 to γH2A.X. These data are now part of the new Figure 4. See also the corresponding description on pages 8-9. In addition, we provide genetic data showing that the foci formation of BCP4 depends on H2A.X (new Fig 3B and C). We did not attempt genetic complementation experiments with these mutants because it would take nine months to obtain stable transgenic plant lines expressing various mutant versions of BCP4. Moreover, the limitations of Arabidopsis transgenesis do not allow precise control of transgene expression, which could make interpretation difficult in this particular case.

      Another critical issue is the introduction of the study. This needs to be revised, because the literature is not correctly cited in several places. For example, the cited paper by Salguero et al., 2019 did not show that the PST repeats of MDC1 constitute a docking site of TP53BP1, but instead, that the PST repeats can bind to chromatin independently of gammaH2AX.

      Response: We thank the reviewer for spotting this mistake. We carefully checked all references and corrected all wrongly associated ones or used original reports instead of reviews.

      Also, we did re-write some parts of the Introduction as referee #1 also asked for some clarification.

      The data are generally well presented and convincing. The only thing that needs to be added is a quantification of the microscopic analysis (e.g. number of foci per cell, or similar).

      Response: We quantified the foci number in all mutants reported in Figure 2C. These data are now included in the new Figure 2D.

      Optional: it would be interesting to address the question why plants seem to have two MDC1 orthologues. The longer BCP4 and the shorter BCP3. What is the functional difference between those? Do they perhaps distribute functions that are combined in one protein in vertebrate MDC1 on two different proteins?

      Response: Thank you for prompting us to address this outstanding question. We now provide evidence supporting that only BCP4 is a functional counterpart of MDC1. We show that a specific region of BCP4 but not BCP3 is able to interact with NBS1 of the MRN complex (see new Figure 6). Also, BCP3 is missing the N-terminal TQxϕ repeats present in BCP4. Although the function of these repeats is unknown at this point, these data together suggest some functional diversification between BCP3 and BCP4. We mention this on page 11, lines 372-374.

      Reviewer #2 (Significance (Required)):

      The strength of the study is the detailed phylogenetic analysis. Also, the biochemistry and cell biology is well done.

      Limitations are the lack of evidence that BCP4 carries out functions in the cell (beyond recognising gammaH2AX) that are carried out by MDC1 in vertebrate cells

      Response: We thank the reviewer for raising this important point. To address it, we performed pull-down assays of the TQxϕ and SQ/DWD regions of BCP4 with NBS1 and found that Arabidopsis NBS1 interacts with the SQ/DWD region, and that this interaction is mediated by the FHA+tBRCT region of NBS1. Based on the AlphaFold prediction, we performed further deletion and point mutation analysis of the SQ/DWD region and found that the binding of NBS1 requires an alpha-helix comprising a sequence that is not conserved in BCP3. We therefore concluded that a sequence specific to BCP4 (not present in BCP3) is capable of recruiting the MRN subunit NBS1.

      At this point we could not demonstrate this in vivo by analyzing NBS1 foci in the bcp4 mutant background. Unfortunately, commercial antibodies against plant NBS1 or other subunits of the MRN complex are not available, and obtaining transgenic plants expressing fluorescent protein-tagged NBS1 would require much longer than a reasonable revision period. Nevertheless, our in vitro interaction data strongly argue that BCP4 functions in binding the MRN complex, like human MDC1, although the mode of interaction of BCP4 with NBS1 differs from that of human MDC1 with NBS1.

      Please see the new Figure 6 and corresponding description on page 11-12.

      The study is of great interest to readers working on chromatin responses to DNA damage in plants.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors set out to find BRCT domain-containing proteins in order to identify the readers of phosphorylated H2A.X in plants. Using a systematic analysis of the BRCT domain proteome, they discovered 21 proteins. Further analysis showed that BCP3 and BCP4 are orthologs of animal MDC1 and that BCP1 is the ortholog of animal PAXIP1. They also extended their work to an evolutionary perspective, finding that BCP1 and BCP4 in plants and PAXIP1 and MDC1 in metazoans evolved independently from a common ancestor. However, this manuscript raises some concerns; see the comments and questions below.

      Major comments: 1) If you think that BCP3 and BCP4 work as mediators of the DDR, can you show that the mutants have a DDR defect? The authors only assessed true leaf development. Leaf development is affected not only by DNA damage but also by other factors. Therefore, the authors should provide additional data showing that the bcp mutant lines have a defective DNA damage response.

      Response: The “true leaf assay” is a classical assay for testing plant mutants for DNA damage sensitivity (Rosa M, Mittelsten Scheid O Bio. Protoc. 4:e1093. doi: 10.21769/BioProtoc.1093). If DNA damage occurs and is not efficiently repaired, meristematic cells in shoot meristem are arrested and do not divide, hence plants do not produce the first pair of “true” leaves after cotyledons expand. In this assay cotyledons open and grow normally as they are already fully determined and do not undergo any cell division after seed germination.

      In this assay the treated WT seedlings also show a reduction of the number of plants with true leaves as compared with untreated WT (100%). Furthermore, WT and mutant seedlings develop normally and comparably without Zeocin induced DNA damage.

      2) Do you have DNA damage sensitivity data for bcp3 bcp4 double mutants?

      Response: We obtained a bcp3 bcp4 double mutant and tested it for DNA damage sensitivity. The double mutant is slightly more sensitive than the bcp4 single mutant, but not as sensitive as the H2A.X mutant. The reason for this is presumably the nature of the available bcp3 mutant allele, which has a T-DNA insertion in the 5’-UTR and retains some residual expression of BCP3 protein, as reported by Vladejic et al., 2022. We did not feel that this would improve the manuscript, so we did not include these data. Obtaining a new mutant allele would take time and work beyond what is reasonable for a revision. In addition, since we show that the functional counterpart of MDC1 is BCP4, we did not think it relevant to pursue the characterization of BCP3 function further in the context of this manuscript.

      3) Some red algae have H2A.X but don't have BCP4 and BCP1 (Figure 4). In this case, how do they read the phosphorylated H2A.X? Can you discuss the point?

      Response: Actually, most red algae do not even have H2A.X. At this point we do not have data that could answer this question and it is difficult to make any prediction about this. Analysis of DDR system in red algae is totally beyond the scope of the current manuscript. See also answer to comment #5.

      4) L307-L312: From Figure 4, I thought that the timing of the appearance of the SQEF motif in H2A.X differs from that of the appearance of BCP4. Why do you say that the evolution of BCP4 and H2A.X coincides?

      Response: We thank the reviewer for pointing out the need for clarification.

      Histone H2A with a C-terminal SQEF/Y motif is categorized as H2A.X, which distinguishes this variant from H2A.Z (not discussed here) and from H2A itself. In Archaeplastida, many algal species possess either H2A or H2A.X. Only in streptophytes did the ancestral gene duplicate, leading to neofunctionalization of both H2A and H2A.X; in this case H2A.X forms a monophyletic clade. The evolution of BCP1 and BCP4 is slightly posterior to or coincident with this neofunctionalized H2A.X variant, suggesting co-evolution in streptophytes.

      5) Some red algae don't have BCP1, BCP4 and H2A.X. How do they transfer the signal to downstream? Do you have any idea about this?

      Response: To address this interesting question, we re-analyzed the BRCT domain proteome of the red algae and again could not find any protein containing the features of BCP4 present in green algae and land plants, or of opisthokont MDC1.

      We did find that red algae without MDC1 encode MRE11 and RAD50 but not NBS1. Also, components of the non-homologous end joining DNA repair pathway, Ku70 and Ku80, are conserved in these organisms. So, how some red algae cope with DNA damage remains enigmatic. Similarly, unicellular red algae do not have the classical autophagy pathway. This is the result of the very strong genome reduction (

      Response: Thanks for this comment. We did change the title of the manuscript to avoid ambiguity.

      Minor comments: 6) I think you should show us a schematic representation of BAP1 and PAXIP1 to compare both protein features.

      Response: We added a schematic representation of PAXIP1 to Supporting Figure 2B.

      7) L176-L178: Which data support this sentence? Response: The sentence in question: “BCP1 has two tBRCT domains positioned at the N- and C-terminus and a so far unrecognized C-terminal PHD finger which is present in all plant lineages except Brassicaceae (Supporting Figure S1A and S2A).”

      Response: Our data presented on Supporting Figure S1A (schematic presentation of BCP1 protein with indicated PHD finger consensus sequence) and S2A and Source data (alignment of PHD fingers in BCP1 in flowering plants, non-flowering land plants and multicellular green algae) clearly demonstrate the presence of a C-terminal PHD finger in BCP1 except in Brassicaceae. These can also be seen in the full complement of BCP1 sequences that are available in Source data.

      8) L271-L279: There are unreadable characters at "TQx_".

      Response: This very likely appeared during conversion into PDF file. We fixed this now.

      Reviewer #3 (Significance (Required)):

      Significance: General assessment: This study gives us an idea of how organisms have evolved the upstream system of the DDR.

      Advances: This study extends the knowledge of the DNA damage response in plants.

      Audience: broad and basic research

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      MDC1 is a key regulator of DNA damage responses (DDR) in animals. MDC1 has multiple protein domains, in which the BRCT domain binds γH2A.X. However, plants lack the homolog of MDC1. In this study, the authors found that BCP4 binds γH2A.X and proposed that BCP4 is a functional counterpart of MDC1, which will greatly enhance our understanding of plant DDR pathway. I have the following concerns.

      1. The relationship between BCP3 and BCP4 needs to be clarified. In line 255, the authors state that "we conclude that BCP3 and BCP4 have functional properties as human MDC1". In the Abstract, the authors state that "we identified BCP4 as a candidate ortholog of human MDC1". I am confused about the conclusion. Are both BCP3 and BCP4 counterparts of MDC1, or only BCP4? In addition, in BCP3 and BCP4, only the BRCT domains share homology with MDC1. They lack the other domains of MDC1. Therefore, "ortholog" may not be an appropriate term. I think "functional counterpart" may be a better term.
      2. BCP1-4 all contain tandem BRCT domains. I am wondering whether it is possible to figure out why only BCP3 and BCP4 bind γH2A.X through sequence analysis. Are there any key residues essential for γH2A.X binding?
      3. Line 183, "On an unrooted phylogenetic tree, these two proteins clustered with MDC1 and PAXIP1 (Figure 1B).". In Figure 1B, MDC1 is closer to BCP3 and BCP4 than PAXIP1 and PAXIP1 is closer to BCP2 than MDC1. If the authors want to include PAXIP1 in Figure 1C, the authors should include BCP2 as well. In the γH2A.X binding assays, I do not understand why the authors tested BCP1 instead of BCP2.
      4. The expression level of BCP1-4 in the mutants needs to be examined using qRT-PCR, especially for the bcp3 mutant, which is a weak allele.
      5. The authors used "bleomycin" or "zeocin" in different parts. Please be consistent.
      6. In Figure 2D, why was bcp2 not included?
      7. Figure 3E and 3F, please indicate the treatments of the upper and lower panels.
      8. Line 338, "bcp1 mutants show reduced homologous recombination rates (Fan et al., 2022; Vladejić et al., 2022; Yu et al., 2023)". The bcp1 mutant was not reported in Fan et al. paper.
      9. Line 40, please add a comma after "In animal". Line 331, please add a comma after "In mammals".
      10. Line 123, "only BRCA1 and BARD1 were described in plant lineage". Additional BRCT proteins were described in plants, including XIP1 (Nat. Commun. 13:7942), BCP1/DDRM2 (New Phytol. 238:1073-1084; Front. Plant Sci. 13:1023358), and DDRM1 (PNAS, 119: e2202970119).

      Significance

      This study identified BCP4 as a functional counterpart of MDC1, which filled the gap of plant DDR signaling.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, it is shown that cofilin severs actin filaments slowly when fascin is present. Authors show that this is due to slower cluster nucleation of cofilin on fascin-induced actin bundles. Interestingly, the authors show that cofilin binding promotes helicity in actin filament bundles which in turn promotes fascin exclusion and more cofilin clustering in adjacent filament bundles; thus, inducing local transmission of structural changes.

      The authors use an elegant approach, and the data is nicely presented. Overall, I consider that this manuscript is in good shape to be published. It might benefit from language editing, though.

      We thank the reviewer for their positive comments. We have edited the manuscript to improve its readability (changes are in blue in the manuscript).

      Reviewer #1 (Significance (Required)):

      In my view, the significance of this manuscript is that it elegantly shows the molecular details of cofilin severing of fascin-induced actin filament bundles. The authors show that cofilin binding promotes helicity in actin filament bundles, which in turn promotes fascin exclusion and more cofilin clustering in adjacent filament bundles, thus inducing local transmission of structural changes.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study, Chikireddy et al. perform a series of experiments in which they compare the efficiency of cofilin-mediated severing and actin filament disassembly on individual filaments versus bundles of different sizes formed by the actin-bundling protein fascin. The key outcome, quite distinct from previously published conclusions by the authors themselves and other authors, is that fascin bundling actually reduces cofilin-mediated severing, mostly because of much slower "nucleation" of cofilin clusters on fascin-bound filament bundles. Cofilin cluster formation is followed by local fascin removal, and the nucleation of a cofilin cluster on an adjacent bundle in the absence of fascin is strongly enhanced. The reason for the latter surprising observation is not entirely clear, but is proposed to arise from cofilin-mediated changes in filament helicity of neighboring filaments. To my understanding, the main reason why fascin protects from cofilin severing here rather than enhancing it (as reported previously) is the lack of constraint on the induced, cofilin-mediated twist, because if this twist is constrained, e.g. by anchoring of the bundles to the chamber surface, then severing by cofilin is accelerated.

      We thank the reviewer for their positive feedback on the manuscript. We have substantially edited the manuscript in light of the insightful comments of the reviewer (changes are in blue).

      Major comments:

      I think the study is very well done, most experiments are super-elegant and controlled; I really don't have any objections against the conclusions drawn, as most of what I have seen is totally justified and reasonable. So from a scientific point of view, I can easily agree with all the major conclusions drawn, and so in my view, this should be published fast.

      Minor comments:

      There are two minor points that could be addressed:

      1) I am not entirely convinced by the conclusions drawn from the EM images shown in Figure 6A, and in particular by the filaments in two-filament bundles locally twisting around each other (without breaking) at spatial sites lacking fascin and decorated by cofilin. This is hard to imagine for me, and the evidence for something like this happening is not very strong, as in the EM, only larger bundles could be observed. In addition, I am not sure that the braiding of filaments seen in the presence of cofilin is really occurring just locally on cofilin-decorated bundle segments and thus indeed coincides with loss of fascin as proposed in the scheme in Fig. 6B.

      Can the authors exclude that the braiding is caused by some experimental artefact, induced perhaps by sample preparation for negative staining?

      We thank the reviewer for raising this point. We have repeated the negative staining EM experiments several times and now show new images and quantification (new Supp Fig. 13). In our new series of experiments, the braiding that was previously shown in Fig. 6 proved difficult to reproduce and to quantify. We therefore decided to remove EM observations from the main Fig. 6, and we no longer present them as evidence supporting the mechanism that we propose for inter-filament cooperativity.

      From EM images, we now quantify the frequency of fragmentation of large actin filament bundles. We observed that bundles often terminate with the ends of their filaments in close proximity, consistent with sharp breaks due to co-localized cofilin clusters.

      We have rewritten this part of the Results section in the manuscript, which now reads: ‘To further investigate larger bundles, we imaged them using negative staining electron microscopy. In the absence of cofilin, filaments in bundles are arranged in a parallel manner, as previously reported in vitro (Jansen et al, 2011). Compared with the control, filament bundles exposed to cofilin show numerous sharp breaks (65 breaks per 122 µm of bundles, versus 4 breaks per 68 µm in the control; Supp. Fig. 13). This is consistent with bundle fragmentation occurring at boundaries of co-localized cofilin clusters.’
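As a quick check on the figures quoted above, the break counts can be converted to linear densities. This is only arithmetic on the numbers given in the response; the function name is illustrative and no statistical test is implied.

```python
def breaks_per_um(n_breaks: int, bundle_length_um: float) -> float:
    """Linear break density: breaks counted per micrometre of bundle imaged."""
    return n_breaks / bundle_length_um


# Counts and total bundle lengths as quoted in the response.
cofilin = breaks_per_um(65, 122)
control = breaks_per_um(4, 68)
fold_change = cofilin / control

print(f"cofilin: {cofilin:.2f} breaks/um, control: {control:.2f} breaks/um, "
      f"fold change: {fold_change:.1f}")
```

This gives roughly 0.53 versus 0.06 breaks per µm, about a ninefold higher break density in the presence of cofilin.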

      Did the authors quantify the occurrence of such braided bundle segments with and without cofilin?

      How large are these braided segments on average when you quantify them? Would you also see them if you prepared the bundles for an alternative EM-technique, such as Cryo-EM, for instance?

      As mentioned in the answer to the previous point, the braided segments proved difficult to reproduce and quantify, and we have removed EM experiments from the main figure 6. Instead of the braided segments, we now quantify the severing of the bundles, and the distribution of filament ends at the extremities of the bundles (new Supp. Fig. 13).

      We have not tried Cryo-EM due to limited access to such experimental tools within the timeframe of the study.

      This may admittedly all be experimentally challenging, but would it be possible to combine the negative staining of filaments with staining for cofilin and/or fascin using immunogold technology, to prove that the braided segments do indeed correlate with high cofilin and low fascin concentrations? In the absence of such data, and in particular in the absence of a clear quantification, the proposal is too strong in my view. Finally, it would be nice (albeit not essential I guess) to also look at two-filament bundles. The authors stated these can not be easily generated due to the tendency of fascin to promote the formation of larger bundles, but can this not be titrated/tuned somehow by lowering fascin concentrations, to come closer in reality to what is proposed to occur in the scheme in Figure 6B? In any case, the way the data are presented right now appears to constitute a pretty large gap between experimental evidence and theoretical model.

      We agree with the reviewer that EM observations are limited and, alone, do not provide strong evidence in favor of braiding/super-twisting being the mechanism responsible for inter-filament cooperativity (please see our answers to the points above). We have performed negative staining EM assays at higher cofilin-1 concentration (500 nM) compared to microfluidics assays, in order for cofilin to quickly bind to filaments, even in large bundles, so that our chances to capture bundles targeted by cofilin would be high.

      Nevertheless, both microfluidics and EM observations point in the same direction : bundle fragmentation by cofilin is caused by the co-localized cooperative nucleation of cofilin clusters.

      2) I think that the proposal of cofilin-decorated filaments to "transfer" the resulting cofilin-induced changes in filament helicity onto neighboring filaments in the bundle, which is proposed to occur locally and in the absence of fascin is a bit vague, and difficult to understand mechanistically. Can the authors speculate, at least, how they think this would occur? Are there no alternative possibilities for explaining obtained results? Maybe I am missing something here, but with considering cofilin to be monomeric and only harboring one actin-binding site, this proposal of helicity transfer onto neighboring filaments seems inconclusive.

      On single actin filaments, the change of helicity induced by cofilin binding has been observed by many groups using EM and cryoEM (e.g. McGough et al, JCB 1997 10.1083/jcb.138.4.771; Egelman et al, PNAS 2011 10.1073/pnas.1110109108; Huehn et al, JBC 2018 10.1074/jbc.AC118.001843). These studies have revealed that actin subunits get ‘tilted’ relative to their original orientation along the filament long axis. This leads to the shortening of the helical pitch for cofilin-saturated actin filament segments.

      In our assays, the progressive binding of cofilin along a single filament creates a cluster where all actin subunits are tilted and the helical pitch of the filaments within the cluster is shortened (from a half pitch of 36 nm down to 27 nm). This change of helicity in a cluster induces the rotation of one end of the filament relative to the other (as we have shown previously in Wioland et al, PNAS 2019). Therefore, if two parallel filaments are stapled together, the local twisting of one filament causes the twisting of the other in the overlapping region.
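
      As a rough back-of-the-envelope illustration of the rotation described above, the sketch below converts the two half-pitch values quoted here (36 nm and 27 nm) into a twist per unit length and estimates the extra rotation accumulated across a cofilin cluster. It assumes, for simplicity, that twist is uniform within the cluster and that the cluster length is unchanged by cofilin binding; the function names are illustrative and are not taken from the manuscript.

```python
# Half-pitch values quoted in the text (nm).
BARE_HALF_PITCH_NM = 36.0
COFILIN_HALF_PITCH_NM = 27.0


def twist_per_nm(half_pitch_nm: float) -> float:
    """Rotation (degrees) per nm: the long-pitch helix turns 180 degrees per half pitch."""
    return 180.0 / half_pitch_nm


def extra_rotation_deg(cluster_length_nm: float) -> float:
    """Extra rotation across a cofilin cluster relative to the same length of bare filament.

    Simplifying assumptions (not from the manuscript): uniform twist inside
    the cluster and no change in cluster length upon cofilin binding.
    """
    delta = twist_per_nm(COFILIN_HALF_PITCH_NM) - twist_per_nm(BARE_HALF_PITCH_NM)
    return cluster_length_nm * delta
```

      Under these assumptions, a 108 nm stretch spans three half-pitches when bare and four when cofilin-decorated, so `extra_rotation_deg(108.0)` comes out at about half a turn. If one filament end is free, this is the rotation it would undergo relative to the other end.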

      We have rephrased this point to more clearly explain this in the last paragraph of the results section:

      “From our kinetic analysis, we propose the following model that recapitulates the binding of cofilin to fascin-induced 2-filament bundles (Fig. 6D). Initially, actin filaments in fascin-induced bundles are in conformations that are less favorable for cofilin binding than isolated actin filaments. Once a cofilin cluster has nucleated, its expansion locally triggers fascin unbinding and prevents it from rebinding. The increase of filament helicity induced by cofilin causes a local twisting of the entire bundle, thereby changing the helicity of the adjacent filament in the fascin-free region facing the cofilin cluster. In this region, the increase in filament helicity enhances cofilin affinity, and thus locally promotes the nucleation of a cofilin cluster (inter-filament cooperativity).”

      We have tried to think of other alternative scenarios that might explain our observations, but none appeared to be valid.

      Reviewer #2 (Significance (Required)):

      General assessment:

      The strength of this study is that owing, at least in part, to the microfluidics devices employed and the careful biochemistry, the experimental setups are super-controlled and clean, and they are used in a highly innovative and elegant fashion. The simulations are also nice! A limitation is that it is not entirely clear how precisely the main observations can be translated to what's happening in vivo. The results are largely dependent on the bundles not being constrained I understand, so to what extent would bundles be unconstrained in vivo? Perhaps this is not so important, because the experimental setup allows the authors to dissect specific biochemical behaviors and inter-dependencies between distinct actin binding proteins, but the latter view (if correct) could be stated more clearly!

      We thank the reviewer for their remarks. We have updated the part where we discuss the biological implications of our in vitro observations to better explain how the twist-constraints expected for fascin bundles in cells would accelerate cofilin bundle disassembly.

      Advance:

      As stated above, the results are opposite to the proposed synergistic activities of fascin and cofilin observed for bundles previously, perhaps because they were not constrained. So although touched in part and in a very polite fashion in the discussion, the authors could specify more clearly what the differences between the studies are, and which of the distinct activities observed either here or in previous literature will be dominant or more relevant to consider in the future? This will be hard to discern as is now, in particular for non-experts.

      We agree with the reviewer that the manuscript will benefit from discussing more in depth the plausible reasons why our experimental observations are in disagreement with the earlier interpretation by Breitsprecher and colleagues. We have extended our discussion on this point, which now reads: “Previously, using pyrene-actin bulk experiments, Breitsprecher and colleagues observed a diminished cofilin binding to fascin-induced filament bundles (Breitsprecher et al, 2011). In spite of this, their observation of fluorescently labeled actin filament bundles seemed to indicate an efficient severing activity. Since cofilin was not fluorescently labeled, they could not observe cofilin clusters, and they proposed that severing was enhanced because fascin served as anchors along filaments and impeded cofilin-induced changes in filament helicity”

      Audience:

      This manuscript will be most influential for a specialized audience interested in the complexities of biochemical activities of specific actin binding proteins when looking at them in combination. Although specialized, this is still a quite relevant audience though, since prominent actin binding proteins like cofilin are highly important in virtually any cell type and various actin structures, hence of broad relevance again in this respect.

      Expertise:

      I am a cell biologist and geneticist interested in actin dynamics and actin-based, motile processes.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      My only major concern is that although the authors provide data that strongly supports interfilament cooperativity in two filament bundles for cofilin binding, the evidence to support that this induces filament twist on the opposing filament is not strong enough to conclusively establish this as the mechanism for the observed interfilament cooperativity. This is stated as such in the results section as a proposed model, but stated with more certainty than the presented data supports in the discussion. It might be better, based on the data presented, to state this as one possible mechanism for the observed cooperativity.

      We thank the reviewer for their remark. We have edited our discussion section to clearly say that inter-filament cooperativity arising from cofilin-induced filament twisting is a proposed model that would best account for what we observed: “Indeed, we report here the exclusion of fascin from within cofilin clusters, and a strong increase in the nucleation of cofilin clusters on adjacent filaments. This inter-filament cooperativity mechanism leads to the co-localized nucleation of cofilin clusters, and permits bundle fragmentation faster than if the nucleation of cofilin clusters on adjacent filaments were purely random. To our knowledge, this is the first time such inter-filament cooperativity is ever reported. To explain this mechanism, we propose that the cofilin-induced change of helicity produced locally on one filament can be transmitted to the adjacent filaments within the bundle (Fig. 6D).”

      So far, we have been unable to propose alternative mechanisms that could explain our observations in light of what is known for cofilin at the single filament level (a similar point was raised by reviewer #2, please see above).

      Areas within the paper, if addressed, will improve the arguments presented as well as the readability of the paper.

      (1) The authors use both the terms cofilin binding (in section I of the results) as well as cofilin nucleation (in section III of the results). It is unclear if these terms are meant to indicate the same, or different, processes. The manuscript would benefit from a clear explanation of the steps of cofilin-mediated disassembly measured and quantified in the experiments, namely nucleation (or binding), cluster growth, and filament or bundle fragmentation. A clear description of these steps would also allow the reader to follow the logic of the experiments from Figure 3 to Figure 5.

      We have edited the introduction to better describe the different steps of cofilin activity, and to remove any ambiguity as to whether we are referring to cofilin binding or cofilin nucleation.

      (2) Throughout the paper, the authors move from single filaments, to 2-filament bundles, to multifilament bundles, using different concentrations of fascin and cofilin. Given the biphasic behavior of cofilin, namely that low concentrations favor severing and high concentrations can favor coating and filament stabilization, I think it is important that concentrations for the components are consistent across experiments, and if the concentrations of important components (such as cofilin and fascin) are changed, a clear explanation as to why is included.

      As explained in the beginning of the result section, most of our experiments and quantification of cofilin activity using the microfluidics assay were done using 200 nM fascin and 200 nM cofilin as a standard. This is the case, in particular, for all the data shown in Fig 2, 3 and 4, where we compare the behavior of single filaments, 2-filament bundles, and larger bundles, exposed to the same protein concentrations.

      We have also explored higher fascin and cofilin concentrations to document their respective impact, always mentioning any change in concentration. We agree with the reviewer that cofilin activity is biphasic at the single filament level (in the range of 0 to 1 µM for mammalian ADF/cofilin, at physiological pH 7.4). In the case of fascin-induced bundles (already for two-filament bundles), filament saturation by cofilin, and thus their stabilization, will occur at higher cofilin concentration. This is mainly due to the lower nucleation activity of cofilin on fascin-induced bundles, preventing the nucleation of numerous cofilin clusters that will eventually fuse together, thus preventing saturation of filament bundles by cofilin before bundle fragmentation.

      (3) In Figure 2, it is mentioned that for the spectrin seeds with the microfluidics, the filaments consisting of larger bundles were not analyzed along with the single filament and 2-filament bundles. Instead, a different experiment with seeds attached to beads is used to assess larger filament bundles. Why were larger bundles not analyzed in the microfluidic experiment?

      We appreciate the insightful observation by the reviewer. When elongating actin filaments from spectrin-actin seeds, the seeds are randomly located on the glass coverslip of the microfluidics chamber. Upon exposure to fascin, only a subsection of any filament will be in contact with one or multiple filaments, ultimately forming a bundle due to the presence of fascin. In the case of high filament densities leading to large bundles, it is very difficult to identify the exact subsection of each filament which is engaged in a bundle or not. Despite our attempts to image individual filaments before and after exposure to fascin for enhanced clarity, the inherent difficulty persisted.

      This limitation hindered our ability to quantify cofilin activity on large bundles when using spectrin-actin seeds randomly distributed on glass. To address this, we opted for an alternative approach involving micron-sized beads coated with spectrin-actin seeds. This modification not only circumvents the aforementioned limitation but also aids in the formation of larger bundles (up to 10 filaments per bundle). This adjustment significantly enhances our ability to study and quantify cofilin activity on larger bundles, contributing to a more robust and comprehensive understanding of cofilin activity on bundles.

      And conversely, why were 2 filament bundles not assessed with the beads? Comparing the findings on two filament bundles with the findings on multifilament bundles would be easier for the reader if the small and large bundles were evaluated in the same experiments. If this is not experimentally feasible, the authors need to provide clearer explanation as to why this analysis is not included.

      Actually, we did assess 2-filament bundles in the bead assay. The cofilin activity on 2-filament bundles from beads are reported, along with larger bundles, in figure 3E-F for nucleation, and in figure 4C for cofilin cluster growth rates.

      (4) The authors indicate that at increased fascin concentration (1 µM) the nucleation rate of cofilin clusters on single filaments decreases. The authors should comment on the mechanism by which fascin (at 1 µM) affects cofilin binding.

      We thank the reviewer for this comment. We now comment on this mechanism in the result section:

      “This observation is consistent with the low affinity of fascin for the side of single actin filaments. Furthermore, this indicates that cofilin and fascin may have overlapping binding sites, or that a more complex competition may exist between the two proteins, where the binding of one protein would induce conformational changes on neighboring actin subunits affecting the binding of the other protein.”

      (5) The authors should determine and include the dissociation rate for the labeled cofilin used in this study, especially given the proposed mechanism for cofilin excluding fascin within the bundles.

      • If the reviewer means that we need to characterize the behavior of the labeled cofilin: in Wioland et al 2017, we have previously reported that cofilin dissociates slowly from cluster boundaries (at 0.7 s⁻¹ for cofilin-1 on alpha-skeletal rabbit actin, as used in the present study) and extremely slowly from inside a cofilin cluster (~2 × 10⁻⁵ s⁻¹).

      • If the reviewer means that we should investigate the competition between fascin and cofilin along bundles: we agree that this is indeed an interesting question. However, this is quite complex because many unknown parameters are involved. In addition to the on/off-rates of each protein and how they are affected by the presence or the proximity of the other protein, we need to consider that fascin has fewer binding sites than cofilin, and that their accessibility changes as the helicity of the filament evolves as cofilin binds. Investigating this question would require many experiments, whose results we would need to compare against a model. We believe that this is out of the scope of this manuscript.
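
      To put the two cofilin off-rates quoted in the first point above on a common footing, the short sketch below converts them into mean residence times and half-lives. It assumes simple first-order (single-exponential) dissociation, which is a simplification; the rate values are those cited from Wioland et al 2017, and the helper names are illustrative.

```python
import math

# Off-rates quoted in the response (s^-1).
K_OFF_BOUNDARY = 0.7    # cofilin at a cluster boundary
K_OFF_INSIDE = 2e-5     # cofilin inside a cluster (approximate value)


def mean_residence_time(k_off: float) -> float:
    """Mean bound time (s) for first-order dissociation: 1 / k_off."""
    return 1.0 / k_off


def half_life(k_off: float) -> float:
    """Time (s) after which half of the bound molecules have left: ln(2) / k_off."""
    return math.log(2) / k_off
```

      With these numbers, a cofilin molecule at a cluster boundary stays bound for about a second on average, whereas one inside a cluster would stay bound for many hours, which is far longer than the duration of a typical assay.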

      (6) For Figure 4, D and E, what do the dynamics of fascin and cofilin signal look like on a larger filament bundle? It would be informative to provide the cofilin cluster nucleation rate on larger filament bundles with a range of fascin concentrations (as in 3D for a two filament bundle).

      It would be interesting indeed to investigate the dynamics of fascin and cofilin on larger bundles. However, this experiment is quite challenging due to the fluorescence background of fluorescently-labeled fascin in our microfluidics assay (regardless of bundle size). We have been unable to perform this assay with success on large bundles. Moreover, it is difficult for us to carry out more of these experiments now that the first author of the study has left the lab.

      However, based on our results, we would expect that, for large bundles, increasing fascin concentration would also have a limited impact on the reduction of cofilin nucleation. Indeed, for 2-filament bundles, we can note that the increase of fascin concentration has a more limited impact on the nucleation of cofilin clusters (fig. 3D, roughly a 2-fold decrease for fascin from 100 to 500 nM) than the number of filaments per bundle (fig. 3F, a 10-fold decrease when increasing the size of a bundle from 2 to 10 filaments).

      (7) Additionally, it would be useful to report the cofilin severing rate at a range of cofilin concentrations, at least for the 2 filament bundles.

      Cofilin severing rate is not dependent on cofilin concentration in solution. This has been reported previously by several groups, including ours (e.g. Suarez et al, Current Biology 2011 ; Gressin et al, Current Biology 2015; Wioland et al, Current Biology 2017).

      Below is the comparison of cofilin cluster severing at 100 and 200 nM cofilin, on single actin filaments, which we added to supplementary figure 10.

      At 100 nM cofilin, we measured a similar cofilin cluster severing rate on 2-filament bundles, by measuring the survival fraction of overlapping cofilin clusters that lead to 2-filament bundle fragmentation over time. The figure pasted below is new Supp. Fig. 11.

      When the severing occurs in the two filament bundles, does this severing occur mostly at boundaries with cofilin-actin and bare actin or does this severing occur at cofilin-actin/fascin-actin boundaries?

      This is an interesting point. In the presence of a saturating amount of fascin, on 2-filament bundles, one fascin protein is bound every 13 actin subunits along each filament of a bundle. Most of the time, a cofilin boundary will not be in contact with a fascin-bound actin subunit. The limited spatial resolution of optical microscopy does not allow us to say whether fascin was present at the boundary of a cofilin cluster or not when severing occurred. Nonetheless, we show that cofilin cluster severing is unaffected by fascin-bundling (i.e. severing rates per cofilin cluster boundary are similar on single filaments and on 2-filament bundles). Overall, bundling by fascin probably does not change the way cofilin severs, i.e. it occurs at the boundary between cofilin-decorated and bare actin regions.

      (8) For the images of large bundles appearing braided in figure 6A, the lower left panel the braided appearance is not obvious. Additionally, what is the number of filaments in the bundles shown? Finally, given that in Figure 3F it is indicated that cofilin cluster nucleation events are rare on large bundles, and the cluster growth rate is reduced on large bundles (Figure 4C), the authors need to indicate how frequently this braided appearance is observed as well as what the nucleation rate, growth rate and severing rate is for 500nM cofilin on bundles.

      We have repeated the negative staining EM experiments several times and now show new images and quantification (new Supp. Fig. 13). In our new series of experiments, the braiding that was previously shown in Fig. 6 proved difficult to reproduce and to quantify. We therefore decided to remove EM observations from the main fig 6, and we no longer present them as evidence supporting the mechanism that we propose for inter-filament cooperativity.

      As stated in point (7) above, the severing rate is independent of cofilin concentration. We’ve used 500 nM cofilin, which is a rather high cofilin concentration, to investigate bundle fragmentation in EM, as in solution we mostly form large bundles and they are more slowly targeted by cofilin than individual filaments or 2-filament bundles (figures 3F & 4C). At the single filament and 2-filament bundle level, the nucleation of cofilin clusters is extremely fast at 500 nM cofilin (> 10⁻⁴ s⁻¹ per binding site).

      (9) The authors indicate that the rapid fragmentation of twist constrained 2-filament bundles prevented them from directly quantifying the nucleation rate of the subsequent cofilin clusters that overlapped the initial ones. I'm unclear why this is the case, and if this is the case, I don't understand how the authors can be sure that a second nucleation event occurred in the twist constrained bundles. From the experimental data in 7C, it appears that the fragmentation rate for two filament bundles is similar to the fragmentation rate for twist constrained single filaments. The authors need to clearly state what they were able to observe and quantify as well as include the timing for this severing. If the authors could not observe a second nucleation event prior to severing, this should be clearly stated.

      Fragmentation of a 2-filament bundle requires the severing of two co-localized cofilin clusters, one on each filament. When 2-filament bundles are twist-constrained, the sequence of events leading to bundle fragmentation is fast, so it is difficult to separate the events within the resolution of our experiment. In this case, cofilin clusters sever quickly, so the clusters remain small, which translates into a low fluorescence intensity. Therefore, the quantification of the increase of cofilin fluorescence intensity along a bundle did not allow us to unambiguously identify the ‘cooperative’ nucleation of two overlapping cofilin clusters before the bundle is fragmented. So, apart from the quantification of the nucleation of cofilin clusters, which we show is unaffected by twist-constraining the bundles, we were unable to measure either the growth rate or the severing rate of cofilin clusters.

      Numerical simulations, using similar severing rates for cofilin clusters on both twist-constrained single filaments and 2-filament bundles, satisfactorily reproduce our experimental observations (dashed lines in Fig. 3C).

      We have edited the ‘Twist-constrained bundle fragmentation’ section to clearly say what we measured and what could not be measured: “We observed that the nucleation rate of cofilin clusters was similar for both twist-constrained and twist-unconstrained fascin bundles (Supp. Fig. 15), in agreement with observations on single actin filaments (Wioland et al, 2019b).

      The rapid fragmentation of twist-constrained 2-filament bundles prevented us from directly quantifying the nucleation rate of the subsequent cofilin clusters that overlapped with the initial ones, as well as cluster growth and severing rates.”

      This could be due to the rapid fragmentation, but it could also be due to severing occurring in the absence of a second cofilin nucleation event. It would be informative to compare the time from cofilin nucleation to severing event for two filament bundles in twist constrained and unconstrained. Clarification of the dynamics of nucleation and spreading of cofilin and the timing of fragmentation of the twist constrained filament bundles is needed.

      As explained in the previous point, cofilin-induced severing occurs significantly faster on twist-constrained single actin filaments compared to unconstrained filaments.

      For twist-unconstrained filament bundles, we never observed bundle fragmentation that originated from only one cofilin cluster. For twist-constrained bundles, while our observation is limited by the rapid fragmentation of the bundles, it is hard to imagine that a single cofilin cluster on one filament would induce the fragmentation of the neighboring filament. Recently, Bibeau et al, PNAS 2023, using magnetic tweezers to twist single actin filaments, showed that, without cofilin, applying up to 1 rotation/µm to an actin filament does not cause its fragmentation. It is thus reasonable to say that cofilin binding is required to fragment twist-constrained filaments.

      Moreover, in our numerical simulations (without inter-filament cooperativity, faithfully reproducing the kinetics of 2-filament bundle fragmentation observed in microfluidics), 75% of bundle fragmentation events resulted from a sequential nucleation of cofilin clusters, with the nucleation of the second cofilin cluster occurring after the first cofilin cluster has already severed one filament of the bundle.
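
      The notion of "sequential" versus overlapping nucleation can be illustrated with a deliberately minimal toy model, sketched below. This is not the simulation used in the study: it ignores position along the bundle, cluster growth and twist constraints, and treats nucleation and severing on each filament as single exponential steps with hypothetical rates `k_nuc` and `k_sev`. It only shows how the sequential fraction emerges from a race between severing of the first cluster and nucleation of the second.

```python
import random


def sequential_fraction(k_nuc: float, k_sev: float,
                        n_trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of trials in which the second cluster nucleates only after
    the first cluster has already severed its filament (toy model)."""
    rng = random.Random(seed)
    sequential = 0
    for _ in range(n_trials):
        # Independent nucleation times on the two filaments (no cooperativity).
        t_nuc_a = rng.expovariate(k_nuc)
        t_nuc_b = rng.expovariate(k_nuc)
        # Each cluster severs its own filament after an exponential delay.
        t_sev_a = t_nuc_a + rng.expovariate(k_sev)
        t_sev_b = t_nuc_b + rng.expovariate(k_sev)
        # Identify the cluster that nucleated first and compare its severing
        # time with the nucleation time of the other cluster.
        if t_nuc_a <= t_nuc_b:
            first_severing, second_nucleation = t_sev_a, t_nuc_b
        else:
            first_severing, second_nucleation = t_sev_b, t_nuc_a
        if second_nucleation > first_severing:
            sequential += 1
    return sequential / n_trials
```

      In this toy model the expected fraction is `k_sev / (k_sev + k_nuc)`, so sequential events dominate whenever severing is fast relative to nucleation. The rates passed in are placeholders and would need to be replaced by measured values for any quantitative comparison.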

      (10) Discussion of how twist constrained fragmentation dynamics might affect the dynamics of larger bundles in structures such as filopodia would be useful.

      We have substantially edited the discussion section of the manuscript, attempting to better discuss the physiological implications of our in vitro observations (bundle size & twist-constraints).

      Minor changes that would improve the paper:

      (11) In Figure 1C, Figure 2B and Figure 2E, the indication, on the graph, of the fold-change between the rates is confusing, as it is not clear from the labeling on the graph that the x15 refers to the slope of the lines. Keeping this information in the legend is appropriate, but if it is to be included on the graph, perhaps the linear fit should also be added to the graph.

      We have edited the figures accordingly, and included fit lines in figure 1.

      (12) Figure 7A, lining up the diagram with the kymographs below would help improve interpretation of the diagram and simulation. Alternatively, if the diagram (upper) in A does not diagram the kymographs below, this needs to be clearly stated, and it would be preferable that the diagram above matches the kymographs below.

      We have edited the figure layout accordingly.

      (13) Despite referencing the Breitsprecher, 2011 paper in the introduction, the authors do not explain how their results showing that cofilin fragments filament bundles slower than single actin filaments correspond with the Breitsprecher findings that fascin bundling favors cofilin filament severing. While the authors do not need to explain the Breitsprecher data, if they reference these findings that run counter to their results, an explanation for the discrepancy would be reasonable to include in the discussion.

      We agree with the reviewer's comment, which was also raised by reviewer #2. We now more directly discuss possible discrepancies between Breitsprecher and our studies: “Previously, using pyrene-actin bulk experiments, Breitsprecher and colleagues reported a diminished cofilin binding to fascin-induced filament bundles (Breitsprecher et al, 2011). In spite of this, their observation of fluorescently labeled actin filament bundles seemed to indicate an efficient severing activity. Since cofilin was not fluorescently labeled, they could not observe cofilin clusters, and they proposed that severing was enhanced because fascin served as anchors along filaments and impeded cofilin-induced changes in filament helicity. This proposed mechanism bears resemblance to our previously reported findings for artificially twist-constrained single actin filaments (Wioland et al, 2019b). Here, we show that this mechanism does not occur in fascin-induced bundles.”

      Reviewer #3 (Significance (Required)):

      The research presented in "Fascin-induced bundling protects actin filament from disassembly by cofilin" is relevant and of interest to the field as it directly addresses a limitation in our understanding of how cofilin-induced severing occurs in F-actin bundles. Bundled F-actin may constitute the majority of linear F-actin within the cell and is specifically important in F-actin-based structures such as filopodia and stress-fibers. The data supports a model for interfilament cooperativity that provides a molecular mechanism for cofilin-mediated severing of fascin-bundled filaments.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Planned Revisions based on comments from Reviewer #1

      • The introductory material and the title of the paper emphasize the ring canal scaling question. This problem is somewhat obscured in the text by the side problem of nuclear scaling, which comes up frequently even though the results are not as thoroughly explored. Could the authors think about moving these data into a different, single figure for the sake of coherence? This is not a required revision. Just a thought.
      • We have moved the nuclear scaling data from Fig. 5 into Fig. S3, and once we have analyzed the data from the planned experiments (over-expressing either HtsRC or the active form of myosin), then we will have a better idea of whether we should move the rest of the nuclear scaling data out of the main part of the paper, consolidate it into a single figure (as Reviewer #1 suggests), or keep some of it in the main figures.

      Planned Revisions based on comments from Reviewer #3

      • I cannot see differences in RC size in the panel A images. More importantly, this method altering ring canal size is limited. A more direct way is overexpression of HtsRC (https://doi.org/10.1534/genetics.120.303629).
      • We have requested and just recently received the line to over-express HtsRC in the germline. We plan to cross this UAS line to the mataTub-GAL4 which expresses GAL4 beginning around stage 3 of oogenesis. Because crossing this UAS line with this GAL4 line produced egg chambers with larger ring canals in the original study (ref. 2), we do not anticipate any technical issues with this experiment. We will incorporate the results from analysis of these egg chambers in the revised manuscript.
      • To further explore the effect of ring canal size on scaling, we will also be testing a condition that we hope will have the opposite effect on ring canal size: expression of a phosphomimetic version of the non-muscle myosin II regulatory light chain, encoded by spaghetti squash (Sqh) (UAS-sqhE20E21). We plan to cross this UAS line to two different GAL4 drivers (nos-GAL4, which expresses GAL4 in a pulse during early oogenesis and then in another pulse in mid-oogenesis, and the mataTub-GAL4, which expresses GAL4 beginning around stage 3 of oogenesis). We know that expression of sqhE20E21 will reduce the size of the ring canals that connect the nurse cells to each other, but it is possible that the posterior ring canals will not show a strong phenotype. In a study that looked at egg chambers homozygous for a mutation in the myosin binding subunit of the myosin phosphatase, DMYPT, which should also increase sqh phosphorylation, it was shown that the posterior ring canals were larger than those connecting nurse cells (ref. 1). Therefore, it is possible that this condition may not allow us to consistently reduce the size of all ring canal types; however, if we do see a significant reduction in posterior ring canal size in these egg chambers, we will include these data in the revised manuscript.

      • In panel 2E, it would be helpful to plot the y-intercepts separately, too.

      • Based on the analysis of the data from the proposed experiments, we will consider plotting the y-intercepts separately for the various conditions.

      1. Description of the revisions that have already been incorporated in the transferred manuscript

      Revisions made based on comments from Reviewer #1

      • One way to think about the dhc-64C experiments presented in Figure 2 is that they are meant to test the hypothesis that ring canal size impacts scaling in such a way that transport across the four ring canals tends towards equilibrium over time. One possibility would therefore be that ring canals aren't programmed to grow to a particular final size but rather they grow at different rates until their diameters are the same. This seems to me an important distinction. It might be made by analysis of the arpC2-RNAi cells, since those ring canals are meant to be initially larger. Unfortunately, I can't see the answer.
      • Reviewer #3 suggested determining the ratio of the diameter of the M1 ring canal to the M4 ring canal. If ring canals grow toward equilibrium (to achieve a similar final size), then we would expect to see this ratio approach 1. When we performed this analysis, we saw that the ratio did decrease as the egg chambers increased in volume, but it never quite reached 1. We have added a supplemental figure (Fig. S1) showing these data and incorporated this idea into the text within the results and discussion sections.
      • Although it would be informative to determine whether ring canals that all started with a similar diameter would grow at the same rate, we have not found a condition that would provide the opportunity to test this hypothesis. We hope that the planned experiments will provide us with a way to test this hypothesis; we will determine the M1/M4 ratio in egg chambers over-expressing either HtsRC or sqhE20E21 and see whether this ratio still decreases as egg chamber volume increases.
      • Once we perform the planned experiments to either increase or decrease ring canal size, we can determine whether we need to further modify Fig. 3 to highlight these size differences between ring canals in the arpC2-RNAi egg chambers or whether we will instead focus more on the results of the planned experiments.

      • The authors write that arpC2-RNAi "ring canals tended to be larger than those in similarly-sized control egg chambers," but that conclusion isn't obvious to me from the data in Figure 3B. The only difference I can see is that the M4 ring canals look to be consistently smaller in the experimental versus control egg chambers, especially at the final timepoint.

      • To further clarify the difference in ring canal size between the control and the arpC2-RNAi egg chambers, we have added additional explanation to the results section to highlight that the y-intercepts of the lines of best fit are significantly higher in the arpC2-RNAi egg chambers at each stage. This demonstrates that, for a given egg chamber volume, the ring canals will be larger in egg chambers depleted of ArpC2 than in the controls.

      • The authors write that "there was a consistent, but not significant decrease in the scaling exponents for the arpC2-RNAi egg chambers compared to controls," but I don't see this in the M1 (identical) or M2 (almost the same) ring canals. The scaling decrease is most pronounced at M4. All the other ring canals seem to reach a final size that's equivalent to controls. What does this tell us about scaling? Is the M4 more sensitive to the effect of arpC2-RNAi? I note and appreciate that the data for M4 show a wide distribution and might have been impacted by outliers, which could be discussed.

      • We have separated the arpC2-RNAi ring canal scaling data by lineage (Fig. S2), and we have color-coded the data in Fig. 3B (as suggested by Reviewer #3).
      • We have expanded the discussion of these results and their implications, and we have added a line in the results section to address this wide distribution of the M4 ring canal sizes.

      • The possibility that ring canal scaling "could generate eggs of different sizes" could use some elaboration (at least) as it does not seem to be especially well supported:

      • Only one of the small egg lines had lower scaling exponents than the big egg lines, and it's a struggle for me to understand the extent of that difference based on the data shown. (Is it significant?).
      • We have restructured this section of the results and modified Fig. 5 to highlight similarities and differences between the four lines. In the results section (and in the figure legend), we have stated that when we compared the slopes of the regression lines for all four lines, there was a significant difference for M1, M2, and M4 (Fig. 5C, D, and F). We have also modified the results section to highlight that although the slopes for line 9.31.4 were not different from those of the two big egg lines, the intercepts were significantly different for M1, M2, M3, and M4 ring canals. We moved the nuclear scaling data to Fig. S3 to simplify the figure.

      • The authors conclude that "the effect of lineage on ring canal scaling is conserved, and it suggests that at least in one line, reducing posterior ring canal scaling could provide a mechanism to produce a smaller mature egg." The first part of this sentence is confusing for me since I don't know what is meant here by "conserved." The second part of the sentence is technically correct but disguises what I would consider the more meaningful and exciting finding. The 9.31.4 line produces the smallest eggs but does not demonstrate scaling differences in comparison to the big egg lines examined (1.40.1 and 3.34.1). The authors have therefore avoided/solved a "chicken and egg" ("fruit fly and egg"?) problem by showing that scaling and egg size can be decoupled!

      • We have modified the first part of the sentence to clarify our point. We appreciate this suggestion and have modified the text in the results section to further elaborate on the results.

      • This point is not made very clearly in the discussion, which concludes with the suggestion that scaling could help explain why some insects produce much larger or much smaller eggs than fruit flies. I can only understand this to be the case if - as the authors point out - scaling "affect the directed transfer of materials into the oocyte." That argument seems predicated on the possibility that these insects make the same amount of initial material then regulate how much is transferred. Seems like a costly way to go about it.

      • We have modified this section of the discussion.

      • I really had to look very closely to distinguish the little blue boxes from the little blue circles in panels 2C and especially 2D. I suggest using a different color instead of a different shape, or maybe splitting the graphs up.

      • We have made the shapes larger in Fig. 2C (nuclear sizes), and we have split the ring canal size data into Fig. 2D, E and made the shapes larger. The legend has been modified to reflect this change.

      • "Depletion of the linker protein, Short stop (Shot), or dynein heavy chain (Dhc64C), significantly reduced the biased transport at the posterior, which reduced oocyte size (Lu et al., 2021)." I suggest this sentence might be clearer if it was rewritten as "Depletion of either dynein heavy chain (Dhc64C) or the linker protein Short stop (Shot) significantly reduces biased transport at the posterior, in turn reducing oocyte size (Lu et al., 2021)."

      • We have made this change.

      • "Because nuclear growth has been shown to be tightly coupled to cell growth (Diegmiller et al., 2021), we can use nuclear size as a proxy for nurse cell size." I think it would help the reader to know that the Diegmiller study was performed using germline cysts in the Drosophila ovary; I paused when I got to this sentence because I initially read it as overly broad. I suggest "Recent work in demonstrates that nuclear growth is tightly coupled to cell growth in this system (Diegmiller et al., 2021), and we can therefore use nuclear size as a proxy for nurse cell size" or similar. This is certainly not a required revision, just a suggestion.

      • We have made this change.

      Revisions made based on comments from Reviewer #3

      • Reviewer #3 asked: Does the ratio of the diameter of M1 to M4 stay the same?
      • We have performed this analysis in the control egg chambers (from Fig. 1), and we found that the ratio does not stay the same, but that it tends to decrease as the egg chamber increases in volume. We plotted the log of egg chamber volume versus this ratio, and the equation for the regression line was y = -0.166x + 2.32, which was significantly different from a slope of 0 (included in Fig. S1).
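      As a rough illustration of how the reported regression line can be read, the sketch below evaluates y = -0.166x + 2.32 and solves for the log volume at which the fitted M1/M4 ratio would equal 1. The function names are hypothetical, the log base and volume units are not stated in the response, and the crossing point is an extrapolation that may lie outside the measured range.

```python
# Regression line reported in the response: ratio = SLOPE * log_volume + INTERCEPT
SLOPE = -0.166
INTERCEPT = 2.32


def predicted_ratio(log_volume):
    """Predicted M1/M4 diameter ratio at a given log egg chamber volume."""
    return SLOPE * log_volume + INTERCEPT


def log_volume_at_ratio(target_ratio):
    """Log volume at which the fitted line reaches the target ratio."""
    return (target_ratio - INTERCEPT) / SLOPE


# Extrapolated log volume at which the fitted ratio would equal 1
x_equal = log_volume_at_ratio(1.0)
print(round(x_equal, 2))
```

      If the measured log volumes stay below this value, the fitted ratio stays above 1 across the data, which would be consistent with the statement that the ratio decreases without reaching 1.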

      • It would be helpful to explain that the log-log plots were used to derive a line equation (y=mx + b) and why that is useful in this context. In the case of a log-log plot, what does the y-intercept mean biologically? Is it simply a way to compare two things or does it indicate real measurements such as volume or ring canal size? Also, the slope of the line is being used as a scaling value. Be careful to define the terms "scaling" and "scaling exponent".

      • We have added additional explanation in the results section.
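      A minimal sketch of why a log-log plot yields a line equation, assuming a power-law relationship size = a * volume^k: taking logs gives log(size) = k * log(volume) + log(a), so the slope is the scaling exponent and the y-intercept is the log of the predicted size at a volume of 1 unit. The data below are generated from a hypothetical power law and are not measurements from the manuscript.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical data generated from diameter = a * volume**k (not measurements)
a_true, k_true = 2.0, 0.25
volumes = [1e3, 1e4, 1e5, 1e6, 1e7]
diameters = [a_true * v ** k_true for v in volumes]

log_v = [math.log10(v) for v in volumes]
log_d = [math.log10(d) for d in diameters]
slope, intercept = fit_line(log_v, log_d)

scaling_exponent = slope      # k in the power law
prefactor = 10 ** intercept   # a: predicted diameter when volume = 1
```

      Under this reading, two lines with similar slopes but different intercepts describe structures that grow at the same relative rate but differ in size at any given volume.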

      • Are four significant digits called for in calculating slope? The figure has 4 significant digits, the text has three.

      • We have modified all figures and text to include only 3 significant digits.

      • Why is isometric scaling 0.66 - is that microns squared over microns cubed? Please explain.

      • We have added additional explanation to the results section.
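      The dimensional argument behind the reviewer's question can be sketched as follows, assuming (as the question suggests) that the 0.66 value refers to an area-like measurement plotted against a volume. If shape is preserved, a quantity with dimension length^d scales with volume (length^3) with exponent d/3. The helper name is hypothetical.

```python
from fractions import Fraction


def isometric_exponent(measure_dim, reference_dim=3):
    """Exponent k in measure ~ reference**k when shape is preserved."""
    return Fraction(measure_dim, reference_dim)


area_vs_volume = isometric_exponent(2)    # um^2 against um^3
length_vs_volume = isometric_exponent(1)  # um against um^3

# Numeric check with a cube: doubling the edge multiplies volume by 8, area by 4
edge_scale = 2.0
volume_scale = edge_scale ** 3
area_scale = edge_scale ** 2
assert abs(volume_scale ** float(area_vs_volume) - area_scale) < 1e-9
```

      Here the area exponent is 2/3 (about 0.67, written as 0.66 in the comment), and a length such as a diameter would have an isometric exponent of 1/3 against volume.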

      • Were all four posterior nuclei measured? The figure indicates just M1 and M4.

      • We apologize that it was not clear that all four posterior nuclei were measured in Fig. 1. For the sake of space, we only showed images of the M1, M4, and Anterior ring canals and nuclei (in Fig. 1A), but all four nuclear measurements were included in the graph in Fig. 1B. We have added M1-M4 to the legend to clarify and revised the text of the legend.

      • It is hard to explain why all four posterior nuclei are bigger than anterior when one of the four is the same age as the anterior nucleus.

      • The posterior nuclei are larger than the anterior nuclei due to their proximity to the oocyte. Multiple recent studies have described this hierarchical nurse cell size relationship in which the nurse cells closest to the oocyte are larger than those separated from the oocyte by additional intercellular bridges 3–5.

      • In panel D, a conclusion is, "Further, the scaling exponent [slope] for the anterior ring canals, which are also formed during the fourth mitotic division, was not significantly different from that of the posterior M4 ring canals". Anterior is 0.23, M4 is 0.25. These seem different to me. How is significance determined? Were any of the scaling exponents in M1, M2, M3, M4 or Anterior significantly different?

      • Significance was determined within the Prism software using a method equivalent to an ANCOVA. If the slopes are compared, M1 is significantly different from M2, M3, and M4, and M2 is significantly different from M4. M4 is not significantly different from the slope for the anterior ring canals, which supports the correlation between scaling and lineage.
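      One common way to compare two regression slopes, which for two groups is equivalent to testing the interaction term in an ANCOVA, is a t test on the slope difference using a pooled residual variance. The sketch below illustrates that approach on hypothetical log-log data; whether it matches the exact procedure in Prism is an assumption, and the function names are illustrative.

```python
import math


def regression_parts(xs, ys):
    """Return (slope, Sxx, residual sum of squares, n) for one group."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, sxx, ss_res, n


def compare_slopes(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: the two slopes are equal."""
    b1, sxx1, ss1, n1 = regression_parts(x1, y1)
    b2, sxx2, ss2, n2 = regression_parts(x2, y2)
    df = n1 + n2 - 4  # two slopes and two intercepts are estimated
    pooled_var = (ss1 + ss2) / df
    se_diff = math.sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return (b1 - b2) / se_diff, df


# Hypothetical log-log data for two ring canal types (not measurements)
log_volume = [1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.01, -0.01, 0.0, -0.01, 0.01]
group_a = [0.25 * x + 0.5 + e for x, e in zip(log_volume, noise)]
group_b = [0.35 * x + 0.5 + e for x, e in zip(log_volume, noise)]
t_stat, df = compare_slopes(log_volume, group_a, log_volume, group_b)
```

      The p-value would then be read from a t distribution with `df` degrees of freedom. Slopes that look different by eye (such as 0.23 and 0.25) can still fail this test when the residual scatter is large relative to the difference.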

      • References are needed for the statements about biased transport to the oocyte.

      • There was a reference to the Lu (2021) paper in that paragraph, but we have added an additional reference to that paper to this part of the results section.

      • In panel 2C, why are the scaling exponents (slopes) of the controls bigger than in Figure 1B? The controls look hyper allometric in Fig. 2.

      • This experiment was done with a different GAL4 driver, so it is possible that there are some differences in scaling based on genetic background.

      • In panel 2D it is impossible to pick out the control posterior vs anterior lines - use different colors as in Figure 1. Why do the control lines for posterior and anterior merge?

      • We have split the ring canal scaling data from Fig. 2D into separate panels (Fig. 2D,E), as suggested by Reviewer #1.
      • These lines likely approach each other because the slope of the line for the anterior ring canals (M4 type) is always larger than the slope for the combined posterior ring canals.

      • Re: Fig. 3: Scaling of what? RC size?

      • We assume that this comment is related to the heading for this section of the results, so we have added “ring canal” to the end of this title, so that it now reads: “Increasing initial ring canal size does not dramatically alter ring canal scaling”

      • Since there was no effect, "dramatically" should be deleted from the section title.

      • This change has been made.

      • Clarify this sentence: If ring canal size inversely correlates with scaling, then increasing initial ring canal diameter should reduce the scaling exponent.

      • We have made this change in the text.

      • How does panel B show that RCs are larger in arpC2 KD? Fig. S1A has smaller y-intercept for control. Again, it is impossible to see which lines go with which M and which genotype.

      • As mentioned above, we have modified Fig. 3 to highlight these differences and added additional explanation to the results section.

      • Panels 4D & 4G are clear - should include significance indications.

      • We have added asterisks to indicate significant differences.

      • The conclusion from panels 5B and 5C that reducing RC scaling could lead to smaller mature eggs is a stretch. Without looking at the rest of the lines these data are preliminary and detract from the rest of the paper.

      • As suggested by Reviewer #1, we have modified the results and discussion sections, and we have added a statement about the need for analysis of additional lines.

      2. Description of analyses that authors prefer not to carry out

      Comment from Reviewer #2

      • I am surprised that the author has not considered controlling the impact of cell cycle regulation on this scaling process, especially as the work of Doherty et al. has shown that this type of regulation is essential for regulating the size of nurse cell nuclei. The authors should test the impact of at least dacapo and cyclin E in this process.
      • We have attempted to deplete Dacapo from the germline by crossing two different RNAi lines to multiple germline drivers; however, we have been unable to see a consistent effect on nurse cell nuclear size, which suggests that these RNAi lines may not effectively reduce Dacapo protein in the germline. Although we agree with the reviewer that this is an obvious mechanism that should be explored, we believe that it is not necessary for it to be included in this manuscript, because altering Dacapo levels in the germline would not provide a mechanism to explain our model that ring canal lineage impacts ring canal scaling. Dacapo has been shown to contribute to the hierarchical pattern of nurse cell size observed in the germline. Dacapo mRNA produced in the nurse cells is transported into the oocyte, where it is translated. Then, the Dacapo protein diffuses back into the nurse cells, producing a posterior to anterior gradient 4. Doherty (2021) showed that reducing the levels of the Dacapo protein using the deGradFP system eliminated the nurse cell size hierarchy. If our data had supported a model in which proximity to the oocyte was a strong predictor of ring canal size and scaling (as shown for the nurse cells and their nuclei 3,5), then this would have been an excellent way to dig further into the mechanism. Instead, our data supported a role for ring canal lineage in predicting ring canal growth, since the M4 ring canals at the posterior and anterior showed similar scaling with egg chamber volume.
      • We believe that performing the proposed experiments (over-expressing HtsRC to increase ring canal size or expressing the phosphomimetic form of the myosin regulatory light chain, sqhE20E21, to reduce ring canal size) will allow us to determine how ring canal size affects scaling, which will provide additional mechanistic insight into this scaling behavior.


      Comment from Reviewer #3

      • Panel 3E is interesting and would fit better in Figure 1.
      • This panel is from a different genetic background than the data in Fig. 1. Therefore, we do not think it would be appropriate to move it to Fig. 1.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Major comments: 1. Mirc56_2 and 4 showed lower integration rates, and the authors suggest that this could be due to sgRNA pool imbalance. The authors should validate this by performing sequencing of the input sgRNA and cassettes.

      →Thank you very much for your comment, and we agree with your suggestion.

      We are going to check for sgRNA pool imbalance in the donor vector library by short-read amplicon NGS.

      In addition, to test the other possibility that we raised, we re-sequenced the sgRNA donor vectors for Mirc56_2 and 4, and we will add the following sentences:

      “We first suspected that their low integration frequencies were caused by mutations in the PB transposon of the sgRNA donor vector, especially in the ITR or ID, which are important for integration efficiency [PMID: 15663772]. Therefore, we sequenced the PB transposons for Mirc56_2 and 4 again. However, we could not find any mutations in their PB transposons.” following “…efficiency or cell growth.” in the Discussion (page 14, line 346)

      2. Clonal analysis in Figure 5c is unclear. a. Figure 5c indicates that all changes were homozygous (e.g. both alleles were deleted). Was this the case in all clones? Or were some mutations heterozygous?

      →Thank you very much for your comment.

      We apologize for the misleading context.

      We targeted a single allele on the X chromosome in male mES cells, so all mutations should be hemizygous, as mentioned in the Results (page 11, lines 259-260).

      To emphasize that our study is a monoallelic assessment, we will add the following sentence:

      “This study targeted a single allele on the X chromosome in male mES cells, so all genotypes at Mirc56 should be hemizygous and the induced mutations might be cis-mutations.” following “…owing to six tandem repeats [37]” in the Results (page 12, line 302)

      b. Many clones in Figure 5c show that the entire region was deleted (all black dots). Could this be due to some experimental error or misinterpretation of the sequencing data, or could it be validated using some orthogonal method? This is especially surprising for clones in which the final guide (Mirc56_13) was not detected yet the final site (Mirc56_13) was reported as "Regional deletion".

      →Thank you very much for your comment.

      We apologize for the misleading context.

      First, we only examined and sequenced the mature-miRNA genomic regions, amplifying approximately 200 bp around the target sites. We therefore defined unamplified regions as “miRNA deletion”. In addition, to make Figure 5C easier to understand, we added “predicted regional deletion” and the name of each clone, as attached.

      In fact, only 4 clones harbored deletions of every analysed Mirc56_X genomic region (Mirc56_1 to 13). Moreover, the sgRNA cassettes and Sry on the Y chromosome could be PCR-amplified from these clones, which suggests that we successfully obtained their genomic DNA and that at least the mature-miRNA genomic regions were deleted.

      Moreover, Mirc56_13 deletions in clones without a target site on Mirc56_13 always fall within predicted regional deletions spanning upstream and downstream sgRNA target sites. Therefore, we infer that these deletions were induced by cuts at target sites on Mirc56_14, 15, 16, or 17 and at a site upstream of Mirc56_13.

      To clarify them, we will add the following sentences:

      • “Four clones (#2_066, #1_021, #1_029 and #1_046) harboured deletions of every analysed Mirc56_X genomic region. In addition to these clones, only 3 pairs (#2_019 and #2_084, #2_038 and #1_023, #1_016 and #1_027) harboured the same combination of mutations.” following “…combinations of mutations (Figure 5C).” in the Results (page 16, lines 378-380)
      • “Meanwhile, focusing on the relationship between mutations and the sites targeted by the sgRNA cassettes in each clone, all Mirc56_X genomic regions harbouring Indel mutations were target Mirc56_X sites. In addition, if sequential Mirc56_Xs on the genome were deleted, the most upstream and downstream deleted Mirc56_Xs were always target Mirc56_X sites, except in #2_025 and #1_41.” following “…combinations of mutations (Figure 5C).” in the Results (page 12, line 304)
      • “Genotyping PCR amplified approximately 200 bp around the mature-miRNA genomic region. An unamplified region is defined as miRNA deletion (black circle), and an amplified region was determined to be an Indel mutation (gray circle) or Intact by short-read NGS. If sequential Mirc56_Xs on the genome were deleted, a translucent black square indicates the predicted regional deletion, assuming that the genomic region flanked by miRNA deletions was also deleted. Besides, if miRNA deletion was induced in Mirc56_13 and the clone has target Mirc56_X on Mirc56_14, 15, 16, or 17” following “…in each PB mES clone.” in the Figure legend (page 23, line 575)
      Moreover, because we have now defined “miRNA deletion”, we will change “regional deletion” to “miRNA deletion” where we mean “deletion of the mature-miRNA genomic regions” in the Results (page 13, line 312) and the Discussion (page 14, line 363).
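      The genotype-calling rule described in the proposed legend can be sketched as below. This is an illustrative reading, not the authors' pipeline: the function names are hypothetical, the input is a hypothetical clone, and treating "sequential" as a run of at least two consecutive deleted sites is an assumption.

```python
def call_site(amplified, has_indel):
    """Classify one mature-miRNA site from PCR and short-read NGS results."""
    if not amplified:
        return "miRNA deletion"
    return "Indel" if has_indel else "Intact"


def predicted_regional_deletions(calls):
    """Return (start, end) index pairs for runs of >= 2 consecutive deleted sites."""
    runs, start = [], None
    for i, call in enumerate(calls + [None]):  # sentinel closes a trailing run
        if call == "miRNA deletion":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= 2:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical clone: (amplified, has_indel) for sites ordered along the genome
observations = [(True, False), (False, False), (False, False),
                (False, False), (True, True), (False, False)]
calls = [call_site(a, i) for a, i in observations]
regions = predicted_regional_deletions(calls)
```

      In this example the three adjacent unamplified sites form one predicted regional deletion, while the isolated unamplified site at the end is recorded only as a miRNA deletion.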

      3. Next-generation targeted sequencing of clones should be made publicly accessible.

      →Thank you very much for your comment. We apologize for the inconvenience.

      We have already informed Review Commons that the data have been made publicly available.

      We have already given the BioProject ID, PRJNA996747, in the Data Availability section (page 16, lines 383-384).

      4. OPTIONAL - Cassette integration number is understudied. One important aspect of tiling mutagenesis is the control over how many guides are present in each cell. The authors report an average of 4.7 cassettes/cell. This could be modulated by the amount of donor vector added, and indeed the authors performed titration experiments, but only with a fluorescent reporter readout. It would be very useful to know how the concentration of donor vector corresponds to the number of cassettes/cell - perhaps genotyping of clones from one or two additional experiments would be sufficient.

      → Thank you very much for your suggestion.

      We agree that cassette integration number is one important aspect of tiling mutagenesis.

      To investigate how many copies are integrated at the donor vector concentration we used, we are going to measure the actual copy numbers in several clones by qPCR.

      We think that determining “how the concentration of donor vector corresponds to the number of cassettes/cell” is a separate study. The correlation might not be linear because of transposase overproduction inhibition (OPI), so confirming it would require a very large number of experiments. Our study addresses how CTRL-Mutations induce diverse mutations, not the properties of the PB system.
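      The response does not say how the qPCR data would be analysed. One widely used option is the 2^-ΔΔCt method, sketched below under the assumptions of roughly 100% amplification efficiency and a calibrator sample with a known cassette copy number. All names and Ct values are hypothetical.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_cal, ct_reference_cal,
                         calibrator_copies=1.0):
    """Estimate copies per genome by the 2^-ddCt method (assumes ~100% efficiency)."""
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_cal - ct_reference_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)


# Hypothetical Ct values: the clone's cassette signal appears 2 cycles earlier
# than in a single-copy calibrator, relative to the same reference locus
copies = relative_copy_number(ct_target=22.0, ct_reference=24.0,
                              ct_target_cal=24.0, ct_reference_cal=24.0)
```

      With these hypothetical values the estimate is 4 copies, since each cycle of earlier amplification corresponds to a twofold higher template amount.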

      Minor comments: 1. The background fails to acknowledge the work of CRISPR-Cas tiling screens (e.g. https://doi.org/10.1038/nbt.3450) or CRISPR-Cas in creating mutagenesis in cell lines (e.g. https://doi.org/10.1007/978-1-0716-0247-8_29)

      → Thank you very much for your suggestion, and we agree with your suggestion.

      We will add text acknowledging previous studies of CRISPR-Cas tiling screens.

      • “Recently, targeted mutagenesis approaches that combine forward and reverse genetics, such as saturating mutagenesis and tiling mutagenesis, have been developed to induce random mutations within target gene(s) [PMID: 25141179, 31586052, 27260157, 28118392]. Such targeted mutagenesis can construct a mutant library harbouring subtly different mutations within the target gene(s), so that comparative analysis across the mutant library can identify mutation(s) critical for biological processes. These random mutagenesis approaches have also revealed the function of numerous coding genes” following “…list of coding genes [6–8].” in the Introduction (page 3, lines 55-56)
      • “In addition, saturating mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations using a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. On the other hand, tiling mutagenesis can in principle expand the target length because the length depends on the multiplex guide RNAs (gRNAs) designed to target the genomic region. Therefore, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library containing multiplex gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page 4, line 82)
      However, we do not agree that we need to acknowledge previous reports of KI or KO by single or double cuts in cell lines (as you suggested, https://doi.org/10.1007/978-1-0716-0247-8_29) because this is common knowledge. Therefore, we will not add this paper.

      2. Figure 1 left 'ROI random mutant PB mES cell' should be horizontally aligned so Mir_1, Mir_2 and MirX align with the upper figure.

      → Thank you very much for your kind comment, and we agree with your suggestion.

      Therefore, we changed it in the Figure 1.

      3. It is interesting and unexpected that some guides never induce indels, even in the absence of a regional deletion (e.g. Mirc56_3, Mirc56_7). Why might this be? Was there perhaps an error in the assignment of these guides to these cells?

      → Thank you very much for your comment.

      As you mentioned, Mirc56_3, 4 and 7 had no indels. We appreciate the opportunity to correct our mistake. We have corrected Figure 5D, as attached. In addition, we will correct the average number of Mirc56_X sites from 22.7 to 22.6.

      These sgRNAs also induced miRNA deletions at low frequency (Mirc56_3: 38.9%, Mirc56_4: 25.0% and Mirc56_7: 68.0%, Figure 5D). Moreover, every deleted Mirc56_3, 4 and 7 was within a predicted regional deletion, except for target Mirc56_3 of PB mES clone #2_080 (revised Figure 5C).

      Therefore, to explain why some guides never induce indels even in the absence of a regional deletion, we propose the following: “In addition to the low frequencies, Indel mutations might be lost through regional deletion even if these sgRNAs could induce them”.

      To clarify them, we will add the following sentences:

      • “In particular, middle target sites such as Mirc56_3, 4 and 7 showed only miRNA deletion or Intact (Figure 5D)” following “…in our mutant library (Figure 5C, D).” in the Discussion (page 14, line 364)
      • “In fact, every deleted Mirc56_3, 4 and 7 was within a predicted regional deletion except for target Mirc56_3 of PB mES clone #2_080 (Figure 5C). In addition, these sgRNAs induced mutations at low frequency (Mirc56_3: 38.9%, Mirc56_4: 25.0% and Mirc56_7: 68.0%, Figure S6). Therefore, we suspect that regional deletion and the low mutation rate together contributed to the loss of Indel mutations.” following “…induced at target sites.” in the Discussion (page 14, line 366)

      4. Regarding Mirc56_2 and 4 integration, on line 34 the authors suggest that "We suspect this was caused by a technical error, such as an unequal amount of sgRNA donor vector or the sequence in sgRNA cassettes affecting integration efficiency or cell growth." sgRNA library imbalance would be a technical error, but integration affecting cell growth is not a technical error. This sentence should be reworded.

      → Thank you very much for your comment.

      We apologize for the misleading sentence, even though this paper had already been reviewed by an English-language editor.

      We will reword it as “We suspect this was caused by the sequence in sgRNA cassettes affecting integration efficiency or cell growth, or a technical error such as an unequal amount of sgRNA donor vector.” following “…PB mES clones via FACS.” in the Discussion (page 14, line 344)

      5. Line 540 "ration" is the incorrect word - perhaps "ratio"?

      → Thank you very much for your kind comment, and we are sorry for the typo.

      We will correct it in the Figure legend (page22, line 540).

      6. Plot 5b should be shown as a histogram rather than a swarm plot to show how many clones were in each category.

      → Thank you very much for your suggestion.

      In Figure 5B, we aimed to show the number of sgRNA cassette varieties in each clone, not the distribution of the number of integrated sgRNA cassettes. The distribution of integrated sgRNA cassettes in the clone library matched the frequency of target sites in Figure 5D.

      We already described the distribution data as “In addition, an average of 22.7 Mirc56_X sites … the same frequency except for the Mirc56_2- and 4-targeting cassettes.” in the Result (page13, line 312-315)

      Reviewer #1 (Significance (Required)):

      1. General assessment: The authors are successful in creating clonal cell lines bearing a variety of mutations. Unfortunately, the cell lines also have transposase-mediated insertion events of the sgRNA cassettes at unknown positions in the genome, which will hamper the interpretability of any experiment using these cell lines. The authors fail to justify the use of the transposase and integration of the sgRNA, especially compared to lentiviral transfection or RNPs which would produce edits at the region of interest. Alternately, integrated sgRNA cassettes could have been excised with Flp recombinase as in https://doi.org/10.1007/978-1-0716-0247-8_29.

      → Thank you very much for your suggestion.

      We agree that we did not explain why we chose the PiggyBac system over lentiviral delivery.

      Therefore, we will add the following sentences:

      • “In general, random single guide RNA (sgRNA) expression cassettes are integrated into chromosomes of host cells by a retrotransposon system. Identifying combinations of critical regions embedded in target regions requires diverse combinations of mutations or inactivated sites. Inducing multiple mutations or inactivated sites requires integration of multiple sgRNA cassettes. However, integration of multiple sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis [PMID: 23435812].” in the Introduction.
      • “Here, we propose that a DNA transposon system, rather than a retrotransposon system, is more suitable for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons, which are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons, which are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposon systems such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, while other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      However, we are not going to mention a comparison to RNPs, because random sgRNA expression is key for random mutagenesis and designing random sgRNA treatments with RNPs is difficult: the target region might be cleaved by almost all of the sgRNAs incorporated into cells. On the other hand, it is easier to control the variety of sgRNAs expressed using a delivery system based on integration into the chromosome, because only integrated sgRNAs are expressed.

      In addition, we do not agree that “integrated sgRNA cassettes could have been excised with Flp recombinase as in https://doi.org/10.1007/978-1-0716-0247-8_29.”

      That paper reports the concept that an EM7>neoR expression cassette flanked by Frt sites within a KI allele can be used to select the intended KI clone, after which the cassette can be removed by Flp recombinase. However, this approach is not suitable for our method, because recombination between multiple Frt cassettes integrated into nearby genomic regions would cause structural mutations. Therefore, we will not mention it.

      *2. Additionally, the genotyping analysis is unclear, and seems to indicate that each clone bears homozygous mutations, with several clones showing deletions of the entire region. *

      → Thank you very much for your suggestion.

      We will revise this as described in our responses to Reviewer #1 Major comments 2a and 2b.

      3. Advance: The authors are motivated to create clones using tiling mutagenesis. Tiling mutagenesis has already been performed without transposases (e.g. https://doi.org/10.1038/nbt.3450, https://doi.org/10.1371/journal.pone.0170445, https://doi.org/10.1038/s41467-019-12489-8*) in the context of a screen, and clones have already been created using CRISPR/Cas9 mutagenesis so the advance presented in this manuscript over previous published work is unclear. *

      →Thank you very much for your suggestion, and we agree with your suggestion.

      We will add text acknowledging previous studies of CRISPR-Cas tiling screens.

      We will add the following sentences:

      • “Recently, targeted mutagenesis approaches that combine forward and reverse genetics, such as saturating mutagenesis and tiling mutagenesis, have been developed to induce random mutations within target gene(s) [PMID: 25141179, 31586052, 27260157, 28118392]. Such targeted mutagenesis can construct a mutant library harbouring subtly different mutations within the target gene(s), so that comparative analysis across the mutant library can screen out mutation(s) critical for biological processes. These random mutagenesis approaches have also revealed the function of numerous coding genes” following “…list of coding genes [6–8].” in the Introduction (page3, line 55-56)
      • “In addition, saturating mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations from a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. On the other hand, tiling mutagenesis can in principle expand the target length, because the length depends on the multiplexed guide RNAs (gRNAs) designed to target the genomic region. Therefore, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor, such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library of multiplexed gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page4, line 82)

      The paper you raised (https://doi.org/10.1038/nbt.3450) applied CRISPRko tiling mutagenesis to identify critical regions embedded in a 2 kb p53-binding enhancer region, using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruption. In addition, we expand the length of the target region to more than 50 kb. This is one of the advances.

      The paper you raised (https://doi.org/10.1371/journal.pone.0170445) applied CRISPRko tiling mutagenesis to identify critical mutations in the MAP2K1 and BRAF protein-coding sequences, using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruption.

      The paper you raised (https://doi.org/10.1038/s41467-019-12489-8) applied CRISPRko tiling mutagenesis to identify critical domains in protein-coding sequences, using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruption.

      To clarify the advantages, we will add the following sentences:

      • “Here, we propose that a DNA transposon system, rather than a retrotransposon system, is more suitable for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposons such as the PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, whereas other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “It has been reported that CRISPRko tiling mutagenesis has so far been conducted for target genomic regions of less than 15 kb [PMID: 26375006, 30612741], while CRISPRi tiling mutagenesis can target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded” in the Introduction.

      In addition, we are going to conduct transposon removal by excision-only PBase treatment with several PB mES clones, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      *4. Audience: The manuscript is written for the basic research audience, and the method could be applied to the study of regions of interest in many diseases. However, the unexcised use of transposases make the method less desirable than other methods. *

      → Thank you very much for your suggestion.

      We do not agree that PiggyBac makes the method less desirable than other methods.

      As mentioned in our response to Reviewer #1 Significance 3, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis. However, sgRNA cassettes introduced by lentiviral delivery are never removed from the genome. In addition, other approaches, such as the Flp recombinase that Reviewer #1 proposed in Significance 1, are not better than PiggyBac, because Flp recombinase causes structural mutations by recombination between multiple Frt cassettes integrated into nearby genomic regions.

      To clarify these points, we will add the following sentences:

      • “However, multiple integrations of single guide RNA (sgRNA) cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis.” in the Abstract.
      • “However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis [PMID: 23435812]. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, the best way is to remove the sgRNA cassettes from the chromosome.” in the Introduction.
      • “Here, we propose that a DNA transposon system, rather than a retrotransposon system, is more suitable for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposons such as the PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, whereas other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.

      In addition, we are going to conduct transposon removal by excision-only PBase treatment with several PB mES clones, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Major concerns:

      1) Concern about the Novelty of Functional Analysis Platforms: The authors claim that there are no established platforms for the study of cis-elements or microRNA clusters. This assertion seems inaccurate, as previous studies have utilized Cas9 tiling screens to investigate cis-regulatory elements (CREs) and large-scale screens to probe microRNA functions, as exemplified by the works of Canver et al. in Nature 2015, Gasperini et al. in Cell 2019, and others. *

      → Thank you very much for your suggestion, and we agree with your suggestion.

      We apologize for our inaccurate claim and will delete the following sentences:

      • “In contrast, no functional analysis platforms have been established for the study of cis-elements or microRNA cluster regions consisting of multiple microRNAs with functional overlap” in the Abstract (page2, line 28-30)
      • “While loss-of-function analysis has been conducted for numerous coding genes, very limited progress has been made on non-coding genes and cis-elements.” in the Introduction (page3, line 47-49)

      The paper you raised (https://doi.org/10.1038/nature15521; Canver et al. in Nature 2015) applied CRISPRko tiling mutagenesis to identify critical regions embedded in a 12 kb BCL11A enhancer region, using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruption. In addition, we expand the length of the target region to more than 50 kb. This is one of the advances.

      The paper you raised (https://doi.org/10.1016/j.cell.2018.11.029; Gasperini et al. in Cell 2019) applied CRISPRko tiling mutagenesis to identify critical regions embedded in enhancer candidates of at most 12 kb, in addition to CRISPRi tiling candidate screening with one sgRNA per candidate enhancer, using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruption. In addition, we expand the length of the target region to more than 50 kb. This is one of the advances. Additionally, identifying combinations of critical regions embedded in target regions requires diverse combinations of mutations or inactivation sites, and inducing multiple mutations or inactivated sites requires the integration of multiple sgRNA cassettes. However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, the best way is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions without artifacts, which is possible because removal of the sgRNA cassettes leaves no footprint, CRISPRko tiling mutagenesis is a better method than CRISPRi, because CRISPRi requires integrated cassettes that stably express the sgRNA and the epigenetic modifier fused to dCas9. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruption for identifying combinations of critical regions embedded in target regions.

      Therefore, to acknowledge previous studies and clarify the advantages, we will add the following sentences:

      • “In addition, saturating mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations from a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. On the other hand, tiling mutagenesis can in principle expand the target length, because the length depends on the multiplexed guide RNAs (gRNAs) designed to target the genomic region. Therefore, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor, such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library of multiplexed gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page4, line 82)
      • “To identify combinations of critical regions embedded in target regions, diverse combinations of mutations or inactivation sites are required. Inducing multiple mutations or inactivated sites requires the integration of multiple sgRNA cassettes. However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis [PMID: 23435812]. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, the best way is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions embedded in target regions without artifacts, which is possible because removal of the sgRNA cassettes leaves no footprint, CRISPRko tiling mutagenesis is a better method than CRISPRi, because CRISPRi requires integrated cassettes that stably express the sgRNA and the epigenetic modifier fused to dCas9.” in the Introduction.
      • “Here, we propose that a DNA transposon system, rather than a retrotransposon system, is more suitable for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposons such as the PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, whereas other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “It has been reported that CRISPRko tiling mutagenesis has so far been conducted for target genomic regions of less than 15 kb [PMID: 26375006, 30612741], while CRISPRi tiling mutagenesis can target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded” in the Introduction.

      On the other hand, we could not find previous studies employing Cas9 tiling mutagenesis to investigate miRNA functions. The application to a miRNA cluster is also one of the advances.

      2) Advantages of PiggyBack System Over Lentiviral Integration: The paper does not clearly articulate the advantages of their proposed PiggyBack-based system for sgRNA integration over traditional lentiviral integration. Both methods facilitate the random integration of multiple gRNAs, but the paper lacks a comparative analysis or justification for choosing the PiggyBack system.

      → Thank you very much for your suggestion, and we agree with your suggestion.

      We agree that we did not explain why we chose the PiggyBac system over lentiviral delivery.

      Therefore, we will add the following sentences:

      • “In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system. To identify combinations of critical regions embedded in target regions, diverse combinations of mutations or inactivation sites are required. Inducing multiple mutations or inactivated sites requires the integration of multiple sgRNA cassettes. However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis [PMID: 23435812].” in the Introduction.
      • “Here, we propose that a DNA transposon system, rather than a retrotransposon system, is more suitable for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposons such as the PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, whereas other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.

        *3) Lack of Comparative Analysis with Alternative Methods: The authors did not provide a comparison of CTRL-Mutagenesis with other existing screening methods. Such a comparison is crucial for understanding the effectiveness and efficiency of the new method in relation to established techniques. *

      → Thank you very much for your suggestion.

      We agree that such a comparison would be an important experiment.

      However, our main claim is the validation of tiling mutagenesis using PiggyBac, which is the only integration system that leaves no footprint. Therefore, we present our novelty without the comparison and do not argue that CTRL-Mutagenesis has higher or lower efficiency than existing methods.

      *4) Limitations in Library Resolution: The paper acknowledges the limited resolution of their proposed library. The authors might have explored the use of base editors for enhanced resolution in such screens, as base editing could potentially offer more precise and controlled mutagenesis as briefly mentioned in the discussion. *

      → Thank you very much for your suggestion.

      We agree with your suggestion.

      Base editing occurs only within the editing window. In addition, a major limitation of prime editing is its low efficiency (https://doi.org/10.1016/j.tibtech.2023.03.004). Therefore, designing sgRNAs for base editors, or pegRNAs, and assessing their editing efficiency would require a large number of experiments.

      Our study is a proof of concept that validates the PiggyBac system for CRISPRko tiling mutagenesis and expands the length of the target regions. Thus, we only discussed the limited resolution of our mutant library and proposed the use of base editors for enhanced resolution in the Discussion (page14, line 366-370).

      5) Absence of Functional Data Post-Mutagenesis: A significant limitation of the study is the absence of functional data following the creation of cells with different mutations. While the authors speculate about using differentiation systems or organoids for practical applications, they do not provide empirical data to demonstrate the utility of the CTRL-Mutagenesis approach. This lack of functional validation raises questions about the practical applicability of the method.

      → Thank you very much for your suggestion.

      We agree with your suggestion.

      We will leave functional analysis for future research.

      In this paper, we only validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of the target regions.

      To tone down our claims about the usability of our method for functional analysis, we will make the following changes:

      • Change “to identify functionally important elements in non-coding regions” to “to induce diverse combination and variety of mutations within more than 50 kb non-coding region” in the Title.
      • Add “However, few loss-of-function screens of non-coding regulatory elements have been conducted, because their annotations are ambiguous compared with those of protein-coding genes. Tiling mutagenesis has been employed to identify critical regions embedded in non-coding regulatory elements by comparative analysis across a mutant library harbouring subtly different mutated regions within a region of less than 15 kb. Conventional tiling mutagenesis constructs a mutant library in which multiple single guide RNA (sgRNA) cassettes are integrated by retroviral delivery. However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruption and may impair functional analysis. Herein, by combining tiling mutagenesis with the PiggyBac transposon, which can be removed with no footprint at integration sites, we established an expanded tiling mutagenesis method named CRISPR- & Transposase-based RegionaL Mutagenesis (CTRL-Mutagenesis). We demonstrated that the PiggyBac system can integrate diverse combinations and varieties of sgRNA cassettes, and that CTRL-Mutagenesis randomly induces diverse combination and variety of mutations within a non-coding region of more than 50 kb in murine embryonic stem cells. CTRL-Mutagenesis could be applied to wider non-coding regulatory elements with no risk of non-targeted endogenous gene disruption.” in the Abstract.
      • Delete “Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.” in the Abstract (page2, line 38-40).
      • Change “The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.” to “The generated random mutant mES clone library could be developed to investigate critical regions of non-coding regulatory elements within the genome.” in the Introduction (page4, line 88-90)

        *Reviewer #2 (Significance (Required)):

        1. In summary, while the idea to integrate sgRNA in the genome by the PiggyBack system is interesting the claim of novelty is questionable due to existing methods in the field. The advantages of their system over existing technologies are not clearly articulated, and a lack of comparative analysis with other methods leaves the efficiency of CTRL-Mutagenesis uncertain. *

      → Thank you very much for your suggestion.

      Previous studies of CRISPRko and CRISPRi tiling mutagenesis employed lentiviral delivery of sgRNA cassettes into the genome. However, integrations of multiple sgRNA cassettes carry a higher risk of disrupting non-targeted endogenous functions. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, the best way is to remove the sgRNA cassettes from the chromosome. Nevertheless, lentiviral transposons, a type of retrotransposon, cannot be removed from the chromosome. On the other hand, only the PiggyBac transposon can be removed with no footprint. Therefore, we aimed to validate the PiggyBac system for tiling mutagenesis. Moreover, there is no report of CRISPRko tiling mutagenesis being applied to a genomic region of more than 15 kb. Therefore, we aimed to expand the length of the target region.

      Therefore, we will revise our claim to state that our method can expand CRISPRko tiling mutagenesis to more than 50 kb with no risk of non-targeted endogenous gene disruption.

      We will add the following text on the novelty and advantages of our method:

      • “Here, we propose that a DNA transposon system, rather than a retrotransposon system, is more suitable for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposons such as the PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, whereas other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “It has been reported that CRISPRko tiling mutagenesis has so far been conducted for target genomic regions of less than 15 kb [PMID: 26375006, 30612741], while CRISPRi tiling mutagenesis can target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded” in the Introduction.

      However, our main claim is the validation of tiling mutagenesis using PiggyBac, which is the only integration system that leaves no footprint. Therefore, we will present our novelty without the comparison and will not argue that CTRL-Mutagenesis has higher or lower efficiency than existing methods.

      In addition, we are going to conduct transposon removal by excision-only PBase treatment with several PB mES clones, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      2. Moreover, the limited resolution of their library and the absence of functional data post-mutagenesis are significant drawbacks that need to be addressed in future research to ascertain the method's practical utility.

      → Thank you very much for your suggestion.

      We agree with your suggestion.

      We will leave functional analysis for future research.

      Base editing occurs only within the editing window. In addition, a major limitation of prime editing is its low efficiency (https://doi.org/10.1016/j.tibtech.2023.03.004). Therefore, designing sgRNAs for base editors, or pegRNAs, and assessing their editing efficiency would require a large number of experiments.

      Our study is a proof of concept that validates the PiggyBac system for CRISPRko tiling mutagenesis and expands the length of the target regions. Thus, we only discussed the limited resolution of our mutant library and proposed the use of base editors for enhanced resolution in the Discussion (page14, line 366-370).

      Therefore, we claim only that we validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of the target regions.

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Major comments: 1. Authors claim that "CTRL-mutagenesis randomly induces diverse mutations only within the targeted regions in murine embryonic stem (mES) cells.", however, the outcome of mutations is not entirely random since most of the mutations are regional deletions. For example, despite the random distribution of gRNAs per cell, the inner regions like Mirc56_5 or Mirc56_8 are mutated with >80% efficiency.*

      → Thank you very much for your comment.

      We agree that the middle regions tend to be deleted and that the type of mutation induced is not entirely random. However, we do not agree that “the outcome of mutations is not entirely random since most of the mutations are regional deletions.” Focusing on the combinations of mutations, as mentioned in the Result (page12, line 302-304), CTRL-Mutagenesis can induce diverse mutation combinations with a moderate degree of randomness. In fact, 79.2% of the clones harbouring multiple mutations carried different combinations of mutations. In addition, to confirm how mutations occurred within Mirc56 by CTRL-Mutagenesis, we constructed only 87 mutant clones through single-cell cloning. Therefore, the outcome is not completely understood, because there are fewer clones than in a conventional CRISPRko tiling mutant library. Of course, we should improve the randomness of mutation combinations, but we have already discussed this and proposed solutions in the Discussion (page14, line 366-370).

      Admittedly, it would be difficult to identify necessary and sufficient genomic regions with CTRL-Mutagenesis because of its incomplete randomness. Nevertheless, there is no previous report of inducing diverse combination and variety of mutations within a genomic region of more than 50 kb. Hence, CTRL-Mutagenesis should be worthwhile for screening out critical regions within regions of more than 50 kb.

      To clarify these points, we will add the following sentences:

      • “It has been reported that CRISPRko tiling mutagenesis has so far been conducted for target genomic regions of less than 15 kb [PMID: 26375006, 30612741], while CRISPRi tiling mutagenesis can target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded” in the Introduction.
      • “Four clones (#2_066, #1_021, #1_029 and #1_046) harboured deletions of the entire Mirc56_X in all analysed Mirc56_X genomic regions. In addition to these clones, only 3 pairs (#2_019 and #2_084, #2_038 and #1_023, #1_016 and #1_027) carried the same combination of mutations. Besides, 26 clones had only one mutation from Mirc56_1 to Mirc56_13. On the other hand, there was no mutation in Mirc56_1 to 13 in 11 clones, comprising 5 clones (#2_012, #2_015, #2_054, #2_092 and #2_102) that carried no sgRNA cassette for Mirc56_1 to 13 and 6 clones (#2_017, #2_053, #2_098, #1_003, #1_012 and #1_044) that carried at least one sgRNA cassette for Mirc56_1 to 13. Among the 48 clones carrying multiple mutations (that is, excluding clones carrying only one mutation or none), 38 clones (79.2%) harboured different combinations of mutations. These results suggest that CTRL-Mutagenesis can induce diverse combinations of mutations.” following “…different combinations of mutations (Figure 5C).” in the Result (page12, line 304)
      • “Note that it would be difficult to identify necessary and sufficient genomic regions with CTRL-Mutagenesis because of its incomplete randomness. Nevertheless, CTRL-Mutagenesis should be worthwhile for screening out critical regions within regions of more than 50 kb” following “…to induce regional deletions.” in the Discussion (page15, line 378)
      • Change “diverse mutations” to “diverse combination and variety of mutations” in the Title, Abstract (page2, line 37), Introduction (page4, line 87), Result (page13, line 318), Discussion (page13, line 325), (page14, line 363)

      Additionally, we do not agree with your statement that “the inner regions like Mirc56_5 or Mirc56_8 are mutated with >80% efficiency”. We apologize for the misleading presentation. These high mutation rates were calculated only for the target sites. In fact, across all Mirc56_X genomic regions, the maximum mutation rate is 44.8% (Mirc56_10), the minimum is 14.9% (Mirc56_2) and the average is 30.9% (attached Figure).

      We appreciate that your suggestion made us aware of our misleading wording. An analysis focusing on all Mirc56_X genomic regions is more important than one focusing only on the targeted Mirc56_X sites. Therefore, we made a new figure showing event occurrence in the Mirc56_X genomic regions (attached Figure) as Figure 5D and moved the previous Figure 5D, which showed event occurrence in the targeted Mirc56_X sites, to Supplemental Figure S6.

      To clarify these points, we will add the following sentences:

      • “As for event occurrences on each Mirc56_X genomic region, miRNA deletions were dominant and, on average, mutations were induced in 26.7 of the 87 clones per Mirc56_X genomic region (Figure 5D). The maximum mutation rate across all Mirc56_X genomic regions was 44.8% (39/87) on Mirc56_10, and the minimum was 14.9% (13/87) on Mirc56_2” following “…on the same strand” in the Result (page12, line 309)
      • “D, Mutations in 87 Mirc56 random mutant clones. The target sites do not include Mirc56_14, 15, 16, and 17. The vertical axis and bar graphs show event occurrence on each Mirc56 genomic region in 87 Mirc56 random mutant clones. The bar colour indicates each event (Black: Regional deletion, Gray: Indel mutation, White: Intact).” in the Figure legend.
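
      As a quick arithmetic check of the rates quoted above, the following minimal Python sketch recomputes the two percentages from the clone counts given in the text (39/87 and 13/87). The function name `mutation_rate` and the dictionary layout are illustrative assumptions, not part of the authors' analysis.

```python
def mutation_rate(mutated_clones: int, total_clones: int) -> float:
    """Return the percentage of clones carrying a mutation, rounded to 0.1."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return round(100 * mutated_clones / total_clones, 1)


# Clone counts quoted in the response (87 random mutant clones in total).
TOTAL_CLONES = 87
counts = {"Mirc56_10": 39, "Mirc56_2": 13}

rates = {region: mutation_rate(n, TOTAL_CLONES) for region, n in counts.items()}
for region, rate in rates.items():
    print(f"{region}: {rate}%")
```

      Both values agree with the maximum (44.8%) and minimum (14.9%) reported for the Mirc56_X genomic regions.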

        2. Also, although the authors discuss that the lower mutation frequency observed for Mirc56_2 and 4 may be due to a technical error, confirming this by repeating the experiment would be important to prove the usability of this method.

      →Thank you very much for your comment, and we agree with your suggestion.

      We had already constructed bulk PB mES cells twice, and Figure 4B showed these experimental replicates combined.

      To clarify that we constructed bulk PB mES cells twice, we changed Figure 4B as attached and will add the following sentences:

      • “Even though these bulk PB mES cells were constructed twice, it seemed that sgRNA cassettes for Mirc56_2 and 4 were difficult to integrate into the genome.” following “…were rarely detected” in the Result (page11, line 273)
      • “In addition, because we suspected technical errors, we constructed bulk PB mES cells twice. Unfortunately, their low integration frequencies were not improved.” following “…efficiency or cell growth.” in the Discussion (page14, line 346)
      • “Bulk1 and Bulk2 indicate the experimental replicates.” following “…next-generation sequencing (NGS).” in the Figure legend (page22, line 563)

      In addition, we re-sequenced the sgRNA donor vectors for Mirc56_2 and 4, and will add the following sentences:

      “We first suspected that their low integration frequencies were caused by mutations in the PB transposon of the sgRNA donor vector, especially in the ITR or ID, which are important for integration efficiency [PMID: 15663772]. Therefore, we sequenced the PB transposons for Mirc56_2 and 4 again. However, we could not find any mutations in these PB transposons.” following “…efficiency or cell growth.” in the Discussion (page14, line 346)

      Moreover, to check for technical errors, we are going to examine sgRNA pool imbalance in the donor vector library by amplicon short-read NGS.

      *3. Additionally, the experiments were performed on the haploid X chromosome of a male cell line. It is questionable whether this method can be generalized to other regions located in the other chromosomes. Clarifying these points would be essential especially because the focus of this manuscript is to describe the efficiency of this novel methodology. *

      →Thank you very much for your comment.

      We expect that CTRL-Mutagenesis could be valid for other, biallelic loci.

      Therefore, we raised an anticipated issue, complex genotyping, and proposed one solution.

      When we target a biallelic locus, we must determine whether the combinations of mutations induced are cis- or trans-mutations. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele via the SNP markers on each read [PMID: 35710642]. Therefore, combining CTRL-Mutagenesis on heterozygous alleles of cells derived from, for example, human or murine hybrids with haplotype phasing might simplify genotyping.
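
      To illustrate the cis/trans question, here is a toy Python sketch; it is not the authors' pipeline or a real phasing tool. Each long read is assigned to the maternal or paternal allele by the SNP bases it carries, and two mutations are called cis if they are seen on the same allele and trans if they are seen only on different alleles. The read format, SNP positions, mutation labels and all function names are hypothetical, and real haplotype phasing must additionally handle sequencing errors and partial coverage.

```python
from collections import defaultdict

# Hypothetical phasing SNPs: position -> {base: parental allele}.
PHASING_SNPS = {
    1200: {"A": "maternal", "G": "paternal"},
    5400: {"C": "maternal", "T": "paternal"},
}


def assign_haplotype(read_snps):
    """Vote over the SNPs observed on one read; return None on a tie or no data."""
    votes = defaultdict(int)
    for pos, base in read_snps.items():
        parent = PHASING_SNPS.get(pos, {}).get(base)
        if parent is not None:
            votes[parent] += 1
    if not votes:
        return None
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def mutations_by_haplotype(reads):
    """Collect the set of mutations seen on each parental allele."""
    result = defaultdict(set)
    for read in reads:
        hap = assign_haplotype(read["snps"])
        if hap is not None:
            result[hap].update(read["mutations"])
    return dict(result)


def cis_or_trans(phased, mut_a, mut_b):
    """Classify two mutations as 'cis', 'trans' or 'undetermined'."""
    haps_a = {h for h, muts in phased.items() if mut_a in muts}
    haps_b = {h for h, muts in phased.items() if mut_b in muts}
    if haps_a & haps_b:
        return "cis"
    if haps_a and haps_b:
        return "trans"
    return "undetermined"


# Two made-up long reads from one clone.
reads = [
    {"snps": {1200: "A", 5400: "C"}, "mutations": {"del_site1", "indel_site3"}},
    {"snps": {1200: "G"}, "mutations": {"del_site2"}},
]
phased = mutations_by_haplotype(reads)
print(cis_or_trans(phased, "del_site1", "indel_site3"))
print(cis_or_trans(phased, "del_site1", "del_site2"))
```

      In a hemizygous locus such as Mirc56 in male cells, every read belongs to the same allele, which is why this step was not needed in the present study.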

      We will add the following sentences:

      “In this study, CTRL-Mutagenesis was validated by genotyping a single allele in male mES cells, which avoided having to investigate whether the combinations of mutations induced were cis- or trans-mutations. All genotypes on Mirc56 should be hemizygous and the induced mutations would be cis-mutations, so we determined the genotypes by amplifying approximately 200 bp around the target sites. However, we did not confirm large mutations such as deletions of the genomic region between target sites or inversions. Long-read sequencing might capture these large mutations. Besides, we also expect that CTRL-Mutagenesis could be valid for ROIs on biallelic autosomes and on the X chromosomes in females. In that case, it is necessary to determine whether the combinations of mutations induced are cis- or trans-mutations. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele via the SNP markers on each read [PMID: 35710642]. Therefore, combining CTRL-Mutagenesis on heterozygous alleles of cells derived from, for example, human or murine hybrids with haplotype phasing might simplify genotyping.” in the Discussion.

      *4. The limitations of the methods seem not to be fully described in the manuscript and must be clarified. Compared to the previous studies (see "significance" section for details), this method is inferior in that (1) it is time-consuming because it requires clonal expansion of single cells and (2) it has low throughput because it requires genome sequencing due to the occurrence of deletions. These points should be described for the potential users of this methodology. For example, it may be useful to detail the time consumption in each experimental step in Fig. 4A. *

      →Thank you very much for your comment.

      We do not agree that (1) it is time-consuming because it requires clonal expansion of single cells.

      In this study, our aim was to confirm the mutations that CTRL-Mutagenesis induced, so we did not conduct phenotypic screening such as dropout screening. For higher-throughput screening, CTRL-Mutagenesis could use bulk mutant mES cells, that is, Cas9-treated and EGFP-positive cells, for phenotypic screening.

      Additionally, we do not agree that (2) it has low throughput because it requires genome sequencing due to the occurrence of deletions. In this study, to prove our concept that CTRL-Mutagenesis could induce diverse combinations and varieties of mutations such as indels and regional deletions, we genotyped all random mutant clones. On the other hand, there are alternative comparative methods that improve throughput without genotyping. Combining phenotypic screening with gene expression assays for the target miRNAs, or for transcripts regulated by the target cis-element, would help us obtain clones harboring mutations in functionally critical regions within the target region. Genotyping would then be needed only at the final step, to identify critical regions embedded in non-coding regulatory elements.

      Even so, we will add the time required for each step to Figure 4A, as attached, because this information may be useful for potential users, as you mentioned.

      *Minor comments: 1. Data and methods are well-presented for reproducibility. The EGFP-positive ratio may be added to Fig. 4C for clarity. *

      →Thank you very much for your kind comment.

      We added the EGFP-positive ratio to Figure 4C and will add the following sentence:

      “The percentage above the box indicates the EGFP-positive ratio.” following “…the gates of the EGFP filter.” in the Figure legends (page23, line 567)

      2. Enhance referencing accuracy, rectify DOI format in ref 21, and ensure consistency in citation formatting, e.g., ref 32.

      →Thank you very much for your kind comment.

      Along with the transfer, we will modify the style of references and have already confirmed the referencing accuracy in the Reference.

      3. It seems that the experimental condition (e.g. The amount of vectors used for transfection) should be re-considered every time the researcher wants to set up an experiment changing target genomic regions, cell types etc. If so, this also should be described in the text for potential users of this method.

      →Thank you very much for your comment, and we agree with your suggestion.

      We will add the following sentences:

      “This study validated CTRL-Mutagenesis only for 17 target sites in mES cells. Therefore, it might be better to adjust the number of integrated sgRNA cassettes according to the number of target sites and the cell type.” following “…sgRNA cassettes to be integrated.” in the Discussion (page14, line 355)

      *Reviewer #3 (Significance (Required)):

      There were various methods described in the late 2010's which aimed to screen for the functional non-coding regions using approaches such as KO-based, HDR-based, and epigenetic silencing using dCas9 (for example, PMID: 25141179, 26751173, 27708057, 28416141, 31784727). The authors should summarize what would be the strength of their method compared to these previously described methodologies. The strength of this methodology seems to be moderate complexity and cost-effectiveness compared to these previous techniques. It may be difficult for this methodology to become a state-of-the-art method to evaluate cis-element combinations, but it can be beneficial to researchers wanting to set up a low-cost system that can produce moderately complex cell libraries.*

      →Thank you very much for your suggestion.

      We will add text acknowledging previous studies of CRISPR-Cas tiling screens.

      • “Recently, targeted mutagenesis approaches that combine forward and reverse genetics, such as saturating mutagenesis and tiling mutagenesis, have been developed to induce random mutations within target gene(s) [PMID: 25141179, 31586052, 27260157, 28118392]. Such targeted mutagenesis can construct a mutant library harbouring subtly different mutations within the target gene(s), so that comparative analysis of the mutant library can screen out mutation(s) critical for biological processes. These random mutagenesis approaches have also revealed the functions of numerous coding genes” following “…list of coding genes [6–8].” in the Introduction (page3, line 55-56)
      • “In addition, saturating mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations through a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. On the other hand, tiling mutagenesis can in principle expand the target length, because the length depends on the multiplexed guide RNAs (gRNAs) designed to target the genomic region. Therefore, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library of multiplexed gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page4, line 82)

      The paper you raised as DOI: https://doi.org/10.1038/nature13695 (PMID: 25141179) applied saturation mutagenesis to find critical mutations in the BRCA1 and DBR1 protein-coding sequences by an HDR-based strategy using a donor template library. This method is based on homologous recombination repair, so the length of the target region is limited. Our method employs tiling mutagenesis, whose target length depends on the sgRNAs designed. We expanded the length of the target region to more than 50 kb from the less than 15 kb previously reported. This is the strength of our method compared with this report.

      The paper you raised as DOI: https://doi.org/10.1038/nbt.3450 (PMID: 26751173) applied CRISPRko tiling mutagenesis to find critical regions within 2 kb of a p53-binding enhancer region by lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruptions. In addition, we expanded the length of the target region to more than 50 kb. This is one of the advances.

      The paper you raised as DOI: https://doi.org/10.1126/science.aag2445 (PMID: 27708057) applied CRISPRi tiling mutagenesis to find critical regions within a 74 kb genomic region around GATA1 and MYC by lentiviral delivery of sgRNA cassettes. Our method employs CRISPRko and the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Identifying combinations of critical regions embedded in target regions would require diverse combinations of mutations or inactivated sites, and inducing multiple mutations or inactivated sites requires the integration of multiple sgRNA cassettes. However, multiple integrations of sgRNA cassettes carry a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis. The best way to eliminate this risk is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions without artifacts, CRISPRko tiling mutagenesis is a better method than CRISPRi, because the sgRNA cassettes can be removed with no footprint, whereas CRISPRi requires integrated cassettes that stably express the sgRNA and an epigenetic modifier fused to dCas9. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruptions, in order to find combinations of critical regions embedded in target regions.

      The paper you raised as DOI: https://doi.org/10.1016/j.molcel.2017.03.007 (PMID: 28416141) applied CRISPRi tiling mutagenesis to find critical regions at TAD scale (about 200 kb) at low resolution by lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruptions.

      The paper you raised as DOI: https://doi.org/10.1038/s41588-019-0538-0 (PMID: 31784727) applied CRISPRi tiling mutagenesis to develop a method that can find novel regulatory elements around protein-coding genes by lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome with no footprint. Therefore, our method should be a novel method that generates a mutant library with no risk of non-targeted endogenous gene disruptions.

      To clarify the advantages, we will add the following sentences:

      • “Identifying combinations of critical regions embedded in target regions would require diverse combinations of mutations or inactivated sites, and inducing multiple mutations or inactivated sites requires the integration of multiple sgRNA cassettes. However, multiple integrations of sgRNA cassettes carry a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis [PMID: 23435812]. The best way to eliminate this risk is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions without artifacts, CRISPRko tiling mutagenesis is a better method than CRISPRi, because the sgRNA cassettes can be removed with no footprint, whereas CRISPRi requires integrated cassettes that stably express the sgRNA and an epigenetic modifier fused to dCas9.” in the Introduction.
      • “Here, we propose that a DNA transposon system is more suitable than a retrotransposon system for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons are transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposons are transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated after reverse transcription [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposon systems such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at integration sites after the transposon relocates, while other DNA transposon systems leave small insertions at integration sites [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but not integrate them, has been developed [PMID: 27929521]. Only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library with no footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “It has been reported that CRISPRko tiling mutagenesis has so far been conducted on target genomic regions of less than 15 kb [PMID: 26375006, 30612741], while CRISPRi tiling mutagenesis can target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded” in the Introduction.

      In addition, we are going to remove the transposons from several PB mES clones by excision-only PBase treatment, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      *Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Major concerns, 1, Authors claim "to identify functionally important elements in non-coding regions" in the title but there is no evidence of any functional analysis in the manuscript.*

      → Thank you very much for your suggestion, and we agree with your suggestion.

      In this paper, we only validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of the target regions.

      To tone down our claims about the usability of our method for functional analysis, we will make the following changes:

      • Change “to identify functionally important elements in non-coding regions” to “to induce diverse combination and variety of mutations within more than 50 kb non-coding region” in the Title.
      • Add “However, few loss-of-function screens of non-coding regulatory elements have been conducted because their annotations are ambiguous compared with those of protein-coding genes. Tiling mutagenesis has been employed to identify critical regions embedded in non-coding regulatory elements by comparative analysis of a mutant library harbouring subtly different mutated regions within a region of less than 15 kb. Conventional tiling mutagenesis constructs a mutant library in which multiple single guide RNA (sgRNA) cassettes are integrated by retroviral delivery. However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruptions and may impair functional analysis. Herein, by combining tiling mutagenesis with the PiggyBac transposon, which can be removed with no footprint at integration sites, we established an expanded tiling mutagenesis method named CRISPR- & Transposase-based RegionaL Mutagenesis (CTRL-Mutagenesis). We demonstrated that the PiggyBac system could integrate diverse combinations and varieties of sgRNA cassettes and that CTRL-Mutagenesis randomly induces diverse combinations and varieties of mutations within a non-coding region of more than 50 kb in murine embryonic stem cells. CTRL-Mutagenesis could be applied to wider non-coding regulatory elements with no risk of non-targeted endogenous gene disruptions.” in the Abstract.
      • Delete “Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.” in the Abstract (page2, line 38-40).
      • Change “The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.” to “The generated random mutant mES clone library could be developed to investigate critical regions of non-coding regulatory elements within the genome.” in the Introduction (page4, line 88-90)

        2, Genotypes of mutant library, especially Mirc56, 14,15, 16, 17 were not determined due to six tandem repeats. Thus, analysis of the relationship between genotype and biological functions is not possible. Moreover, the authors did not show any phenotypic analysis.

      → Thank you very much for your suggestion.

      The 6 tandem repeats, each approximately 3.3 kb long, make it hard to determine mutations, and such repeats are uncommon.

      Therefore, we skipped genotyping Mirc56_14, 15, 16, and 17.

      Certainly, it is a drawback that we did not determine all mutations induced by CTRL-Mutagenesis.

      Even so, we could determine the properties of the mutant library within the 37 kb genomic region from Mirc56_1 to Mirc56_13.

      Therefore, we could conclude that CTRL-Mutagenesis could induce diverse combinations and varieties of mutations in a region of more than 50 kb.

      3, Multiple gRNA may cause deletion and inversion to targeted loci. With local PCR based amplification, detection of large deletion and inversion can be very difficult. I think the authors should examine and address this possibility more carefully. The definition of indel in Fig 5C should be explained in more detail.

      → Thank you very much for your comment, and we agree with your suggestion.

      We did not confirm inversions or large deletions.

      To confirm whether inversions occurred, we are going to perform PCR walking and long-read sequencing in several clones.

      4, Although the authors showed a variety of PB cassettes (Max is 17), more importantly would be to determine the actual copy number of PB cassettes. Difference between the highest and the lowest EGFP intensities in Fig 2C (Donor 300ng Effector 350ng) is approximately ~100 fold, thus ES clone bearing highest PB vector may contain ~100 copies of PB vector. PB transposon prefers insertion in active genes compared to other transposon system such as Sleeping Beauty and Tol2 transposon. (Yoshida J et al Sci Rep. 2017 Mar 2;7:43613. doi: 10.1038/srep43613.). Higher integration rates of PB vectors have a higher chance of endogenous gene disruptions and may impair functional analysis.

      → Thank you very much for your suggestion.

      We agree that the number of integrated cassettes is an important aspect of tiling mutagenesis. The actual copy number of the PB transposon is useful information when potential users consider optimizing our method for their own target regions. However, for examining the relationship between the mutations induced and the sgRNA cassettes integrated, the number of integrated cassette varieties is more important, because the diversity of the sgRNAs expressed is more closely related to the diversity of the mutations induced. Therefore, we identified the number of integrated cassette varieties.

      To clarify this point, we will add the following sentence:

      “rather than the copy number of sgRNA cassettes because the diversity of the sgRNAs expressed is more closely related to the diversity of the mutations induced” following “…the number of sgRNA cassette varieties.” in the Result (page12, line 297)

      We also apologize that the statement “EGFP signal intensity correlated with the copy number of EGFP cassettes integrated into genomes[23]” in the Result (page11, line 249-250) is not accurate. EGFP expression levels are affected by the cell cycle, so the cited paper reported that “Median EGFP intensities correlated with the copy number of EGFP cassettes integrated into genomes”.

      Therefore, we will delete the following sentence:

      “EGFP signal intensity correlated with the copy number of EGFP cassettes integrated into genomes[23]” in the Result (page11, line 249-250).

      To investigate how many copies are integrated at our donor vector concentration, we are going to check the actual copy numbers in several clones by qPCR.

      Besides, we agree with your suggestion that “PB transposon prefers insertion in active genes compared to other transposon system such as Sleeping Beauty and Tol2 transposon. (Yoshida J et al Sci Rep. 2017 Mar 2;7:43613. doi: 10.1038/srep43613.)”

      Therefore, we will change the following sentence:

      “random TTAA sites across genomes [24]” to “random TTAA sites in transcribed regions rather than intergenic regions [PMID: 28252665]” in the Discussion (page14, line 357).

      However, Sleeping Beauty and Tol2 transposons leave footprints at integration sites when they move [PMID: 15133768, 23143102]. In particular, the SB transposon leaves a canonical 5 bp insertion at integration sites, so such an insertion into a coding sequence could frequently disrupt the function of the endogenous protein. On the other hand, the PB transposon leaves no footprint. Therefore, excision-only PBase can cleanly remove the PB transposon from a mutant library. Thus, there is no need to worry that the PB transposon disrupts non-targeted endogenous genes and impairs functional analysis if the PB mutant library is treated with excision-only PBase.

      In addition, we are going to remove the transposons from several PB mES clones by excision-only PBase treatment, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      5, Most non-coding regions are located at autosomes. Genotyping would be very difficult or even impossible by the current PCR based strategy.

      → Thank you very much for your comment, and we agree with your suggestion.

      This is one of our issues.

      We expect that CTRL-Mutagenesis could be valid for other, biallelic loci.

      Therefore, we raised an anticipated issue, complex genotyping, and proposed one solution.

      When we target a biallelic locus, we must determine whether the combinations of mutations induced are cis- or trans-mutations. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele via the SNP markers on each read [PMID: 35710642]. Therefore, combining CTRL-Mutagenesis on heterozygous alleles of cells derived from, for example, human or murine hybrids with haplotype phasing might simplify genotyping.

      We will add the following sentences:

      “In this study, CTRL-Mutagenesis was validated by genotyping a single allele in male mES cells, which avoided having to investigate whether the combinations of mutations induced were cis- or trans-mutations. All genotypes on Mirc56 should be hemizygous and the induced mutations would be cis-mutations, so we determined the genotypes by amplifying approximately 200 bp around the target sites. However, we did not confirm large mutations such as deletions of the genomic region between target sites or inversions. Long-read sequencing might capture these large mutations. Besides, we also expect that CTRL-Mutagenesis could be valid for ROIs on biallelic autosomes and on the X chromosomes in females. In that case, it is necessary to determine whether the combinations of mutations induced are cis- or trans-mutations. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele via the SNP markers on each read [PMID: 35710642]. Therefore, combining CTRL-Mutagenesis on heterozygous alleles of cells derived from, for example, human or murine hybrids with haplotype phasing might simplify genotyping.” in the Discussion.

      Moreover, genome-wide NGS and nanopore Cas9-targeted sequencing (nCATS) could also help us to read the mutations without PCR amplification. However, both methods obtain reads of the target regions only at low frequency. Therefore, it is difficult to multiplex samples for a mutant library.

      *6, Fig 4C, large amounts of Cas9 independent EGFP positive cells suggest the current system is not efficient. *

      → Thank you very much for your comment.

      We cannot agree with this point.

      In fact, using the cutoff set in Cas9-untreated cells, the EGxxFP system successfully selected at least 76 mutant clones (87.4%) harboring mutations within Mirc56_1 to Mirc56_13. Moreover, we could seed 180 single cells for single-cell cloning in a single FACS run.

      To emphasize this point, we added the following sentences:

      “Moreover, at least 76 out of 87 PB mES clones have mutations within all analysed Mirc56_Xs (Figure 5C). Therefore, the EGxxFP system could select ROI mutant mES clones efficiently.” following “…depended on integrated sgRNA cassettes.” in the Discussion (page13, line 355)

      *Reviewer #4 (Significance (Required)):

      The authors claim "Functional analysis" in the manuscript title but there is no evidence of functional analysis in the manuscript.*

      → Thank you very much for your suggestion, and we agree with your suggestion.

      In this paper, we only validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of the target regions.

      To tone down our claims about the usability of our method for functional analysis, we will make the following changes:

      • Change “to identify functionally important elements in non-coding regions” to “to induce diverse combination and variety of mutations within more than 50 kb non-coding region” in the Title.
      • Add “However, few loss-of-function screens of non-coding regulatory elements have been conducted because their annotations are ambiguous compared with those of protein-coding genes. Tiling mutagenesis has been employed to identify critical regions embedded in non-coding regulatory elements by comparative analysis of a mutant library harbouring subtly different mutated regions within a region of less than 15 kb. Conventional tiling mutagenesis constructs a mutant library in which multiple single guide RNA (sgRNA) cassettes are integrated by retroviral delivery. However, multiple integrations of sgRNA cassettes carry a higher risk of non-targeted endogenous gene disruptions and may impair functional analysis. Herein, by combining tiling mutagenesis with the PiggyBac transposon, which can be removed with no footprint at integration sites, we established an expanded tiling mutagenesis method named CRISPR- & Transposase-based RegionaL Mutagenesis (CTRL-Mutagenesis). We demonstrated that the PiggyBac system could integrate diverse combinations and varieties of sgRNA cassettes and that CTRL-Mutagenesis randomly induces diverse combinations and varieties of mutations within a non-coding region of more than 50 kb in murine embryonic stem cells. CTRL-Mutagenesis could be applied to wider non-coding regulatory elements with no risk of non-targeted endogenous gene disruptions.” in the Abstract.
      • Delete “Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.” in the Abstract (page 2, lines 38-40).
      • Change “The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.” to “The generated random mutant mES clone library could develop to investigate critical regions of non-coding regulatory elements within the genome.” in the Introduction (page 4, lines 88-90).
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Review Commons - Revision Plan

      Manuscript number: RC-2023-02228

      Corresponding author(s): Gatfield, David

      1. General Statements

      We are grateful to the three Reviewers for their detailed assessment of our manuscript and are delighted about their very constructive and positive evaluations, highlighting the study’s novelty and rigor.

      Briefly, the main points raised by Reviewers 1 and 3 do not involve additional experiments and are mostly about rethinking manuscript structure (e.g. moving data/analyses to the supplement or removing them altogether, as they distract from the main thrust of the story) and making the text overall less dense and more readable.

      Reviewer 3 also raises a number of additional interesting points that we should discuss in our manuscript, which would allow us to place our findings more effectively in the context of the existing literature.

      All these points are very well taken and will be implemented (see below, under 2).

      Reviewer 2 is overall also rather positive – speaking of “a very careful and detailed study that addresses an important issue” and the study being “really rigorous and the logic […] very well explained”; moreover, this Reviewer also shares the view of both other Reviewers that parts of the manuscript (i.e., in particular its beginning) should be shortened.

      Importantly, this Reviewer remarks in addition under “Significance”: “Without additional mechanistic insights suggesting that there is something particular different about the regulation of these mRNAs the manuscript is not of extremely high significance.” – an important point of criticism that we wish to address in our revision, as detailed below.

      2. Description of the planned revisions

      In the following, we detail how we plan to address the points raised by the Reviewers. The order in which we treat the points follows their – in our view – relative importance according to the Reviewers’ feedback. In particular the first item below, under (A), is the main point of criticism that we feel we should address carefully for the future revised version.

      (A) Major point raised by Reviewer 2: “However, the study falls short on addressing the mechanism of this regulation and if it is different of other feeding regulated mRNA oscillations. This diminishes the significance of the study unless additional mechanistic details are provided.”, which is cross-commented both by Reviewer 1: “More importantly, clues to the mechanism (e.g. iron, heme) regulating the rhythmic translation of IRP1 and IRP2 IRE-mRNAs in liver would increase the significance of the work.” as well as by Reviewer 3: “Reading the comment from Reviewer #2 over the lack of a mechanism to explain why only four transcripts with IREs amongst a larger pool are subject to circadian regulation by IRPs somehow reduces the significance of the study, one has to agree that a discovery - likely another component in the system - is wanting. I remain of the view that the present work exposes this "weakness" of the entire field in a global as opposed to a partial manner and in doing so, makes a significant contribution, especially by further sub-classifying the IRE-containing transcripts according to their responsiveness in the diurnal occupancy of their IREs.”

      Our response and revision plan: Indeed, in the original version of our manuscript we established the link to feeding, yet we did not pinpoint the precise molecular cue that could underlie the rhythmic regulation observed on certain IRE-containing mRNAs. We did discuss the molecular candidates quite extensively in the Discussion section of the manuscript (Fe2+; oxygen; reactive oxygen species), and it remains quite obviously the main question whether the observed diurnal control could be mediated directly by changes in intracellular iron availability.

      Of note, the preprint by Bennett et al., for which we cite the initial bioRxiv version in our manuscript, was updated very recently (https://doi.org/10.1101/2023.05.07.539729 – see version submitted December 18, 2023). It now includes new data analysing around-the-clock iron levels in liver as well. Briefly, the preprint shows, first, that serum iron is rhythmic with a peak during the dark phase at ZT16 (Figure 1D in Bennett et al.) yet loses rhythmicity when feeding is restricted to the light phase (Bennett et al., Figure 2E), indicating both feeding-dependence and circadian gating. Moreover, liver total non-heme iron – quantified using a method that measures both ferrous Fe(II) and ferric Fe(III) – shows low-amplitude diurnal variations which, however, do not meet the threshold for rhythmicity significance (Bennett et al., Figure 3G). Still, the difference between timepoints ZT4 (lower iron; light phase) and ZT16 (higher iron; dark phase) is reported as significant, with a fold-change that is not very pronounced and that is not compatible with the observed direction of regulation of Tfrc mRNA, whose higher abundance in the dark phase would rather be in line with lower cytoplasmic iron levels, as pointed out by the authors.

      Thus, at first sight the analyses by Bennett et al. would appear to answer part of the Reviewers’ question and point towards mechanisms of regulation other than iron levels themselves. However, it should be pointed out that the particular methodology for iron measurements used by the authors includes the use of reducing reagents and hence quantifies the sum of Fe2+ and Fe3+ iron. Large amounts of iron are stored in the liver in the form of ferritin-bound Fe3+, yet the bioactive, low-complexity iron that is considered relevant for IRP regulation is in the Fe2+ form. Therefore, whether bioactive ferrous iron levels follow a daily rhythm compatible with the IRP/IRE rhythms described in our manuscript remains an open question. It warrants a dedicated set of experiments that we are proposing to conduct in response to the Reviewers’ comments.

      Briefly, for the revision we propose to use liver pieces from the two relevant timepoints of our study (i.e., ZT5 and ZT12) and apply a method that allows the separate quantification of Fe2+ and Fe3+ (Abcam iron assay ab83366; this assay can be adapted to liver iron measurements, see e.g. PMID 31610175, Fig. 4A). This experiment will provide novel and decisive data on the molecular mechanism that may regulate the IRP/IRE system in a rhythmic fashion and therefore add to the significance of our findings, as requested by the reviewers.

      Moreover, we believe that the outcome of the experiment would be very interesting either way. If we find rhythms in Fe2+ that are compatible with rhythmic IRP/IRE regulation, we would be able to provide excellent evidence in terms of the likely molecular mechanism and rhythmicity cue. If, by contrast, we find that Fe2+ is not rhythmic, it will point towards a mechanism that is distinct from simple Fe2+ concentrations.

      In the latter case, collecting additional evidence on relevant alternative molecular cues would be beyond our capabilities for this particular manuscript, as it would require quite sophisticated methodological setup and preparation. For example, one could imagine that measuring around-the-clock liver oxygen levels in vivo – another candidate cue – would be highly interesting, yet we would not be able to conduct these experiments in a reasonable time frame (to start with, we would first need to request ethics authorisation from the Swiss veterinary authorities, which would in itself take ca. 4-6 months before we could even start an experiment). Thus, in the case of non-rhythmic iron levels, we would leave the question of other responsible cues open, but still think that with a balanced discussion of the resulting hypotheses we could provide significant added value to our work.

      (B) Major comment raised by Reviewer 1: “Alas2 is expressed mainly in erythroid cells and not liver, whereas Alas1 is ubiquitously expressed. Therefore, it is possible that Alas2 in this study may originate from red cells/reticulocytes in the liver, and not from hepatocytes.”

      Our response and revision plan: We thank the Reviewer for this pertinent comment. It is well established that Alas1 is the main transcript encoding delta-aminolevulinate synthase activity in hepatocytes, and Alas2 is about 10-fold less abundant in total liver RNA-seq data (quantified from our own RNA-seq data, not shown).

      We are nevertheless relatively sure that the Alas2 signal comes from low expression in hepatocytes; the best argument in support of this hypothesis is the analysis of single-cell RNA-seq data, as shown in the following Revision Plan Figure 1, which we would be happy to include in a revised version of the manuscript if the reviewers wish:

      (C) Minor comment raised by Reviewer 1: “The paper is dense and not easy to read. For example, the section on Tfrc regulation and NMD regulation is lengthy and perhaps not necessary for the paper and the section on "Previous observations in IRE-IRP regulation...." could be included in the discussion rather in than in the Results section. Some figures could be included in a supplement.” continued in Referee cross-commenting “I agree with Reviewer 2 that the first sections in the manuscript are lengthy and not needed.”; moreover, Reviewer 2: “Also, the manuscript first sections (which mainly describe negative results) seem too long and descriptive.”

      Our response and revision plan: We shall reorganize the paper accordingly, with the aim of making it an easier, shorter, clearer read. Many thanks for the input.


      (D) Minor comment raised by Reviewer 1: “A description of the new anti-IREB2 antibody is needed. What IRP2 sequence was used to generate antibodies?”

      Our response and revision plan: The following information will be included in the manuscript: “Rat monoclonal antibodies against ACO1/IRP1 and IREB2/IRP2 were generated at the Antibodies Core Facility of the DKFZ. Briefly, full-length murine ACO1/IRP1 and IREB2/IRP2 proteins, fused to a poly-histidine tag, were expressed in E. coli and purified on Ni-NTA columns using standard protocols. Purified His-tagged proteins were used to immunize rats and generate hybridomas. Hybridoma supernatants were first screened by ELISA against His-tagged ACO1/IRP1 and His-tagged IREB2/IRP2. As an additional control, supernatants were tested against full-length His-tagged murine ACO2 (mitochondrial aconitase), which shares 27 and 26% identity with ACO1/IRP1 and IREB2/IRP2, respectively. Supernatants reacting specifically with ACO1 or IREB2 were validated by western blotting using extracts from wild-type versus ACO1- or IREB2-null mice.”

      (E) Minor comment raised by Reviewer 1: “A model summarizing the data would be useful.”

      Our response and revision plan: Thank you for the suggestion – this will be done.

      (F) “Optional” idea raised by Reviewer 3: “One nuance in the field of circadian biology is that a rhythm is deemed to be genuinely "circadian" when it continues in the absence of zeitgebers. In this sense, although all experiments are valuable, the "collapse" of the rhythm in the paradigms where dietary rhythms have been disrupted makes the phenomenology a candidate "epiphenomenon" rather than being closer related to the biological clock(s). Likewise, in the manuscript we never learn how the liver IRE-binding activity behaves in constant darkness.”

      Our response and revision plan: This is an important aspect that we can clarify more specifically in our manuscript. It is true that constant (darkness) conditions are used to call a phenomenon circadian. We would nevertheless argue that for a rhythmic feature that is specifically found in liver, the constant darkness definition to distinguish circadian from non-circadian is not fully valid because even in constant darkness, the liver clocks are not in a free-running state but continue to be entrained by the SCN clock (it is only the latter that is free-running under these conditions).

      In our manuscript, we actually suggest that the observed rhythms are not a core output of the circadian machinery (Fig. 6 of our manuscript), but indirectly engendered through feeding rhythms, which are coupled to sleep-wake cycles and thus connect in an indirect way to the central circadian clock activity in the SCN.

      In wild-type mice we would therefore expect that irrespective of constant darkness or light-dark entrainment (and assuming ad libitum feeding), the hepatic rhythms of the relevant IRE-containing transcripts would persist in a similar fashion.

      (G) “Optional” idea raised by Reviewer 3: “Where the authors mention in a parenthesis "moreover, there are documented links between iron and the circadian timekeeping mechanism itself", I invite them to take a closer look to the paper Konstantinos Mandilaras and I coauthored in 2012 "Genes for iron metabolism influence circadian rhythms in Drosophila melanogaster". In that work, we showed that RNA interference of genes that are required for iron sulfur cluster formation (including on IRP1) in the central clock neurons of the fly result in loss of the circadian rhythm when flies were kept at constant darkness (not so when they were kept under light:dark oscillation). So this point should probably remain open..”

      Our response and revision plan: We would like to thank the Reviewer for pointing out this interesting connection that would fit well into the context of our manuscript. It should be cited in the context of our current Figure 3, where we measure in vivo and in tissue explants whether IRP-deficiency affects the clock itself.

      To follow Reviewer 3’s idea, we have gone a little further in our analyses of around-the-clock expression data to see if any of the components of the Fe-S assembly machinery is rhythmic itself, which could have the potential to add novel information.

      Briefly, we have used for this purpose our around-the-clock RNA-seq and ribo-seq data from PMID 26486724. In summary, we find that the expression at RNA and/or footprint level is non-rhythmic for the vast majority of genes involved in FeS biogenesis, assembly or transport, with the exception of low-amplitude rhythms for Glrx5 and Iba57 (Revision Plan Figure 2).

      By contrast, all of the following other genes are non-rhythmic throughout (list of Fe-S-relevant genes from PMID 34660592):

      Cytoplasmic/nuclear, all non-rhythmic: Cfd1=Nubp2, Nbp35=Nubp1, Ciapin1, Ndor1, Iop1=Ciao3=Narfl, Ciao1, Ciao2b=Fam96b, Mms19, Ciao2a=Fam96a.

      Mitochondrial, all non-rhythmic: Iscu, Nfs1, Isd11=Lyrm4, Acpm=Ndufab1, Fdx1, Fdx2=Fdx1l, Fxn, Hspa9, Hsc20=Hscb, Abcb7, Alr=Gfer, Isca1, Isca2, Nfu1.

      As these are mainly “negative results”, and as we are also unable to propose a solid possible mechanistic connection between the Glrx5 and/or Iba57 rhythms and the rest of the story of our manuscript, we do not intend to include such data in our manuscript, but are only putting it for the record into this rebuttal.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      NONE

      4. Description of analyses that authors prefer not to carry out

      NONE – we think we can address all points as described above.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2023-02224R

      Corresponding author(s): Austin Smith

      1. General Statements [optional]

      We thank the reviewers for constructive comments and helpful suggestions which we have adopted to clarify and improve the manuscript. In addition, we have added a link to a web portal that will allow readers to visualise gene expression profiles and create their own plots using our early human embryo UMAP embedding (https://bioinformatics.crick.ac.uk/shiny/users/boeings/radley2024umap_app/). Stefan Boeing created this tool and is added to the author list with agreement of other authors.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, Arthur Radley and Austin Smith designed a new feature selection method for scRNA-Seq, which is a successor to ESFW previously proposed by the same authors. As an evolution of this earlier framework, cESFW is also based on the idea that informative genes share information with other genes, whereas non-informative genes have a more random relative expression. The authors emphasize the key importance of feature selection in the scRNA-Seq workflow and assess the current state of the art for this step. They also propose that better feature selection leads to less data transformation. They show that cESFW outperforms Scran and Seurat feature selection in most cases of synthetic datasets. cESFW is then used in the context of early human development, re-analysing data from several published datasets where they show that they do not require batch correction. They also further strengthen the conclusion that a "2-step" model for TE-ICM and EPI-Hyp differentiation is also present in human embryos. Finally, they map several types of in vitro pluripotent stem cells, in particular primed and naive, to their manifold and study the evolution of the gene signatures during early human development. Overall, the manuscript is well written and presents a solid methodology. The re-analysis of human early development is convincing and justified. The main criticism is that the quality of figures can be greatly improved: their resolution is too low and they are hard to read. For instance, more contrasted color schemes could be used to improve clarity, and given the high number of clusters for some UMAPs, indicating the name of some cluster near their centroids should improve clarity.

      We agree that the resolution of the figures should be improved. We had to compress the images to satisfy the size limit for uploaded documents to bioRxiv. Our final submission will be of higher quality (original figures are at 900dpi). With regards to colour schemes, this is a surprisingly difficult problem. We tried multiple colour palettes but could not achieve greater contrast. The suggestion to add key cluster names near to their centroids on the UMAPs is an excellent idea, which we have implemented.

      Comments: Page 2 I think the criticism of PCA is unfair because it is not a true feature selection method, and it is mainly used for computational purposes. I believe that for most workflows, between 30 and 50 PCs are retained, which do not significantly change the results in the downstream analyses. The citation (Yeung and Ruzzo 2001) does not seem appropriate, as they examine cases where only a small number of PCs are retained, outside the context of scRNA-seq.

      We agree that the criticism of PCA is insufficiently justified by the citation. We thank the reviewer for pointing this out and have removed the comment.

      "Furthermore, HVG selection has been found to be biased toward selecting highly expressed genes over low expressed genes." Could the author justify or remove this statement, as the Seurat and Scran methods are specifically designed to consider average expression to determine HVG? The cited article (Yip, Sham, and Wang 2019) raises this issue for methods other than Seurat and scran.

      The reviewer is correct that the provided citation highlights Seurat and Scran HVG selection as relatively insensitive to the average gene expression levels compared with other HVG selection methods. We again thank the reviewer and have deleted the comment.

      More generally, we have shortened the introduction, focusing on cESFW as a new approach to feature selection rather than critiquing alternative methods.

      Page 6 I might have missed it, but I do not understand the number of cells in the early human development dataset also shown in Figure S2B. The Petropoulos et al. dataset alone is larger than the sum of cells from different cell types. Is there some filtering step that is not described?

      We have added text in the data availability section to clarify the cells used in our analysis:

      “The pre-implantation raw counts scRNA-seq data from Yan et al. 2013, Petropoulos et al. 2016, Fogarty et al. 2017, and Meistermann et al. 2021, were compiled into a single gene expression matrix by Meistermann et al. 2021. For information regarding quality control and cell filtering of these 4 datasets, please refer to Meistermann et al. 2021.”

      The unsupervised clustering used to annotate cell types is unconventional (especially with the high number of clusters chosen), which is not a problem, but should be clarified. Improving the figure 3D to make it clearer and providing a cell cluster correlation plot might help to better appreciate the relationship between cell types.

      We agree that the gene expression heatmap in figure 3D contributed little to the interpretation of the data/results. As suggested, we have replaced this heatmap with a cell cluster correlation plot to help appreciate cell state similarities. (Changes in figure 3.)

      It could be emphasized that the ICM/TE branch cell type is a major difference with the mouse topology, as the readers might not be aware that the ICM/TE is an unspecified blastocyst state that only exists in humans.

      There appears to be some misunderstanding around the use of “ICM/TE branch”. The cluster comprises an uncommitted population at the branching point from morula to either ICM or TE, as also described in the mouse embryo. We have adjusted the discussion to make more clear that the two branching point clusters are heterogeneous populations, not unitary cell types or states:

      “The branching populations reside at critical junctures in blastocyst formation, the partitioning of extraembryonic and embryonic lineages. These branchpoint clusters do not define unitary states. On the contrary, cells in these clusters are heterogeneous and may become specified to alternative fates. For example, PDGFRA, a hypoblast marker (Corujo-Simon et al. 2023), and NANOG, an epiblast marker (Allegre et al. 2022), are heterogeneously distributed in the Epi/Hyp branching population. Furthermore, branch cluster boundaries extend beyond the topological bifurcation, potentially indicating that cells remain plastic and may be redirected. This would be consistent with the demonstration in mouse embryos that cells expressing ICM genes remain capable of generating TE up to the late 32-cell stage (Posfai et al. 2017).”

      Page 9 To further substantiate the stepwise ICM/TE and EPI/PrE specification events, authors could project cells from each embryo on the UMAP, and analyze what are the co-occurrence of cells (as performed for instance in Meistermann et al 2021). This should show as reported (and cited by the authors) that some GATA3 positive cells (TE fated) start appearing from late morula stage and that ICM cells almost never co-exist with EPI nor Hyp in embryos.

      We appreciate this suggestion. We have generated the requested plots showing where cells from individual embryos at different developmental timepoints are positioned on our UMAP embedding (New figure, Figure S6). We present a summary heatmap of cell co-occurrence in revised Figure 4. These results offer greater insight than the RNA velocity analysis, which we have moved to supplemental Figure S6. We have added discussion of these analyses in the “Lineage branching blastocyst development” Results section.
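
      As an illustration only, the kind of per-embryo co-occurrence summary described above can be sketched as follows. This is not the code behind Figure 4: the function name `cooccurrence`, the embryo identifiers and the toy cell-type assignments are all hypothetical, and the sketch simply counts, for every pair of cell types, the number of embryos that contain both.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(cells):
    """Count, for each pair of cell types, the number of embryos containing both.

    `cells` is an iterable of (embryo_id, cell_type) pairs, one per cell.
    """
    types_per_embryo = defaultdict(set)
    for embryo_id, cell_type in cells:
        types_per_embryo[embryo_id].add(cell_type)

    counts = defaultdict(int)
    for types in types_per_embryo.values():
        # Sort so that each unordered pair is always stored under the same key.
        for first, second in combinations(sorted(types), 2):
            counts[(first, second)] += 1
    return dict(counts)


# Hypothetical toy annotations, not taken from any dataset.
toy_cells = [
    ("embryo1", "ICM"), ("embryo1", "TE"),
    ("embryo2", "Epi"), ("embryo2", "Hyp"), ("embryo2", "TE"),
    ("embryo3", "Epi"), ("embryo3", "TE"),
]
pair_counts = cooccurrence(toy_cells)
```

      In this toy input no embryo contains both ICM and Epi cells, so that pair is simply absent from `pair_counts`; a heatmap would show it as an empty cell.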

      Reviewer #1 (Significance (Required)):

      The presented methodology shows significant value especially in the field of scRNA-Seq, where the critical step of feature selection is often inadequately addressed. Furthermore, this field is characterized by a limited set of feature selection methodologies. cESFW appears to be an important alternative to HVG methods that could improve scRNA-Seq analysis in certain contexts.

      The new findings on early human development are somehow incremental, but a welcome addition to solidify the two-step model and refine the concept of reject cells. The audience for this early development context is specialized, but cESFW will most likely have an impact to the entire field of scRNA-Seq analysis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Here, Radley and Austin present a novel approach for feature weighting in scRNAseq data based on entropy sorting. Feature selection is a central part of scRNAseq analysis, and it is most likely the case that there is no single approach that outperforms all others across all datasets. Hence, innovation in this space is needed for the field. The cESFW method presented here has several appealing properties from a theoretical point of view, and it also performs well on the synthetic and real datasets considered. Nevertheless, there are several major issues that need to be addressed before I can recommend the manuscript for publication:

      1 The original entropy sorting (eq 1 in SI 1) is based on only two discrete states. However, calculating entropy for continuous distributions can be more tricky and it is unclear to me what assumptions are made regarding the gene expression. Could the authors clarify what properties of the distribution are required for the updated ESE equation to be valid? Is the only assumption that values are drawn from the [0, 1] interval? What happens if values are highly skewed, ie forming a bimodal or power-law distribution rather than something close to a uniform distribution?

      We agree that it is beneficial to clarify these points. We have added a section titled “Assumed properties of underlying sample distributions” to the supplemental information. Briefly, we show that the ESS correlation metric is directly linked to the commonly used correlation metric, Mutual Information (MI). A desirable property of MI is that it is able to capture non-linear/skewed relationships between features. The ES framework and ESS share this property with MI, allowing the ES framework to be relatively robust to the presence of non-uniform distributions.
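
      The general property of MI referred to here can be illustrated with a small sketch. It uses a plain histogram estimate of MI, not the ESS metric or any cESFW code, and the helper names, the bin count and the toy relationship y = x² are assumptions chosen for illustration. For this symmetric, non-linear relationship the Pearson correlation is essentially zero while the MI estimate is clearly positive.

```python
import math
from collections import Counter


def bin_index(value, low, high, bins):
    """Index of the equal-width bin holding value; the top edge joins the last bin."""
    if high == low:
        return 0
    return min(int((value - low) / (high - low) * bins), bins - 1)


def mutual_information(xs, ys, bins=8):
    """Histogram estimate of the mutual information I(X;Y), in nats."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    bx = [bin_index(x, x_lo, x_hi, bins) for x in xs]
    by = [bin_index(y, y_lo, y_hi, bins) for y in ys]
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Toy non-linear relationship on a symmetric grid: y depends on x, but not linearly.
xs = [i / 500 - 1 for i in range(1001)]
ys = [x * x for x in xs]
linear_r = pearson(xs, ys)
mi_nats = mutual_information(xs, ys)
```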

      The main assumption for applying ES is that the features can be meaningfully scaled between values of 0 and 1. For gene expression, an intuitive way of achieving this is to inspect each gene and designate 0 count values as having 0 expression activity, and the maximum counts as having activities of 1, and all values in between existing within the [0,1] interval. A useful property of ES is that we do not need to assume a particular shape or distribution of the samples within the [0, 1] interval. The ES framework is non-parametric and does not require an assumed distribution to calculate the conditional entropy (CE), even in the continuous form. This is possible because the ES framework is formulated by turning the probabilistic form of CE into an ordinary differential equation (ODE), where the only dependent variable, x, is the overlap between the minority state activities of each individual sample. This calculation is explicitly identifiable/calculable, and is permutation invariant, meaning the shape of the distributions of a reference feature (RF) and query feature (QF) does not need to be assumed/defined. In other words, the ES framework quantifies to what degree active expression states enrich/overlap with one another in a manner that is robust to different distribution shapes.
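
      The scaling step described in this paragraph can be sketched as below. This is a minimal illustration of the stated idea (zero counts map to 0 activity, a gene's maximum count maps to 1), not the cESFW implementation; `scale_gene` and the toy count matrix are hypothetical.

```python
def scale_gene(counts):
    """Map one gene's counts to activities in [0, 1]: 0 counts -> 0, maximum count -> 1."""
    peak = max(counts)
    if peak <= 0:
        # A gene that is never detected has no active state to scale against.
        return [0.0 for _ in counts]
    return [c / peak for c in counts]


# Hypothetical toy matrix: one row of counts per gene, one column per cell.
toy_counts = {
    "geneA": [0, 2, 4, 8],
    "geneB": [0, 0, 0, 0],
    "geneC": [5, 0, 10, 1],
}
scaled = {gene: scale_gene(row) for gene, row in toy_counts.items()}
```

      No distribution is assumed at any point: each gene is rescaled using only its own observed range, so skewed or bimodal genes are handled in the same way as uniformly distributed ones.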

      2 How robust is the procedure for the choice of percentile for normalizing the gene expression scores? Does one get roughly the same results for 90-99th percentile or is it sensitive to this choice?

      We have carried out a sensitivity analysis on the choice of percentile for each of the synthetic datasets and added it to the manuscript (New figure, Figure S11). We find that on each of our 4 synthetic datasets the final results of cESFW are robust to a wide range of normalisation percentiles.
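
      A toy version of such a sensitivity check is sketched below. It assumes one plausible reading of percentile normalisation, namely that each gene is divided by its p-th percentile and values above that cap are clipped to 1; the exact cESFW procedure may differ. The nearest-rank percentile, the function names and the toy gene are all hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def scale_with_cap(counts, p):
    """Divide by the p-th percentile and clip to [0, 1] (assumed normalisation)."""
    cap = percentile(counts, p)
    if cap <= 0:
        return [0.0 for _ in counts]
    return [min(c / cap, 1.0) for c in counts]


def max_abs_difference(a, b):
    """Largest per-cell change between two scaled versions of the same gene."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical gene observed in 100 cells with counts 0..99.
toy_gene = list(range(100))
reference = scale_with_cap(toy_gene, 95)
drift = {
    p: max_abs_difference(scale_with_cap(toy_gene, p), reference)
    for p in (90, 95, 99)
}
```

      Here `drift` records how far the scaled activities move, relative to the 95th-percentile choice, when the percentile is varied between 90 and 99. A real sensitivity analysis would compare downstream results such as the selected genes, not only the scaled values.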

      3 Similarly, I am concerned about the procedure for how to choose the number of significant genes. How robust is this process? Also, it is not altogether clear how to generalize the procedure outlined on p19. Most potential users would benefit from more quantitative guidelines. In particular, having to rely on interpretation of GO terms typically requires a considerable amount of understanding about the system at hand which could make it challenging to apply the procedure for others. For most users it would be helpful to know how robust the procedure is to this step and also if there could be more stringent guidelines for how to decide which genes to include.

      We understand the reviewer’s concern regarding the robustness of feature selection on real scRNA-seq datasets. We have now applied our cESFW workflow to peripheral blood mononuclear cells (PBMC) scRNA-seq data, and found cESFW feature selection to be comparable to, and by one metric more robust than, Seurat and Scran HVG selection (New Figure S2).

      As cESFW is applied to more scRNA-seq data, we will learn more about how results compare to highly variable gene selection, and how workflows may be adapted to optimise results in different scenarios. For example, we have found that supervising the selection of gene clusters using a small set of markers known to be important in the system of study can help identify which clusters of genes should be retained during gene selection. We have added this to the materials and methods with the following paragraph:

      “Furthermore, we suggest supervising the selection of gene clusters using a small set of markers known to be important in the system of study. In this work, we found that genes known to be important during early human embryo development (FigS4) are enriched in the dark blue cluster of genes, further suggesting that this cluster of genes is more likely to separate cell type identities in downstream analysis.”

      While gene cluster selection supervision in this manner requires a degree of domain expertise, we believe this is not unreasonable for most applications, and is the case for many scRNA-seq analysis pipelines.

      Our primary software contribution is the cESFW algorithm, which calculates the ESS and EP matrices. With this manuscript we provide 6 commented workflows for applying cESFW to different datasets (4 synthetic data, human embryo data, PBMC data). We believe these workflows provide a good balance of documented use cases and user flexibility for cESFW usage. This is important because it is advantageous to be able to adapt workflows easily to incorporate domain expertise and different methodologies. Although workflows such as Seurat and Scran are user-friendly, their rigidity can be a hindrance when deviating from their standard workflows. In summary, we believe that our provided workflows are suitable for users to implement cESFW, while providing the flexibility to apply adapted pipelines.

      4 The comparison of the clusterings on p6 is not really fair, is it? If I understand it correctly, the 3,012 genes identified by cESFW were used to define clusters in fig 3c through unsupervised clustering. The authors then use HVG methods to identify 3,012 genes and then carry out clustering based on those. To evaluate the methods the silhouette score is used, but the labels from the cESFW clustering are used as ground truth. This does not sound like a fair way to compare. Could the authors please clarify and, if needed, come up with an approach where the three methods have a more level playing field?

      The reviewer raises a fair point regarding the comparison of cluster identities and ranked gene lists. This issue is a chicken and egg problem, in that we require a baseline to benchmark different methodologies but lack an explicitly defined ground truth. For that reason we used synthetic datasets for initial comparison.

      For the human embryo data, we have presented substantial evidence that our cluster annotations are biologically coherent and consistent with prior knowledge. We therefore consider it legitimate to compare the ranked lists of Seurat, Scran and cESFW. However, we acknowledge the potential bias and have mentioned this in the “Limitations of the study” section.

      In addition, we have now analysed the peripheral blood mononuclear cells (PBMC) scRNA-seq dataset that is used in the tutorial workflows of Seurat and Scran. This PBMC dataset is arguably better defined since it has more discrete populations of cells, and by using the Seurat generated cell type labels we bias the analysis towards Seurat rather than cESFW. The results show that cESFW performs comparably to Seurat and Scran, and that the cESFW ranked gene list may be more stable than Seurat and Scran. These results suggest that cESFW can be widely applicable as a suitable alternative for feature selection. We have included this analysis in the Results and as a supplemental figure (New figure, Figure S2).
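      The dependence of a silhouette score on the reference labels can be illustrated with a small sketch. The function below is a plain-Python silhouette score using Euclidean distance; the toy points and both label vectors are invented for illustration and are not taken from the manuscript or from the Seurat, Scran or cESFW workflows. It only shows that the same feature space scores very differently depending on which labels are treated as the truth, which is why the choice of reference labels matters in such a comparison.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels_matching = [0, 0, 1, 1]     # labels that agree with the geometry
labels_mismatched = [0, 1, 0, 1]   # labels that cut across the geometry

score_matching = silhouette_score(points, labels_matching)
score_mismatched = silhouette_score(points, labels_mismatched)
```

      In this toy case the matching labels give a score close to 1 and the mismatched labels give a negative score, even though the points themselves are unchanged.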

      5 The main cESFW.py file in the github repository is clearly well structured and commented. However, I would like to see a much better documentation so that one does not have to go through the source code to understand what functions there are and what they do. In particular, I would like to see a vignette to make it easier for others to incorporate cESFW into their workflows.

      We thank the reviewer for the positive comments regarding our cESFW.py commenting. We accept that our initial submission failed to point the reader directly towards our example workflows, which provide step-by-step, well-commented vignettes for using cESFW to analyse scRNA-seq data. In our initial submission we provided 5 workflows (4 synthetic data and the human embryo data), and in the re-submission we have added a workflow for analysing PBMC data. We have updated our cESFW GitHub to guide users to these example workflows (https://github.com/aradley/cESFW/tree/main).

      Please note, the embryo workflow will be easily accessible through GitHub, whereas the synthetic data and PBMC workflows will be provided through a Mendeley data link (referenced in the manuscript and on our GitHub). However, the content of the Mendeley link cannot be made public until the paper is finalised, as it cannot be changed after publication. We provide a temporary public Dropbox link for the reviewers so that they may access the additional workflows (https://www.dropbox.com/scl/fo/xr5o9xm6490ftjsa55wxg/h?rlkey=maindrxwdqnirsw1en3my5qsr&dl=0).

      Minor:

      Why are the figures not always in order? For example, fig S10 is mentioned before fig S2 on p 6

      Thank you for pointing this out; we have amended the text.

      I am not sure if the indexing in eq 1 (p 18) is correct. j is both on the LHS and it is also being summed over on the RHS. Should one of these be i instead?

      The indexing is correct. Each column j of the matrix on the RHS refers to a gene/feature, and in the calculation on the RHS we take the column averages, leading to a vector on the LHS that is still indexed by genes/features j. We have clarified this in the text.
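      As a minimal illustration of why the index j survives on both sides, the sketch below averages each column of a small matrix. It assumes rows are samples and columns are genes/features; the function name and the numbers are hypothetical, and this is not a reproduction of equation 1 from the manuscript.

```python
def column_means(matrix):
    """Return one average per column j, taken over all rows of the matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # The sum runs over the rows; the result keeps one entry per column j.
    return [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]


# Hypothetical 3 samples x 2 features matrix.
example = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
means = column_means(example)
```

      The output has one value per column, so it is naturally indexed by the same j that labels the columns of the input.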

      Reviewer #2 (Significance (Required)):

      The work presents a new method for feature selection in scRNAseq. Feature selection is a very important step and can have a big impact on findings. The method presented here is theoretically sound and it seems to provide interesting result when applied to early embryo development. However, as cESFW is only tested for one dataset it is unclear how well the method generalizes to other problems and datasets.

      Appreciation of the utility of cESFW will grow as it is applied to more datasets. However, we would like to highlight that the human embryo dataset consists of 6 independent scRNA-seq datasets from different laboratories, and that cESFW was able to identify common and differing structure between them without any batch correction, smoothing or feature extraction. We have added to our summary that we propose cESFW may be best suited to analysis of transcriptome trajectories in time course and developmental data. However, we have also now performed a comparison of Seurat, Scran and cESFW feature selection in a different context, using a reference PBMC scRNA-seq dataset. The results demonstrate that cESFW is a viable alternative for feature selection in that static system also (New figure, Figure S2).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for their thoughtful and constructive comments. The changes to the text and figures made in response to the questions raised have made this a clearer and stronger manuscript. The additional citations suggested by the reviewers helped to further anchor our study within the growing literature on facultative parthenogenesis. Below we have responded to each comment in blue. We have added new data to the manuscript (Fig. 4C, Fig. S10B and Fig. S10D).

      Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      1. Summary: Here Ho et al. provide strong molecular evidence for the production of facultatively parthenogenetic whiptail lizards, through a gametic duplication. As evidenced through multiple routes, including microsatellites, WGS, RADseq, and RBC ploidy, and lines of evidence from multiple specimens, this study is timely in furthering our understanding of the mechanisms underlying FP. The findings are conclusive.

      That said, I have several comments that should be addressed prior to publication. The introduction, which addresses FP in other systems, fails to cite several key studies that provide strong molecular support for terminal fusion automixis. Similarly, the study pushes the idea that this is an adaptive trait; however, without proving that the parthenogens can themselves reproduce, this is a moot point at this stage.

      That said, my comments are minor. I found this to be an excellent study, well written, comprehensive in methodology, and one that I strongly advocate for publication.

      We thank reviewer 1 for referring to our manuscript as an excellent study and strongly advocating for its publication. We concur with his/her points that evidence for automixis in other systems was not sufficiently referenced and that the adaptive trait hypothesis for FP is somewhat speculative. The text has been modified accordingly (see below).

      Major comments - None.

      Minor Comments: Should be addressed.

      Line 36 - However, data that supports terminal fusion are no longer restricted to microsat data. Studies utilizing RADseq and whole-genome sequencing in snakes and crocodiles have now provided further evidence supporting terminal fusion.

      See: Booth et al. 2023. Discovery of facultative parthenogenesis in a new world crocodile. Biology Letters. 19, 20230129.

      Card et al. 2021. Genome-wide data implicate terminal fusion automixis in king cobra facultative parthenogenesis. Scientific Reports. 11, 1-9

      Allen et al. 2018. Molecular evidence for the first records of facultative parthenogenesis in elapid snakes. R. Soc. Open. Sci. 5, 171901.

      We have now included that automixis in other systems is supported by both microsatellite and NGS data in the abstract of our manuscript. The references have been included in the main text.

      Ln 42 - Evidence suggesting that isolation from males was not a pre-requisite for FP has previously been reported in snakes.

      See: Booth et al. 2011. Evidence for viable, non-clonal but fatherless Boa constrictors. Biology Letters. 7, 253-256.

      Booth et al. Facultative parthenogenesis discovered in wild vertebrates. Biology Letters. 8, 983-985.

      Booth et al. 2014. New insights on facultative parthenogenesis in pythons. Biol J Linn Soc. 112, 461-468.

      Despite the prior evidence to the contrary cited by the reviewer, it is still a commonly held belief among scientists and science journalists that isolation from males promotes or triggers FP. We have placed our findings in the context of other studies, including those mentioned above, that came to the same conclusion that isolation from mating partners is not a requirement for FP. We thank the reviewer for the additional citations, which are now included in the discussion section.

      Ln 48 - Is this really an argument. While an immediate transition to homozygosity will purge some deleterious alleles, given the genome-wide nature of this, there will also conversely have been strong selection for mildly deleterious alleles.

      Even though many FP animals have congenital defects, our data, combined with that of others, show that seemingly healthy animals arise as well. Even if these healthy animals harbor slightly deleterious alleles, the most detrimental alleles would therefore have been purged, which matters especially for subsequent generations. We have modified the abstract to be clearer: “Conversely, for animals that develop normally, FP exerts strong purifying selection as all lethal recessive alleles are purged in one generation.”

      Ln 56 - I would recommend the inclusion of both Allen et al. 2018. R. Soc. Open Sci, and Card et al. 2021. Sci Reports, here, as they are members of the elapids, not represented in the other examples.

      These two citations have been added.

      Ln 60 - Recent studies have highlighted the significance of sperm storage in reptiles. For example, Levine et al. 2021. Exceptional long-term sperm storage by a female vertebrate. PLos ONE. 16(6).e0252049, describe the storage of sperm by a female rattlesnake for ~70 months, with two instances of its utilization to produce healthy offspring during that period. Clearly, molecular tools are providing both support for long-term sperm storage, and an understanding of its utilization.

      Recent work has indeed provided new evidence for instances of long-term sperm storage. The two mechanisms are no longer competing hypotheses, as it is clear that both exist in nature. We have modified the text accordingly to include “Nevertheless, clear examples of long-term sperm storage have also been documented in the recent literature (29), underscoring the need for molecular methods such as MS analysis or sequencing data to elucidate the underlying mechanisms.”

      Ln 68 - American Crocodile would also be suitable to include here.

      This has now been included in the list of examples of endangered species.

      Ln71 - The problem with this hypothesis is that parthenogens produced through FP tend to have very low viability. For example, Adams et al. 2023. Endangered Species Research, follow a cohort of sharks produced through FP and all survive. Similarly low levels of survival are reported across other systems for which FP was reported. More likely, FP is simply a neutral trait. The mother is not negatively impacted through producing parthenogens and can go on to produce sexual offspring. Few instances report successful reproduction of a parthenogen. See pers. Comm in Card et al. 2021. And Straube et al. 2016.

      We thank the reviewer for the comment and agree that more data on the successful reproduction of parthenotes are needed to claim that FP is an adaptive trait. We have modified the text to include that studies on “the successful reproduction by FP offspring” are needed to support this hypothesis and have included the Straube et al. 2016 citation. We decided to omit the Card et al. 2021 citation because the reports of second-generation FP were personal communications mentioned in that study, and the results themselves have not yet been published.

      Ln 79 - I doubt that there is a desperate need for this for conservation. However, I think there is a need to simply further our understanding of basic biological function, given that it is not uncommon, and is phylogenetically widespread in species lacking genomic imprinting.

      We agree that understanding FP as a basic biological function is important in light of the realization that it occurs more commonly than previously thought. We have added this aspect to the text: “A better understanding of the triggers and molecular mechanisms underlying FP and the fitness of the resulting offspring are therefore needed in a variety of contexts. These include: to understand a fundamental biological mechanism and its significance in vertebrate evolution, to aid in conservation efforts including captive breeding programs, and to possibly harness FP in an agricultural context (28).”

      Ln 85 - It would be worth citing Card et al. 2021., here given that they used genome-wide ddRAD markers to show support for terminal fusion.

      The citation has been added.

      Ln 91 - Better citations here are Card et al. 2021. Allen et al. 2018, and Booth et al. 2023, which all utilize either RADseq or WGS.

      These citations have been added.

      Ln 95 - The conclusion of genome duplication here was supported only by a small number of microsatellite loci. As such, given that terminal fusion has been supported through genome-wide markers in other species of snakes and crocodiles, the conclusion of genome duplication is likely incorrect.

      In light of the other examples that show terminal fusion in snakes, we have removed this sentence.

      Ln 96 - I would strongly disagree with this statement. Allen et al. 2018, Card et al. 2021, Booth et al. 2023, all provide evidence of heterozygous loci and thus support terminal fusion. While no species-specific chromosome level reference genome is available for any of these species, the fact that levels of heterozygosity are below 33% supports terminal fusion. Rates over 33% support central fusion, but have not been reported in any vertebrate to date. As such, I would recommend the removal of this statement.

      We agree that the studies listed by the reviewer all support terminal fusion in snakes and crocodiles and therefore, we have removed the statement.

      Ln 121 - Recent work in Drosophila mercatorum and D. melanogaster suggest that three genes play a role in the activation of FP in unfertilized eggs. In this case, through the fusion of meiotic products. That said, it is plausible to assume that FP in these lizards has an underlying genomic mechanism that is not related to isolation from males. See Sperling et al. 2023. Current Biology. 33, P3545-P3560.E13.

      Clearly isolation from males is not a key trigger in FP in whiptail lizards and other vertebrate species. With recent work from Sperling et al. 2023 and the fact that selection has led to increases in parthenogenesis in birds, an underlying genetic mechanism may well be at play. We have cited and addressed this in the discussion and propose identifying the genetic basis for FP in whiptail lizards in future studies.

      “Recent work identifying key cell cycle genes inducing FP in two species of Drosophila (71) and selection resulting in higher incidences of parthenogenesis in birds (24, 33) suggest a genetic basis for the initiation of FP. [...] Additional whole-genome sequencing data for species with documented FP will aid in understanding the genetic basis, propensity, and evolutionary significance of FP.”

      Ln 126 - While these data strongly support FP of the two unusual A. marmoratus appearing offspring, can long term sperm storage be ruled out. Either through captive history or allelic exclusion of other males in the group?

      We have added the following sentence to the text: “Given that all of these offspring are female, inherited only maternal alleles, and animal 122 had no history of being housed with a conspecific male during its lifetime, both interspecific hybridization and long-term sperm storage are all but ruled out and FP is strongly supported.”

      Ln 171 - 191 - Given that the topic of this manuscript is the genomic mechanism underlying FP in this species, are these data necessary? These are not discussed later and as such I would recommend that they are moved to supplemental material. Otherwise, they simply clutter the manuscript and detract from the key question. Indeed, they are important to show that the genome constructed is of high quality, but online Supp Mat is the place for that here.

      We chose to keep this section in the main text for the following reasons: There is still a lack of published reference quality genomes for many reptile species and therefore we want to highlight that this A. marmoratus reference adds not only to the understanding of FP, but also expands the small list of reptile genomes and makes the first Aspidoscelis genome available to the community. The high quality and contiguity of the genome (as indicated by the high N50 value and BUSCO score) is important to emphasize in the main text because the absence of any heterozygous regions in FP animals supports a mechanism of post-meiotic genome duplication. We would not want to bury these key points in the supplement.

      Ln 296 - Comparable estimates were made for parthenogenetic production in wild populations of two North American pitviper species. See Booth et al. 2012. Biology Letters.

      In Booth et al. 2012, 2 out of 59 litters of the two pitvipers (3.39%) were identified to contain FP offspring and these results are very similar to our reported rate of FP in whiptail lizards. We have now included this similarity in our discussion. “Interestingly, these rates are similar to what has been reported for wild populations of two North American pitviper species (10)”.

      Ln 312 - Again, can this really be suggested? Above, the authors state that most FP animals that hatched had congenital defects, and a large number failed to hatch. This does not sound like strong support for generating individuals that counter the effects of population bottlenecks and inbreeding depression. The authors need to take this study further and monitor the long-term viability of the FP individuals that survive.

      We agree with the reviewer that the adaptive advantages of FP reproduction are dependent on the fitness and reproductive potential of FP offspring and present data is insufficient to clearly support this notion. We have modified the text to include that long-term studies are needed to support or refute this hypothesis: “However, support for this hypothesis is predicated on the fitness and reproduction of FP offspring and therefore more long-term studies on seemingly healthy individuals of FP origin are needed.”

      Ln 348 - To be able to provide support for this, you need to track animals long term to understand their reproductive competence, and that of their offspring.

      We have added the text: “To assess whether the co-occurrence of sexual and FP reproduction in vertebrates can indeed be considered a reproductive strategy rather than biological noise will require further studies to assess the reproductive competence and fecundity of offspring produced by either mode of reproduction.”

      Ln 358 - But, the caveat is that the parthenogens must themselves reproduce. This must be stated.

      The statement that parthenogens must be able to reproduce to support a hypothesis of FP as an adaptive trait has been added: “One must now consider the possibility that FP is an adaptive trait and that low rates of successful FP could contribute significantly to genome purification. Such a role for FP hinges on further studies demonstrating the ability of parthenogens to reproduce themselves either through further FP or sexually.”

      Ln 359 - Note that FP can also fix mildly deleterious alleles. Only if it is strongly deleterious will it be lost.

      We now make it clearer that selection only applies to strongly deleterious alleles.

      Ln 361 - See above comments.

      We have modified the text to include that “FP offspring will have low genetic load and only pass on neutral and mildly-deleterious alleles to the next generation.”

      Reviewer #1 (Significance (Required)):

      1. Significance:

      While parthenogenesis has been reported as far back as the early 1900's, it is only over the last decade that reports have become common, such that facultative parthenogenesis is no longer considered a rarity, but is recognized now as being relatively common and phylogenetically widespread in species that lack genomic imprinting - particularly reptiles, birds, and sharks. Reasons for this are both an increased understanding that the trait can occur, hence recognizing it as an alternative mechanism to long-term sperm storage, and the ease of using molecular approaches.

      The fundamental questions of recent times have been understanding the mechanisms driving FP. Recent papers utilizing whole genome sequencing and ddRADseq have provided support for terminal fusion automixis in snakes and sharks. Here, this study provides evidence of gametic duplication in whiptails, a mechanism with an alternative outcome in regards to the levels of retained heterozygosity. As such, this study compares to the recent work of Card et al. 2021 (Scientific Reports), and Booth et al. 2023 (Biology Letters), in providing substantive advances in the field.

      The audience for this will be broad. Parthenogenesis is a fascinating topic that attracts significant media attention. See the Altmetric score of recent papers on the topic, particularly Booth et al. 2023 (Altmetric score - ~3100). As such, the study will be of interest to both a broad readership, but will also be of great significance to a specialized group working on parthenogenesis. All round, an excellent paper that has promise to advance the field.

      We thank reviewer 1 for this positive assessment and for putting our work into context.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The researchers bring together microsatellite and whole-genome sequencing data from long-term laboratory cultures of lizards to discover occasional production of parthenogenetic offspring by several species of otherwise sexually reproducing whiptail lizards ("facultative parthenogenesis", "FP") and to show that these FP-produced lizards have patterns of genomic homozygosity that are incompatible with currently held assumptions about mechanisms of FP. Instead, the FP lizards seem to have been produced by a mechanism that results in almost complete homozygosity, likely a consequence of post-meiotic duplication of genomes from haploid unfertilized oocytes. They also show that FP offspring were produced by females housed with males and along with sexually produced offspring, counter to prevailing assumptions that FP offspring are only produced in situations where mates are not available. Many of the FP-produced offspring did not survive to hatching or had major abnormalities, consistent with a situation where this high homozygosity exposes harmful alleles. Finally, the authors used reduced-representation sequencing (RAD-seq) to survey heterozygosity in 321 wild-collected whiptail lizards from 15 species, showing evidence for strikingly low heterozygosity in at least one individual and perhaps up to 5, consistent with the potential for FP in nature. These data are of broad interest in demonstrating several exciting new possibilities. Most importantly, the data hint at a different mechanism of FP than previously assumed, and one that causes immediate near-complete homozygosity. This scenario would likely lead to immediate purging of harmful recessive alleles. If the selective load of this purging wasn't insurmountably high, a lineage with a history of purging could produce FP offspring of relatively high fitness. Other exciting possibilities suggested by the data include the existence of FP even in a setting where mating occurs and in natural populations, versus just captivity.

      Major Comments:

      I found it difficult to impossible to sort out exactly what the researchers did and with what lizards. For example, in line 107, they refer to a "systematic MS analysis" for all individuals of gonochoristic species in their laboratory, but where are these data? Indeed, at this early spot in the paper, the introduction from here on out suddenly reads like a discussion. What would be better here would be to summarize what was known and wasn't known about the system and questions involved, why gaps in knowledge were important, and what the researchers actually did for this paper. In my opinion, the paper would be a much easier read if the researchers left the results and interpretation for later in the paper.

      As a consequence of the reviewers’ comments, the text of the manuscript has undergone major revision, and we trust that reviewer 2 will find this new version far more accessible. The MS data collection of more than 1000 individuals is the subject of another ongoing study and was only mentioned peripherally here to put the identification of FP into context. As most of the MS data relates to gonochoristic reproduction and interspecific hybridization, we are only presenting the data that are directly relevant to this manuscript as part of this study. To our knowledge, there is no common repository to upload raw MS data, but we have provided the data for the FP animals and controls discussed in this paper in the Github repository (see section “Data availability”).

      Even with this suggested fix, however, the data are still too inaccessible and analyses too opaque. For example, in line 202, a critical definition is laid out regarding heterozygous sites as those having "equal support" for two alleles. What do the researchers mean by "equal support"? My presumption is that this is something about equal or close to equal numbers of reads, but this definition needs to be spelled out and justified because it underpins much of the downstream analyses. A similar problem occurs in line 208-209, where the authors make a statement about limiting further analysis to positions in the genome where the coverage is "equal" to the mean sequencing depth.

      We have changed the text to “we defined heterozygous sites as those having two alleles supported by an equal number of reads. This stringent requirement was chosen to limit the search to apparent heterozygous sites with strong support, decreasing the chance of false positives.” We further restrict the analysis to sites where the coverage is equal to the average sequencing depth, to exclude regions where over-assembly and collapse of repetitive elements would artificially increase the coverage.
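      The two criteria described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout (position, depth, reads for allele A, reads for allele B), the function name, and the choice to round the mean depth to the nearest integer before comparing are all hypothetical.

```python
def strict_het_sites(sites, mean_depth):
    """Return positions that pass both stringency criteria.

    sites: iterable of (position, depth, allele_a_reads, allele_b_reads).
    mean_depth: genome-wide average sequencing depth (may be fractional).
    """
    # Assumption: "equal to the mean depth" is applied to the rounded mean.
    target_depth = round(mean_depth)
    passing = []
    for position, depth, a_reads, b_reads in sites:
        at_mean_depth = depth == target_depth
        # Two alleles, each observed, supported by the same number of reads.
        equal_support = a_reads > 0 and a_reads == b_reads
        if at_mean_depth and equal_support:
            passing.append(position)
    return passing


# Hypothetical records for illustration only.
example_sites = [
    (101, 30, 15, 15),  # passes: depth at the mean, equal allele support
    (202, 30, 20, 10),  # fails: unequal allele support
    (303, 60, 30, 30),  # fails: depth is twice the mean (e.g. collapsed repeat)
    (404, 30, 30, 0),   # fails: only one allele observed (homozygous)
]
kept = strict_het_sites(example_sites, mean_depth=30.2)
```

      A filter this strict discards many genuine heterozygous sites, which is acceptable when the aim is to limit false positives rather than to count every heterozygous position.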

      Another data/analysis issue emerges with the components of the manuscript that deal with mixoploidy. As far as I can tell, these data come from one sexually produced lizard, one FP A. marmoratus, and one FP A. arizonae. While the reports of bimodality of nuclear size are certainly interesting, the data and discussion are no more than an anecdotal case study in the absence of careful replication across multiple FP lizards and comparison to sexually produced lizards. Without these data, the conclusion that “Animals produced by facultative parthenogenesis are characterized by mixoploidy” (Figure 4 caption; also see lines 324-331) is far too strong.

      We have added animal IDs to figure legends 4 and S10 to clarify that these erythrocyte stainings come from two FP A. marmoratus and one FP A. arizonae. In addition, imaging from two sexually produced control animals (1 A. marmoratus and 1 A. arizonae) has now been included in S10 (as S10B and S10D). We have also included an extra panel of flow cytometry data (new Figure 4C) as a complementary methodology for ploidy determination. Both imaging and flow cytometry support similar amounts of haploid cells. With the additional data and clarification, we hope that the reviewer agrees that the observations of mixoploidy are well beyond “anecdotal”. Nevertheless, we have changed the title for Figure 4 to “Detection of mixoploidy associated with facultative parthenogenesis.” We hope that our observations here will indeed inspire future studies to see if mixoploidy is a widespread phenomenon in FP outside of whiptails, as indicated by earlier work in birds.

      I had a similar reaction to the discussion of developmental abnormalities and embryonic lethality of embryos of FP origin presented in lines 263-281 (also lines 307-309). What is the baseline level of such abnormalities and the frequency of lethality in sexually produced eggs/embryos/hatchlings, and especially those produced via inbreeding? These comparisons are needed to interpret the significance of the patterns observed in the FP eggs/embryos/hatchings. Analogously, the comparison of the ovaries and germinal vesicles from one FP individual relative to one sexual individual do not tell us anything nearly so definitive as the text in lines 279-281 (also see Fig. S12 title, which is too broad of a conclusion for N = 1). This overly ambitious conclusion also underpins the discussion regarding the potentially adaptive nature of FP with respect to genome purification (lines 341-363; also see lines 47-50). If FP does not actually increase the rate of purging in FP lizards relative to inbred sexual counterparts (sounds like inbreeding is common from line 339), it seems less likely that we can view FP as adaptive at least from this perspective.

      We have now included a comparison between defects seen in sexually produced animals vs FP animals: “six out of 16 FP animals (37.5%) hatched with no discernable developmental defects (Fig. S11A-B). This is in stark contrast to sexually produced animals, where over 98% of hatchlings showed no abnormalities. Additionally, most of the defects noted in sexually produced animals were less severe than in FP animals including bulges in tails or truncated digits.”

      We agree that our statement on the lack of differences between sexually produced and FP animals was too general. We have modified the title of Fig. S12 from “No differences between ovaries and germinal vesicles of Aspidoscelis marmoratus produced by facultative parthenogenesis or fertilization” to "Ovaries of Aspidoscelis marmoratus FP animal 8450 and germinal vesicles of FP sister 8449 revealed no differences in structure and anatomy compared to fertile sexually reproducing animals.” Due to instant complete homozygosity, FP would indeed have a higher rate of purging than inbreeding. While one hypothesis is that FP is adaptive (in large enough populations), our intentions were to highlight the alternative that FP could be detrimental in smaller populations (that already would likely experience high inbreeding rates). We would expect inbreeding to not be common in whiptails relative to other lizards given that they tend to have large population sizes and actively range across generalist habitats.

      A final data concern is with the use of liver tissue for whole-genome sequencing and reference genome assembly (lines 389-390) and then using these data and the reference genome to make conclusions about ploidy/coverage. Liver tissue is very commonly endopolyploid, meaning that coverage could be artificially high for animals for which liver (vs. tail) tissue was used for DNA extraction. In particular, it would be helpful if the researchers consider whether endopolyploidy could have affected their ability to make accurate estimation of coverage and thus, heterozygosity, when libraries generated from diploid (tail) tissues are aligned to a reference genome generated from a polyploid tissue as was done here.

      This is an interesting point, and hepatic cells in various organisms have indeed been documented to be polyploid. The proportion of polyploid cells varies, though, and as far as we are aware, all published studies on polyploid hepatocytes are in mammals (DOI: 10.1016/j.tcb.2013.06.002). Reference genomes have been generated from a variety of tissue sources and liver is commonly used. As most assemblies are for haploid genomes, polyploidy (unlike aneuploidy) does not impact the assembly quality. The reference genome was also from an animal of FP origin and therefore has genome-wide homozygosity that aids in a more contiguous genome assembly by eliminating the phasing problem. For the 10 animals sequenced, genomic DNA was derived from liver for three animals and the rest from tail tissue. The sequencing data generated from either liver or tail resulted in similar coverage levels (Figure S6) and similar levels of heterozygosity (Figure 2A).

      Minor Comments:

      Line 410: Please explain why the BLAST cutoff was changed from the default.

      The BLAST cutoff was changed from the default 1e-03 to 1e-06 to be more stringent and thereby increase confidence in the BUSCO results.

      Lines 441-443: Please explain why this dataset was seemingly larger than expected.

      Animal 122 was sequenced on one flow cell without any multiplexing with other samples and therefore yielded more reads than other animals sequenced. We subsampled the reads from this animal for analysis, so it is directly comparable with the other WGS data.
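
      The response does not state which tool or target read count was used for subsampling. As a minimal sketch only, assuming uniform random subsampling of FASTQ records, reservoir sampling is one way to draw a fixed number of reads in a single pass. The function names, the seed, and the target size below are hypothetical and are not taken from the manuscript.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into 4-line records (an incomplete trailing record is dropped)."""
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        yield tuple(s.rstrip("\n") for s in (header, seq, plus, qual))


def subsample(records: Iterable[Record], n: int, seed: int = 0) -> List[Record]:
    """Reservoir sampling: keep n records chosen uniformly at random in one pass."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir
```

      In a sketch like this, the deeply sequenced sample would be reduced to roughly the read count of the other libraries before alignment, so that coverage-dependent statistics are compared on equal footing.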

      Line 510: The link to the Github repository was broken, so I was unable to access the code and data denoted as available here.

      We apologize for the unavailability of the link at the time of review. Review Commons did not request a reviewer token. The repository will be made public upon journal acceptance. We would be happy to provide a reviewer token in the meantime upon request by Review Commons.

      Figure 1, and other figures featuring comparisons of MS data across parents and offspring: The authors need to engage with the alleles that do not match either parent (e.g., allele 282 at MS7), explaining the likelihood that these alleles indeed represent a binning error (or, perhaps, stepwise mutation from a parental allele), and these alleles should be flagged. Instead, they bin these unique alleles with the most similar parental allele without any explanation or flag. The authors do bring this point up in Figure S1, but this issue needs to be addressed in the main text (related point: the mix of red/green in MS16 offspring appears more green than red. Is this meant to denote a probability different than 50:50? If not, the authors should adjust the shading so that this shape is half green, half red).

      We have added to the figure legend that single nucleotide differences are most likely binning errors and are therefore not considered “de novo” alleles. Instead, they are assigned to the most similar parental allele, consistent with Figure S1. The shading at MS16 has been removed so that it is consistent with Figure 3.
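
      The binning rule described above can be sketched as follows. This is an illustration only: the function name, the default one-nucleotide tolerance, and the example parental sizes are assumptions and do not come from the manuscript's scoring pipeline.

```python
from typing import Optional, Sequence


def assign_to_parental(
    offspring_size: int,
    parental_sizes: Sequence[int],
    tolerance: int = 1,
) -> Optional[int]:
    """Return the closest parental allele size if it lies within `tolerance`
    nucleotides of the offspring allele; otherwise return None so the allele
    can be flagged for manual review instead of being silently binned."""
    if not parental_sizes:
        return None
    closest = min(parental_sizes, key=lambda p: abs(p - offspring_size))
    return closest if abs(closest - offspring_size) <= tolerance else None


# Hypothetical parental sizes; only the offspring size 282 is mentioned by the reviewer.
print(assign_to_parental(282, [281, 290]))  # within 1 nt of 281
print(assign_to_parental(285, [281, 290]))  # no parental allele within 1 nt
```

      Returning None for alleles outside the tolerance keeps candidate stepwise mutations visible rather than folding them into a parental bin.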

      Figure 3: Indicate that white background for alleles means that allelic inheritance is not determinable, or use the mix of colors applied in Fig. 1 to indicate as such. Unique offspring alleles should be flagged rather than just automatically assigned to the most similar parental allele. Finally, it would be helpful if the alleles were presented within loci from the shorter to the longer alleles.

      We have included in the figure legend that non-shaded alleles are those for which multiple potential parents share the same allele and the inheritance therefore remains ambiguous for this locus. Single nucleotide differences are also now addressed, and sizes are ordered from smallest to largest.

      Figure S7. Indicate visually which panels indicate FP animals.

      We have now indicated which animals are FP and included this in Figure S6 as well.

      Fig. S13. The 5 animals that had especially low heterozygosity should be flagged. The title of this figure should be toned down in light of the tentative nature of the conclusions regarding FP in nature: low heterozygosity could instead reflect, for example, a long history of inbreeding. My reaction to the data is also that the % heterozygosity distribution for many of the species looks continuous rather than the bimodality one might expect under FP vs. sexual reproduction.

      Since FP has not been further confirmed in these animals, unlike those examples from our captive colony, there could indeed be other reasons for low heterozygosity. We have changed the title of the figure from “Facultative parthenogenesis in whiptail lizards collected in nature” to the more neutral “Heterozygosity estimates of whiptail lizards collected in nature.” Since there are relatively few animals, one would not necessarily expect a bimodal distribution to be apparent in the current data. We did show, though, that the animal with the lowest calculated level of heterozygosity (deppii LDOR30) was a statistical outlier when compared to other individuals of the same species. Since these animals were sampled across different locations and habitats, the effective population sizes would be assumed to be different as well, reflecting the range of heterozygosity estimates seen here. This has been made clear in the text.

      Reviewer #2 (Significance (Required)):

      General assessment: strengths and limitations. The paper's strengths include the combination of data from lab and natural populations, the characterization of an unexpected means of achieving FP, with dramatic genetic consequences, and the data suggesting that this type of FP is fairly common and occurs even in the context of mating.

      Audience: The biological questions of relevance to these discoveries are of broad interest, and the paper is likely to garner some attention from the life sciences community as whole and the popular press.

      Advance: These data fill an important knowledge gap regarding the mechanisms potentially driving FP in vertebrates, how often FP is likely to occur, and its genetic consequences. The discoveries are potentially conceptual/fundamental, though the extent to which they are ground breaking is not clear in the absence of functional characterization of how FP occurs as well as the need for more rigorous comparisons and replication that I outlined above.

      We thank reviewer 2 for summarizing the strengths of this manuscript, pointing out the broad interest and stating that this work fills an important knowledge gap.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The occurrence of facultative parthenogenesis has been described in a number of vertebrate lineages but the underlying cytological mechanism(s) have remained largely speculative due to sparsity of data. Here, Ho & Tormey et al. provide a detailed analysis of facultative parthenogenesis in gonochoristic species of the lizard genus Aspidoscelis. They show that parthenogenesis leads to a complete loss of heterozygosity (LOH) within a single generation. They attribute the LOH to diploidization through duplication of the oocyte's haploid genome after completion of meiosis. This mechanism is consistent with their finding of mixoploidy in erythrocytes of asexually produced offspring. Based on LOH the authors additionally show that facultative parthenogenesis in Aspidoscelis is not condition dependent (no developmental switch): it can occur in the presence of males, alongside sexual reproduction in the same clutch, and both in captivity and the wild. Finally, the authors show that facultative parthenogenesis is associated with developmental aberrations, likely caused by expression of homozygous recessive deleterious mutations.

      Major comments: In my opinion, this study presents a very comprehensive, careful documentation of mechanistic aspects and consequences of facultative parthenogenesis in a vertebrate. The genomic and microsatellite results leave little to no doubt that facultative parthenogenesis has led to complete LOH in Aspidoscelis. I am particularly impressed by the meticulous analysis of genomic coverage to exclude e.g. false positive heterozygosity due to merged paralogs in the assembly. I also follow the authors' conclusion that a post-meiotic "gamete duplication"-like mechanism is likely causative for the LOH (and the mixoploidy of erythrocytes; but I am no expert on that). I was wondering if terminal fusion automixis together with a complete absence of recombination would be worth mentioning as a (probably very unlikely) alternative in the discussion. It would be exciting to corroborate the conclusion of diploidization by genome duplication in the future, e.g. via early embryonic DNA stainings to show the duplication "in action" (if that is practically possible)...? As for this manuscript, I suggest emphasizing the indirect nature of the evidence for the mechanism of parthenogenesis a little bit more.

      We thank the reviewer for highlighting the effort that went into the genomic analysis that led us to our conclusions. In terms of terminal fusion without recombination, we argue that this is not an obvious alternative explanation as a large body of work has established that at least one crossover per homologous chromosome pair is required to advance into meiosis I in many organisms (e.g. see https://doi.org/10.3389/fcell.2021.681123) and therefore the absence of recombination would likely not produce the polar bodies necessary for automixis.

      We have added to the text: “In whiptail lizards, we have not been able to examine post-meiotic oocytes as locating the post-meiotic nucleus within a large yolked egg is inherently difficult. The difficulty is compounded by the unpredictability of which eggs will undergo FP development and the need to sacrifice animals to remove eggs.”

      While the genome duplication mechanism we propose is indeed indirect because we are unable to visualize developing FP embryos, the most parsimonious explanation from the whole-genome sequencing analysis is genome duplication because of the lack of heterozygous regions associated with automixis. In the text, we have made sure to state genome-wide homozygosity as the basis for our conclusion.

      I agree that facultative parthenogenesis in the presence of males hints at a baseline rate of parthenogenesis without requiring a developmental switch. However, this makes it difficult to rule out that sperm played a role in activation of embryonal development (gynogenesis; however I am only aware of gynogenesis in fishes and amphibians)... maybe, the authors want to take this up in the discussion. Were the five parthenogenetic individuals for whole genome sequencing actually produced in the presence of males, too?

      FP has been reported to occur in isolated females for other reptile and bird species, suggesting that sperm activation is at least not a general requirement in FP of amniotes (Watts et al. 2006; Olsen and Marsden 1954). In all cases in this study, the mothers were housed with conspecific or heterospecific males. While we cannot completely rule out a non-genetic contribution of sperm in these cases, it would seem to be an unlikely explanation in light of the sperm-independent reproduction by obligate parthenogenesis in other species of whiptail lizards (unlike the sperm-dependence of all unisexual reproduction in amphibians and fish). We decided to not include speculation on sperm-dependence in this manuscript as we have no evidence in favor of it, nor is there any evidence for this in the literature relating to other amniotes. In fact, most examples of FP were reported from isolated females, most likely because offspring were not expected in those cases and prompted further analysis as to their origin.

      I agree with the interpretation of the LOH in the RADseq data as a likely case of facultative parthenogenesis in the wild. However, when looking at figure S13 I noticed some bimodal looking distributions (e.g. in A. guttatus). It may be interesting for future studies to look into what factors influence heterozygosity in natural populations of Aspidoscelis (e.g. inbreeding vs parthenogenesis). Could there be different mechanisms of facultative parthenogenesis in different Aspidoscelis species explaining different LOH intensities?

      The continuous nature of the data may reflect natural variation between individuals and collection at various locations with possibly different effective population sizes and levels of hybridization. Low levels of heterozygosity could be indicative of inbreeding or FP in some cases. This is important to note in future studies and we have added this to the manuscript (“Further fieldwork and analysis will be required to assess the level of FP in natural populations of gonochoristic Aspidoscelis species (and other factors that could influence the observed heterozygosity such as population size, levels of hybridization, and inbreeding) …”). While there are different mechanisms of FP in other vertebrate groups, the most parsimonious hypothesis is that within a genus, the mechanism would be the same.

      The manuscript is well written, the introduction nicely explains the significance of the study, the methods are fully appropriate and the results (and supplementary results) displayed comprehensibly and in great detail. The discussion might benefit from going a bit more generally into the occurrence and mechanism of obligate asexuality in Aspidoscelis. One might e.g. speculate on whether the ability for facultative parthenogenesis in gonochoristic species has facilitated the transitions to obligate parthenogenesis in the hybrid lineages and what peculiarities might predispose Aspidoscelis to parthenogenesis (e.g. are centrioles contributed by sperm required?). In addition, I think the occurrence of LOH due to gamete duplication (facultative and obligate) in invertebrates (e.g. due to Wolbachia) is worth mentioning in the discussion: e.g. there is a similar case in facultative asexual Bacillus rossius stick insects, where the early dividing cells are haploid. Some of them diploidize via duplication later and form the embryo.

      Thank you for complimenting each section of the manuscript and referring to it as well-written. Our lab has a long-standing interest in obligate parthenogenesis. While it is interesting that both obligate and facultative parthenogenesis occur alongside each other in this genus, the mechanisms appear to be fundamentally different, and we would like to focus the discussion on FP in a variety of systems and its potential implications in conservation and evolution. Parthenogenesis in general is a fascinating topic for a broad audience; by not discussing another form of parthenogenesis (obligate in this case), we keep the focus on FP and the manuscript more accessible for non-specialists. We have included the stick insect as another example of diploid restoration through genome duplication in the discussion.

      Minor comments:

      39-41: I am a bit puzzled by the usage of the term "post-meiotic" to contrast the diploidization through duplication with automixis. Wouldn't one consider polar body fusion after completion of meiosis II also post-meiotic? Maybe I am just not aware of how the term is usually used in this context here...

      We use the term “post-meiotic” because the restoration of an entirely homozygous diploid cell can only occur after the completion of both meiotic divisions. It is our understanding that polar body fusion and meiotic restitution after meiosis I or meiosis II are generally considered meiotic mechanisms in the specialized literature, even though polar body fusion would also occur after the meiotic divisions.

      65: isn't that gynogenesis (sperm-dependent parthenogenesis) in the amazon molly?

      While sperm is required for parthenogenesis in the Amazon Molly, it is an all-female species that exclusively reproduces through gynogenesis. In this case, it is considered an example of obligate parthenogenesis rather than FP.

      78: the term "economically viable" may be a bit puzzling for a biologist's audience. "Economically sustainable" could be an alternative.

      This has been changed.

      129: the Arizona male was referred to as ID 4272 above. Here it is ID 4238?

      This has been corrected. The correct ID is 4272.

      218: please define over-assembly (see line 207)

      The definition of “over-assembly” is collapsing paralogous loci into a single representative sequence. This is now explained in the text.
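
      To illustrate why collapsed paralogs matter for coverage-based checks: reads from two paralogous loci pile up on the single representative sequence, so its read depth is expected to be roughly double the genome-wide baseline. The sketch below flags such windows. It is a hypothetical illustration, not the authors' analysis; the function name and the 1.5-fold threshold are assumptions.

```python
from statistics import median
from typing import List, Sequence


def flag_collapsed_windows(window_cov: Sequence[float], fold: float = 1.5) -> List[int]:
    """Return indices of windows whose coverage is at least `fold` times the
    median window coverage, i.e. candidates for collapsed (over-assembled) loci."""
    if not window_cov:
        return []
    baseline = median(window_cov)
    return [i for i, cov in enumerate(window_cov) if cov >= fold * baseline]


# Made-up per-window mean coverage; the fourth window has about twice the baseline depth.
coverage = [30, 31, 29, 62, 30, 28]
print(flag_collapsed_windows(coverage))
```

      Apparent heterozygous calls inside flagged windows would then be treated with caution, since they may reflect differences between paralogs rather than between alleles.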

      263-281: please, indicate a hatching rate/ rate of malformations of sexually produced offspring for comparison.

      A comparison has been added: “This is in stark contrast to sexually produced animals, where over 98% of hatchlings had no abnormalities noted.”

      333: in the haploid cells recessive deleterious mutations would be exposed in the hemizygous state but in the diploid cells in the homozygous state.

      The text has been modified to reflect the difference between haploid and diploid cells.

      470: please, provide more detail for the RADseq analyses (variant calling, calculation of heterozygosity etc.)

      We have elaborated on the analysis in the methods.

      Figure 1B: please, mention in the legend that the shown mechanisms are not exhaustive, e.g. first polar body fusion could occur right after meiosis 1 or polar body formation could be skipped completely.

      This has been added.

      Figure 1C: it may be interesting for non-specialists to name the distinctive morphological characters setting apart the three species in the figure legend and highlight them e.g. with arrows in the figure.

      We have now included in the figure legend characteristic color patterns for each species: “(C) Photographs of Aspidoscelis arizonae with characteristic blue ventral coloration (top), A. gularis with light spots in dark fields that separate light stripes on dorsum (middle), and A. marmoratus with light and dark reticulated pattern on dorsum (bottom).” Since the descriptions are specific and apparent, we did not add arrows to the pictures.

      Reviewer #3 (Significance (Required)):

      Significance: The study by Ho & Tormey et al. substantially enhances the understanding of (facultative) asexuality in vertebrates. In particular, while most reports of facultative parthenogenesis in vertebrates have been attributed to a form of automixis, the authors conclusively show an instance of diploidization through genome duplication, a mechanism functionally similar to "gamete duplication". The study is novel, very comprehensive and of interest for a general audience within the field of evolutionary biology.

      We thank reviewer 3 for pointing out that our study substantially enhances the understanding of asexuality in vertebrates, is very comprehensive and of interest for a general audience within

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We appreciate the thoughtful comments of the reviewers. We have revised the manuscript according to these comments as detailed below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Efficient proteostasis in cells demands efficient clearing of damaged or misfolded proteins, and an important pathway involved in such clearance is the ubiquitin-proteasome pathway. In this system, proteins are tagged with ubiquitin to target them for degradation by the 26S proteasome complex. The conventional 26S proteasome complex consists of a core particle (CP or 20S proteasome) and one or two regulatory particles (RP, or 19S proteasome) to form the singly or doubly-capped proteasome, respectively. Proteasome assembly is a well-orchestrated process that requires proper stoichiometry of proteasome subunits and dedicated proteasome assembly chaperones. This is maintained by fine-tuning their transcriptional and translational regulation.

      This manuscript elucidates an important aspect of how the different proteasome components are transcriptionally regulated upon denervation in mouse muscles for timely and efficient assembly of the 26S proteasome. The authors present data that point towards a model whereby a two-phase transcriptional program (early: day 3-7 and late: day 10-14) activates genes encoding proteasome subunits and assembly chaperones to boost proteasome content. This involves the coordinated functions of two transcription factors, PAX4 and alpha-PAL(Nrf1), which were important for both the early and late phases of the transcriptional program. Their roles were not redundant, as loss of one transcription factor was sufficient to prevent induction of various proteasome genes in muscle after denervation.

      In summary, the authors report a novel bi-phasic mechanism elevating proteasome production in vivo, which involves the coordinated functions of two transcription factors, PAX4 and alpha-PAL(Nrf1).

      Major points: 1) It is not clear why PAX4 and alpha-PAL(Nrf1) are both fully required for the transcriptional induction of some proteasome genes upon denervation (with good overlap), while only PAX4 is important for increased proteasome assembly. The authors speculate that this could be due to a stoichiometry problem but an alternative scenario where translation is increased upon alpha-PAL(Nrf1) inhibition would also be possible. This would explain why, for example, the induction of PSMC1 gene expression upon denervation is abolished upon alpha-PAL(Nrf1) inhibition (Fig. 5C) while the protein level is still increased (Fig. 6H). Is that also true for PSMD5 and Rpn9? Could it also be that the loss of function of alpha-PAL(Nrf1) is too detrimental for the muscle so that they induce an alternative stress response pathway increasing proteasome subunit translation?

      We thank the reviewer for this comment. To better clarify this important point, we conducted further experiments to examine the differential effects of PAX4 vs. α-PAL(NRF1) on proteasome assembly chaperones (Fig. S4b). Our new data show that PAX4 promotes the induction of the assembly chaperone PSMD5 (S5b) at 3 days after denervation (Fig. S4B). This induction is critical for the increase in PSMD5 protein levels because PAX4 knockout results in decreased PSMD5 protein levels at both 3 and 10 days after denervation (Fig. 4K). α-PAL(NRF1), however, does not affect the mRNA levels of this chaperone (Fig. S4A). This new result strengthens our conclusion that induced expression of assembly chaperones by PAX4 is key to raising proteasome levels after denervation.

      We cannot rule out an indirect effect of α-PAL(NRF1) knock-down on protein synthesis, and therefore this potential alternative mechanism is now discussed in the text. It appears unlikely, however, that α-PAL(NRF1) knock-down is too detrimental to muscle as we do not find any evidence phenotypically for any type of stress or abnormalities.

      2) Pax4 controls Rpt1-2 transcription and these two Rpt proteins form a pair. As Rpt4 is also regulated by Pax4, is Rpt5 also controlled by Pax4?

      We believe the reviewer meant to request the data for Rpt4, because the data for Rpt5 were already included in original Fig. 4G-H. Therefore, we repeated the RT-PCR analysis of PAX4 KO mouse muscles for Rpt4 and now show that its induction requires PAX4 at 10 d after denervation, just when proteasome content is increased (Fig. 4G). At 3 d after denervation, Rpt4 induction is probably regulated by other transcription factors because its mRNA levels at this early phase were similar in muscles from WT and PAX4 KO mice (Fig. 4H). These data strengthen our conclusions that coordinated functions of multiple transcription factors control proteasome gene expression in vivo. In future studies, we will investigate the specific mode of cooperation and mechanisms by which various transcription factors and co-factors collaborate to enhance the expression of proteasome genes in the early and delayed stages of gene expression within a living organism.

      What about the assembly chaperone for these two pairs: PSMD5 and p27? It would be very interesting to know if there is a transcriptional coregulation based on proteasome assembly intermediates.

      The referee raises an important point, which we also discuss in the text. We now present data showing that PAX4 promotes the induction of the assembly chaperone PSMD5 at 3 d after denervation (Fig. S4B), correlating nicely with the observed changes in protein levels of this chaperone (Fig. 4K). The expression of PSMD9 (p27), however, requires neither PAX4 nor α-PAL(NRF1) (Fig. S4). Consequently, we conclude that PAX4 promotes proteasome biogenesis by promoting PSMD5 induction, and that in the absence of α-PAL(NRF1) proteasome subunits can still efficiently assemble into proteasomes (even though their expression is reduced), due to the induced expression and increased action of the assembly chaperone PSMD5. Our data highlight the intricacy in controlling proteasome levels through transcriptional regulation of proteasome genes and assembly chaperones during muscle atrophy. We now further document and discuss the regulation of proteasome biogenesis by these two transcription factors in the text and Discussion (p.28).

      3) Fig. 4J: PSMD5 and PSMD13 are not tested in Fig. 4A, G and H. This needs to be done if the authors want to draw the parallel mRNA-protein levels, as in their conclusion. Moreover, the protein levels seem to be much more induced than the mRNA levels, could that be due to increased translation? This could be discussed.

      We accepted this thoughtful suggestion and now present the mRNA levels for PSMD5 and PSMD13 in Figs. 4A, G and H and Fig. S4. The new data do not change our conclusion that protein abundance largely correlates with the transcript levels (Figs. 2 and 4K).

      The reviewer raises an important question that we hope to resolve in the future. As we point out in the revised Discussion section, “the substantial rise in protein levels compared to mRNA levels after denervation suggests potential increased protein translation due to PAX4 loss. Whether PAX4 regulates protein synthesis and thus can affect protein levels beyond gene expression are intriguing questions for future research”.

      4) The conclusion is not correct in this sentence: "Moreover, analysis of innervated and 10 d denervated muscle homogenates from WT, alpha-PAL(Nrf1) KD or PAX4/alpha-PAL(Nrf1) KD mice by native gels and immunoblotting or LLVY-cleavage indicated that loss of both transcription factors is necessary to effectively block accumulation of active assembled proteasomes on denervation (Fig. 6H)". This is not correct, as the loss of PAX4 is sufficient to block accumulation of active assembled proteasomes on denervation (Fig. 4K). So, it could just be that alpha-PAL(Nrf1) KD has no effect on the induction of proteasome assembly after denervation and that all the effect of the double mutant is due to PAX4 loss. This needs to be corrected.

      We thank the reviewer for this thoughtful comment. The text has been revised accordingly.

      Minor points:

      1) I would rephrase the sentence "baseline at 14 d after denervation and showed a sustained low mRNA levels until 28 d (Fig. 2A-F).", as the mRNA levels are still significantly higher than the basal levels for most proteasome genes. Same for the sentence: "RNA sequencing (RNA-Seq) analysis of TA muscles at 14 d after denervation indicated that expression of most proteasome genes is low at 14 d (Fig. S1)". Expression is low compared to what? Not being induced does not mean that expression is low. This needs to be rephrased.

      We revised the text accordingly and thank the reviewer for these suggestions.

      2) Microscopy images need more explanation: define the green and red channel and what they are used for in the legend.

      The legends have been updated as requested.

      3) Columns have moved from the Table 2.

      The tables have now been submitted as separate files.

      4) Fig. S3: RT-PCR on NRF-1(NFE2L1) needs to be performed to see the extent of inhibition by shRNA.

      We thank the reviewer for this important comment. The data, which were added as new Fig. S3A, show an efficient knockdown of NRF-1(NFE2L1) with shNFE2L1.

      5) In the sentence: "PAX4 maintaining subunit stoichiometry for increased proteasome assembly.", could it be due to the much higher levels of PSMB8, 9 and 10 immunoproteasome subunits upon alpha-PAL(Nrf1) KD (Fig. 6F)?

      We addressed this aspect in Major Point #1, regarding the difference between PAX4 and α-PAL(NRF1); please see our response. As for the Reviewer’s comment concerning Fig. 6F, we think that the increased expression of PSMB 8, 9, and 10 in α-PAL(NRF1)-KD compared to the double KD or PAX4 KO further suggests a distinct cooperative interaction between these transcription factors in promoting proteasome expression, assembly, and function, which we plan to thoroughly investigate in future separate studies. However, the increased expression of PSMB 8, 9, and 10 can affect the composition of the CP (by replacing their normal counterparts), but not the RP assembly. CP and RP are known to assemble separately with their own dedicated chaperones; RP and CP then associate to complete the assembly of the proteasome holoenzyme (RP-CP complex). Thus, it is unlikely that increased CP assembly alone would increase overall RP-CP assembly.

      **Referees cross-commenting**

      All other comments are relevant.

      Reviewer #1 (Significance (Required)):

      Overall, the work is impactful and timely, reporting the participation of a novel transcription factor, alpha-PAL(Nrf1), along with PAX4, in regulating the transcription of proteasome genes and the subsequent assembly of conventional proteasomes in mouse muscle upon denervation. One limitation is that alpha-PAL(Nrf1) knockdown inhibits only proteasome gene expression but not proteasome assembly, for reasons that remain unknown. Most of the conclusions drawn in the manuscript are supported by the experimental data. Better understanding how proteasome homeostasis is regulated upon stressful conditions is an important fundamental aspect of proteasome biology. I would support publication of this manuscript provided the more specific concerns listed are addressed.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The main limitation of this study is that it is based on a single model of muscle atrophy: that induced by cutting the sciatic nerve. Another model, such as fasting atrophy or cancer cachexia, would nicely complement the findings and show whether the two-phase program of proteasome modulation is recapitulated.

      The referee raises an interesting point, but as we explained throughout the manuscript, we did not use denervation in this study as a model for atrophy but rather as an in vivo model system to investigate mechanisms of protein degradation and proteasome homeostasis in a whole organism. We selected denervation as an in vivo model for accelerated proteolysis because of the gradual nature of muscle loss, which allows us to dissect the various phases of proteasome homeostasis effectively. Fasting, as an alternative model, is too rapid for addressing the specific questions that we asked in this study. In addition, in the rapid atrophy induced by fasting the primary physiological mechanism to increase protein degradation in vivo is believed to be through post-synthetic modification of proteasomes, rather than the production of new proteasomes (VerPlank et al., 2019). In future separate studies, we will thoroughly investigate whether the mechanisms discovered here are applicable to other types of atrophy (e.g. diabetes, aging, cancer). The obtained results will be published and fully discussed separately, in part because covering all types of atrophy within a single paper is impractical and goes beyond the scope of the current manuscript.

      Another major concern is that the authors do not measure, over time during denervation atrophy, the mRNA and protein expression of the two transcription factors that they found crucial for proteasome induction and assembly.

      We agree with the reviewer that a time course would strengthen our conclusion that the two transcription factors are important for proteasome gene induction and assembly. We have added these data showing that PAX4 (Fig. 4I) and α-PALNRF-1 (Fig. 6E) both accumulate in the nucleus at 7 d after denervation, just when proteasome content is maximal (Fig. 3A) and protein breakdown is accelerated (Cohen 2009; Volodin 2017; Aweida 2021). The mRNA levels of PAX4 were presented as original Fig. 4F and indicate that PAX4 is induced already at 3 d after denervation. We have added new RT-PCR data for α-PALNRF-1 showing that α-PALNRF-1 is induced at 7 d and 10 d after denervation (Fig. 6D).

      Major and minor concerns are as follows:

      Typos now and then are present all over the text, as holoemzyme shall be replaced with holoenzyme on page 9, on page 12 proteasome is misspelled on mid page, as well as cellls. By cotrast shall be corrected on page 19. References on page 22 shall be formatted.

      We have corrected the typographical errors.

      • reference 29 on page 7 seems out of context together with the sentences it is coupled with.

      The reference is appropriately located within the text in terms of context, and precisely aligns with the sentence to which it is associated. Reference 29 (Boos 2019) describes a cellular state in which all proteasome genes rise simultaneously.

      • muscle electroporation of plasmid shall be replaced by AAV9 injection that causes less inflammation and more expressing fibers

      We do not understand and see no basis for the referee’s assertion that the “muscle electroporation of plasmid shall be replaced by AAV9 injection”. On the contrary, the electroporation methodology is widely used by many labs because of its many advantages. This in vivo gene transfection approach is extremely useful to study transient gene (or shRNA) effects in adult muscles, while avoiding the developmental effects of genes (or shRNA) that are often seen in transgenic or knockout animals (e.g., the inducible knockout of α-PALNRF-1 caused lethality, see Fig. 6B-C).

      In addition, the electroporation technique offers great advantages in speed and major cost savings. We have been using it routinely in our lab for in vivo studies, and articles using it from many laboratories worldwide have appeared in all major journals, e.g. see our papers in Nature Communications, J Cell Biol, PNAS, EMBO rep, and papers from the late Alfred Goldberg (Harvard), Marco Sandri (Padova, Italy), Jeff Brault (Indiana Univ.) and others. In all studies included in this manuscript that involve electroporation, contrary to the reviewer’s impression, there was no damage or inflammation to the muscles, and we routinely examined histological sections. Finally, for our studies, we always use muscles that are at least ~70% transfected, which has proven adequate for observing gene effects in mouse muscle. In each experiment, transfected muscles are always compared and analyzed in parallel to control muscles (transfected with scrambled shLacZ control). In fact, the validity of the in vivo electroporation technique is further confirmed herein by our investigations of transgenic inducible knock-down mice, showing similar effects on proteasome gene expression.

      • the shGankyrin data shall be complemented with overexpression of the same chaperone to see the effects of proteasome expression and assembly.

      We understand the reviewer’s concern but do not believe that such an experiment is necessary, since there is already extensive evidence in the literature showing that the chaperone Gankyrin is essential for proteasome assembly (Kaneko et al., Cell 137, 914–925, May 29, 2009; DOI 10.1016/j.cell.2009.05.008). Thus, various Gankyrin mutants have often been used as an inactive control for proteasome assembly in vitro and in vivo (Kaneko et al., Cell 137, 914–925, May 29, 2009; DOI 10.1016/j.cell.2009.05.008). In fact, Gankyrin is known to ensure not only the proper subunit composition, but also the proper conformation of the proteasome holoenzyme (Lu et al., Mol Cell. 2017 Jul 20;67(2):322-333.e6).

      • another important transcription factor driving MuRF1 expression is Twist and it is totally ignored in the discussion, please add it.

      We regret this oversight. We did not mean to slight any authors, although our major new discoveries and focus are on proteasome genes and not MuRF1. However, to satisfy the reviewer, we now discuss in the text Twist and other transcription factors (including SMAD2/3, glucocorticoid receptors and NFkB) capable of inducing the major atrophy-related genes (among them MuRF1).

      • WB in Fig 2 shall be complemented by one in the Supp with more replicates per timepoint

      We accepted this thoughtful suggestion and now present blots from additional normal and atrophying denervated mouse muscle samples as new Fig. S1B. This approach, however, does not change any of our conclusions.

      • please justify why only PSMD10 (gankyrin) has been silenced and not any of the others (POMP, PSMD5, PSMD9)

      We silenced PSMD10 (Gankyrin) as a representative RP assembly chaperone, since it is better characterized than the other RP assembly chaperones (PSMD5 and PSMD9). We kept POMP (a CP assembly chaperone) intact. Since the formation of one proteasome holoenzyme (RP2-CP) requires two RPs and one CP, increasing proteasome assembly is expected to be more demanding for RP assembly than CP. This led us to predict that disrupting RP assembly should be sufficient to block the induced proteasome assembly. This prediction is supported by our data (Fig. 3), and this justification was also added to the revised text to enhance clarity.

      The originality is limited by the fact that Pax4 was already shown by the same authors to have a role in muscle atrophy and to drive the expression of p97. I would be curious to see whether in vitro treatments known to induce the proteasome, such as starvation, act through the biphasic mechanism shown in this paper, to understand how far it extends to other kinds of atrophy.

      We respectfully disagree that the originality of the present findings is limited, because previously we validated a single proteasome subunit (Rpt1) as a target gene for PAX4 (Volodin 2017), and here we discover novel global coordination of proteasome gene expression by multiple transcription factors.

      As we mention above, muscle denervation was used here as an in vivo model system of catabolic conditions. Unlike prior reports that were limited to cultured cells, our studies focus on the physiological setting in vivo to reveal mechanisms of proteasome homeostasis. In any case, regulation of proteasome gene expression by multiple transcription factors in other types of atrophy has not been investigated but is possible because common transcriptional adaptations activate protein breakdown in different types of muscle atrophy, including a coordinated induction of numerous components of the ubiquitin proteasome system (Jagoe 2002; Lecker 2004; Gomes 2001). In future independent research, we intend to investigate if the two-phase mechanism reported here can in fact be generalized to other atrophy (or stress) conditions.

      Reviewer #2 (Significance (Required)):

      The authors Gilda and co-workers made a great attempt to dissect the induction of proteasome activity during denervation muscle atrophy and discovered a two-phase process which involves two transcription factors Pax4 and NRF1. The manuscript is clearly written and the experiments fully delineated.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Using denervated mouse muscle as a model, Gilda et al. demonstrated that a two-step transcriptional program operates in the process of muscle atrophy after denervation and that proteasome expression-induced enhancement of protein degradation is important. Gilda et al. clarified that the transcription factors PAX4 and PAL/NRF-1 act on this proteasome expression induction and that the induction of these transcription factors and the expression induction of the proteasome gene cluster after denervation are necessary for muscle atrophy using an in vivo mouse model. The experiments were logically designed, and the results presented are considered clear and reliable. However, some of the descriptions in the text lack accuracy and courtesy, and some experiments require additional data to support and strengthen the author's claims. In particular, it is unclear whether PAX4, FOXO3, and NRF-1 work together or whether they have distinct functions. Although the authors claim that there are two stages of proteasome expression induction after denervation, this remains unclear. The authors should clarify the differences in target sequence or target genes and the substitutability of each transcription factor.

      Major comments: 1: In Figure 3A, the results of the immunoblot of SDS-PAGE against 20S proteasome subunits should also be shown to confirm the increase in proteasome activity and amount.

      We would like to clarify this aspect. We show the increased levels of proteasome holoenzyme complex (RP2-CP) by immunoblotting of the native gel, rather than SDS-PAGE gel. This is because the blots of the native gel can assess the levels of the actual proteasome complex, not simply subunit levels in their denatured state as in SDS-PAGE; SDS-PAGE cannot distinguish between free subunits and ones that are incorporated into the proteasome.

      If proteasome activity were increased by some other mechanism, proteasome levels would remain relatively constant while proteasome activity increased. However, this is not the case here, since our data demonstrate that both RP2-CP activities and levels peak at day 7. Furthermore, the in-gel peptidase assay (Fig. 3A panel b) directly tests the 20S CP activity within the proteasome holoenzyme (RP2-CP complex) using the fluorogenic model substrate, LLVY-AMC. The 20S CP is activated for substrate degradation only upon its association with RP (RP2-CP complex), since RP opens the substrate entry gate of the 20S. Free 20S itself is inactive, as its gate for substrate entry is closed; for this reason, free 20S can be detected only after its substrate entry gate is artificially opened by SDS (see free 20S in panel b, but not in panel a).

      2: In Figure 3, the reviewer assumed the conflict between the results of peptidase activity and SDS-PAGE in 14d. Therefore, quantification and statistical analysis should be performed on the results of proteasome peptidase activity and immunoblots to clarify the relationships between proteasome activity and amounts. Immunoblotting against ubiquitin is also needed to confirm the requirement and efficiency of proteasome induction.

      As the reviewer pointed out, it might seem discrepant that peptidase activity at 14 d denervation is lower than its peak at 7d (Fig. 3A, panel a), but SDS-PAGE signal for proteasome subunits seems still high (Fig. 3A, panel d, Rpn2). SDS-PAGE detects total cellular content of proteasome subunits (free subunits as well as ones assembled within proteasomes). However, at any given moment, these subunits are not only in the proteasome holoenzyme complex, but also in different assembly intermediates. When proteasome subunits are transcriptionally induced as in this study, proteasome assembly process is also increased. However, proteasome assembly is a multi-step process, and the fold-induction for each specific subunit is different (Fig. 2A-B). This means that the rate of a certain assembly step would be differently affected for a given subunit, depending on their fold-induction. For this reason, some subunits seem to exist at a high level at 14d (e.g. Fig. 3A, panel d, Rpn2), but they are not yet incorporated into the proteasome complex, because they might be still undergoing assembly process.

      As for the ubiquitin blot, it can be a good indicator of proteasome activity when proteasome activity is lower than normal. In such situations, ubiquitinated proteins accumulate (i.e. their signals increase as compared to control), due to their deficient degradation. However, our present study pertains to the opposite situation, where proteasome activity is increased in degrading ubiquitinated proteins. In normal cells, ubiquitinated proteins are hardly detectable due to their rapid degradation. Thus, when proteasome activity is greater than normal, ubiquitinated protein levels will decrease even further. Data become unreliable when the signals are below the detection threshold. For this reason, we provided functional readouts involving the number of muscle fibers (for example, Fig. 3D).

      3: In Figure 3C, the sample labels of shGankyrin and shLacZ are repeated. Would it be mislabeled? In addition, NATIVE PAGE immunoblot analysis against Gankyrin and proteasome subunits are needed to prove the knockdown efficiency and to reveal the assembly defect of proteasome by Gankyrin knockdown.

      To present our findings more clearly, we show one of each sample in the revised figure, rather than the duplicates as in the previous figure (Fig. 3C). We also included the immunoblot data to show that Gankyrin knockdown disrupts proteasome assembly, as seen by the reduced proteasome complex activity and level (Fig. 3C, panels a, b, c, lane 3, see RP2-CP). In Gankyrin knockdown samples, the proteasome holoenzyme complex exhibited a smeary appearance (Fig. 3C, panel c, see bracketed region in lane 3), as opposed to a discrete band in the controls (lanes 1, 2). This smeary appearance reflects more heterogeneous proteasome populations, due to defects in their composition and/or conformation. This is in line with Gankyrin’s known function in ensuring not only the proper subunit composition, but also the proper conformation of the proteasome holoenzyme (Lu et al., Mol Cell. 2017 Jul 20;67(2):322-333.e6).

      4: In Figures 4A, 4G, 4H, 4J, and 4K, the results of shPAX4 against innervated muscle should be shown to estimate the contribution of PAX4 in steady-state conditions. To clarify the innervated muscle-specific function of PAX4, histological analysis and quantification of proteasome gene expression in multiple organs in PAX4 KO mice are needed.

      The reviewer raises an interesting point, but as we explained above, we concentrate here on the major new discovery that multiple transcription factors increase proteasome content in a catabolic condition in vivo, correlating directly with the accelerated protein loss. Regulation of the basal levels of proteasome in normal conditions in various types of cells and tissues is certainly an important issue meriting in depth study and will be the subject for future studies, but it is beyond the scope of this lengthy paper. This point is now discussed in the revised text.

      The tissue distribution of PAX4 and the detailed description of the phenotype of KO mice are also needed to understand and evaluate the role of PAX4 in muscle.

      We added the requested data about PAX4 distribution as Fig. 4I. These data show that PAX4 accumulates in the nucleus already at 3 d after denervation. Furthermore, we are happy to add further information about the knock-out mouse model. The requested information and a detailed description of how PAX4 KO mice were generated were added to the text. The PAX4 KO mice showed no abnormalities and did not appear in any way different from the wild type littermates.

      5: In Figure 4C, immunoblot analysis against PAX4 is essential to confirm the PAX4 protein knockout.

      We agree and representative blots were added to Fig. 4C.

      6: In Figure 5, peptidase activity and immunoblotting in NATIVE PAGE are needed to reveal the contribution of FOXO3 and NRF-1 in denervated muscle as shown in Figure 4.

      The requested data for FOXO3 using FOXO3 dominant negative (as in Fig. 5A-B) were added as new Fig. 5C-D, showing no effect on proteasome content by FOXO3 inhibition. These new data are consistent with our findings that the expression of only two proteasome subunit genes was affected by FOXO3 inhibition at 10 d after denervation (Fig. 5B). The data for α-PALNRF-1 and the effects of its knockdown on proteasome content and activity were shown as original Fig. 6H (now Fig. 6J).

      The expression of FOXO3 and NRF-1 should also be shown by RT-PCR and immunoblotting as shown in Figure 4.

      We thank the reviewer for this thoughtful suggestion, and as requested, we now show representative blots of transfected muscles to support the graphical data (Figs. 5C-F). These data confirm the efficient expression of HA-FOXO3ΔC or FLAG-α-PALNRF-1 dominant negative inhibitors in transfected muscles. It is important to note that these inhibitors are mutant forms designed to interfere with the normal function of the wild-type endogenous FOXO3 or α-PALNRF-1 proteins, without affecting their transcript levels. Given this mechanism, we believe that Western blotting is a more appropriate technique for assessing their impact, as it provides direct insights into protein expression. In the revised main text and methods, we have now clarified this point.

      Similar to previous comments, the expression of the dominant negative form of Foxo3 and NRF-1 should be performed in innervated muscles to reveal the significance and specificity of Foxo3 and NRF-1 function in denervated muscles.

      As mentioned above, regulation of the normal basal levels of proteasomes is certainly an important issue meriting in-depth study and will be the subject of future studies, but it is beyond the scope of this lengthy paper, which focuses on the mechanisms increasing proteasome content in catabolic conditions in vivo. With respect to FOXOs, there is a large literature on their regulation and roles in normal muscle (please see papers by the late Alfred L Goldberg, Marco Sandri and others). Under normal conditions FOXO3 is largely inactive via phosphorylation by insulin-PI3K-AKT signaling (Stitt 2004; Latres 2005; Zhao 2007).

      7: In Figure 6D, the list of genes should be served especially about 27 genes and 69 genes that show common features between NRF-1 KD and PAX4 KO.

      The requested data is now presented as new Table 4.

      8: In Figure 6F, the list of genes that change expression in PAX4 and NRF-1 KD mice is needed.

      We agree, and the requested data have now been added to Table 5.

      9: In Figure 6H, immunoblotting against ubiquitin is needed to evaluate the contribution of proteasome induction to protein degradation.

      We clarified this aspect in the Major Point #2. Please see our response.

      10: This study lacks the detailed mechanisms by which PAX4, Foxo3, and NRF-1 regulate the expression of proteasome genes. The contribution of these transcription factors is revealed by experiments, but the specific sequence that these transcription factors bind and how transcription factors are induced in denervated muscles is not clarified. As shown in the figures, the ChIP assay provides convincing results, but the detailed sequence or map of the promoter region of proteasome genes must be shown in the figures to clarify the target sequences of NFE2L1 and PAX4, FOXO3, and NRF-1. In addition, the luciferase assay would support the results of the ChIP assay.

      Again, the reviewer raises an important question that we plan to resolve in the future. As mentioned, our findings strongly suggest a novel coordinated mechanism involving multiple transcription factors that control proteasome content in catabolic states in vivo. The enclosed revised manuscript primarily focuses on elucidating the contributions of individual transcription factors (α-PALNRF-1, PAX4, NRF-1NFE2L1 and FOXO3) to the induction of proteasome genes, revealing a significant overlap in genes regulated by multiple transcription factors. The specific mode of cooperation among these and other transcription factors and cofactors is certainly an important question for future studies, but it is beyond the scope of this lengthy paper. In the revised text we have now clarified this point (page 27). In addition, we agree that clarifying how the transcription factors are induced in denervated muscles merits some considerations and a paragraph was added to the Discussion (page 26) concerning possible mechanisms. For example, it is possible that the transcription factor STAT3 is involved in PAX4 induction because, based on previous microarray and ChIP data in cultured NIH3T3 cells, PAX4 was identified as a target gene of STAT3 (Snyder et al., 2008), and STAT3 becomes activated after denervation (Madaro et al., 2018).

      We are delighted that the reviewer found the results obtained through the ChIP assay convincing. Given the extensive scope of our investigation and rigorous analyses of dozens of genes, it is not feasible to generate luciferase-encoding plasmids for all of them. However, in response to the reviewer's request, we have carried out predictions of the binding sites of the 4 transcription factors within the minimal promoter regions (300 up- and 1000 down-stream to TSS) of the 64 proteasome sequences. The predicted binding sites are now listed in Table 2A-D. These new data further support our key findings that multiple transcription factors control proteasome gene expression in a catabolic physiological state in vivo.
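
      The promoter window used for the binding-site predictions above can be sketched as follows. This is only an illustration under stated assumptions: the function name `promoter_window`, the strand encoding ("+"/"-") and the clamping at position 0 are hypothetical and are not taken from the manuscript; only the 300 bp upstream and 1000 bp downstream extents come from the response.

```python
def promoter_window(tss, strand, upstream=300, downstream=1000):
    """Return (start, end) of a promoter window around a TSS.

    upstream/downstream default to the 300 bp and 1000 bp quoted in the
    response; the strand handling below is an assumption for illustration.
    """
    if strand == "+":
        # Upstream means lower coordinates on the forward strand.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, upstream means higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp so a TSS near the chromosome start never gives a negative coordinate.
    return max(0, start), end


print(promoter_window(10_000, "+"))
print(promoter_window(10_000, "-"))
```

      A motif-scanning tool would then be run on the sequence inside each window; that step depends on the software used and is not reproduced here.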

      11: The results of the loss of transcription factors are well done, but the authors should also try to estimate the effect of overexpression of transcription factors in muscle. If the overexpressed transcription factors cause proteasome induction and muscle fiber mass reduction, these results strongly support the importance of transcription factor-mediated proteasome enhancement.

      We understand the reviewer’s comment but do not believe that such an experiment is necessary to support our key findings about proteasome gene induction by multiple transcription factors in vivo. In fact, we have specifically refrained from pursuing overexpression studies in this context due to the apparent coordination and some potential interdependence between the functions of PAX4 and α-PALNRF-1 transcription factors in inducing proteasome genes. Manipulating one specific gene through overexpression could potentially disrupt this delicate coordination and yield misleading results.

      In addition, there are several limitations of gene overexpression in mouse muscle, as it may not be as efficient and does not represent physiological conditions. Therefore, to validate gene functions in a physiological setting in vivo, we generated transgenic animals with the gene of interest specifically knocked-out or knocked-down. Utilizing transgenic mice lacking the gene of interest, though time-consuming, is a widely accepted and common approach that proves to be the most suitable method for specifically demonstrating the involvement of a particular gene in a physiological process, enabling a targeted and controlled investigation of its role and providing valuable insights into its contribution to the observed effects.

      Minor comments:

      12: The authors should describe the inducible KO mice more carefully and correctly. In the Results section on P12, the description of "whole body Cre+ mice" confuses the readers in understanding the mechanism of inducible Cre-mediated KO.

      We agree and have added the information requested about the KO mice to the main text and a detailed description in the methods section.

      13: In Figures 6B and 6C, the number of mice and the meaning of the asterisk should be described correctly. Is it statistically significant?

      We agree. By accident the number of mice and sign for statistical significance were omitted during processing. The correct sign was added to Fig. 6B-C, and the number of mice used, and the meaning of asterisks were added to the corresponding legend. N=10 mice per condition. **, P

      14: There is no description of Figure 6E in the manuscript. The authors should include it.

      In the original version of this paper, we refer to Fig. 6E in the text on pages 21 and 25. Also, the presented illustration is fully described in the corresponding legend.

      Reviewer #3 (Significance (Required)):

      This paper clarified a novel mechanism of proteasome induction by transcription factors in denervated muscles other than Nrf1 (NFE2L1), which has been shown to contribute to the induction of proteasome gene expression in cultured cells. This is an important paper for expanding the understanding of the field. It is also important because it has demonstrated the potential for new therapeutic targets in diseases such as type 2 diabetes and cancer.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements [optional]

      We would like to thank all three reviewers for their careful and comprehensive reviews of our manuscript. We have taken on board all the comments and have made appropriate changes to improve the manuscript. The more substantive changes are to the structuring of the text in Introduction section, and to improving the clarity of Figure 2 after reviewers’ comments (we have added extra panels to A, F and G). Other minor changes are individually signposted in each paragraph of the point-by-point response attached below.

      We performed a number of pieces of additional analysis to address reviewer comments. To be as transparent as possible we make these and all other data analyses available in the form of .html files exported by Rmarkdown, hosted at https://joebowness.github.io/YY1-XCI-analysis/.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript uses differentiation of the highly informative inter-specific hybrid mouse ESC to follow features of genes that inactivate slowly. Resistance to silencing is reflected in reduced change in chromatin accessibility and the authors identify YY1 and CTCF as enriched amongst these 'slow' genes. This finding is provocative as these factors have been reported to enrich at both human and mouse escape genes. The authors go on to demonstrate that YY1 is slowly evicted from the X, and that removal of YY1 increases silencing.

      Minor Comments: Overall, the manuscript's conclusions are well supported; however, the brevity of the presentation in some places made it difficult to follow, and in other places seemed a missed opportunity to more fully examine or present their data.

      1. Introduction is only 2 paragraphs and half of the last is their new findings. First part of results/discussion is then forced to be very introductory. In addition, some discussion of escapees, even if predominantly human, seems warranted in the introduction. There are multiple studies that have tried to identify features enriched at genes that escape inactivation that could be mentioned.

      We have now written the introduction as 3 paragraphs instead of 2. In doing this, we have moved the sentence introducing chromatin accessibility from the results section to the introduction. Additionally, we now discuss the studies that focus on escapees (in mouse XCI) in the second introduction paragraph.

      Variation in silencing rates. 'Comparable rankings' cites multiple studies (oddly previous sentence cites only two) - how concurrent are they? Developing this further (perhaps a supplementary table) would inform whether the genes assessed are ones that routinely behave similarly across different studies/lines; and also serve as a resource for future studies.

      To avoid double-citing, we have made this one sentence and have cited at the end of the sentence 7 studies which describe gene-by-gene variability in rates of silencing. The majority of these studies include comparisons of their categories of fast- and slow-silencing genes with previous classifications, and they all conclude that there is substantial concurrence. Some examples:

      • Marks et al, 2015, Table S3,
      • Loda et al, 2017, Figure 5,
      • Barros de Andrade E Sousa et al. 2019, Figure 2
      • Pacini et al. 2021, Figure 6e,i

      We believe this is sufficient evidence for our claim that these studies report “comparable categories” (“ranking” changed to “categories” as not all studies strictly rank). A comprehensive gene-by-gene comparison table would likely serve only to highlight differences due to the various silencing assays/model systems/classification approaches used in the studies. If required, however, we would be willing to include a supplemental table which collates where gene silencing categories are discussed in each publication, and links to any supplemental files which provide full lists of X-linked genes.

      It would be helpful to give insight into informativity of cross - what proportion of ATAC-seq peaks were informative with allelic information (and similarly, what proportion of genes expressed had allelic information)?

      Of the 2042 consensus ATAC-seq peaks we defined on ChrX via aggregating macs2 peaks over all time course samples, n = 821 passed our initial criteria for allelic analysis in the iXist-ChrX-Dom model line (ie they are proximal to the Xist locus in ChrX 0-103Mb, overlap SNPs, and contain sufficient allelic reads). A small number of peaks were additionally filtered out during fitting of the exponential decay model, leaving a final ATAC-seq peak set of n = 790 elements (38.6%) which we focus on in this study. We have added this information to the text (first Results paragraph).
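
      The three initial criteria listed above can be sketched as a simple filter. This is a minimal illustration, not the analysis code hosted with the manuscript: the `Peak` fields, the function name and in particular the `MIN_ALLELIC_READS` threshold are hypothetical, since the response does not state what counts as "sufficient" allelic reads. Only the ChrX 0-103 Mb bound is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    start: int          # peak start on ChrX (bp)
    end: int            # peak end on ChrX (bp)
    n_snps: int         # strain-distinguishing SNPs overlapping the peak
    allelic_reads: int  # reads that can be assigned to one allele


XIST_PROXIMAL_END = 103_000_000  # ChrX 0-103 Mb, as stated in the response
MIN_ALLELIC_READS = 10           # hypothetical cutoff, NOT from the source


def passes_allelic_filter(peak, min_reads=MIN_ALLELIC_READS):
    """Apply the three criteria: Xist-proximal, overlaps a SNP, enough allelic reads."""
    return (
        peak.end <= XIST_PROXIMAL_END
        and peak.n_snps > 0
        and peak.allelic_reads >= min_reads
    )


peaks = [
    Peak(1_000_000, 1_000_400, n_snps=2, allelic_reads=50),      # kept
    Peak(120_000_000, 120_000_300, n_snps=3, allelic_reads=80),  # outside 0-103 Mb
    Peak(5_000_000, 5_000_200, n_snps=0, allelic_reads=40),      # no SNP
    Peak(7_000_000, 7_000_500, n_snps=1, allelic_reads=3),       # too few reads
]
kept = [p for p in peaks if passes_allelic_filter(p)]
print(f"{len(kept)} of {len(peaks)} peaks retained")
```

      The additional filtering during fitting of the exponential decay model is a separate step and is not covered by this sketch.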

      Our collections of ChrX genes amenable to allelic analysis were not redefined for this study. We used lists of genes defined in our previous ChrRNA-seq study (10.1016/j.celrep.2022.110830). In general, allelic analysis of gene expression is not as limited by the frequency of SNPs, because the sequence length of transcripts (including introns, which are a significant fraction of the reads in ChrRNA-seq data) is much greater than for ATAC-seq peaks. Only a few very lowly expressed genes are not amenable to allelic ChrRNA-seq analysis.

      P5: "can be influenced by Xist RNA via a variety of mechanisms" seems like a sweeping statement that could use expansion, or at least a reference. Authors could also clarify that 'distal elements assigned by linear genomic proximity' is their definition of nearest gene.

      The statement that “both [chromatin accessibility and gene expression] can be influenced by Xist RNA via a variety of mechanisms” is intentionally broad to support a negative argument that we do not wish to mechanistically over-interpret the observation that Xi chromatin accessibility loss occurs slower than gene silencing. Nonetheless, we have added two references to studies which report mechanisms for how Xist may influence chromatin accessibility; via recruiting PRC1 (Pintacuda et al 2017) or antagonising BRG1 (Jegu et al 2019). That multiple molecular pathways simultaneously contribute towards the effect of Xist RNA on gene silencing is well established in the field (see reviews such as Brockdorff et al 2020, Boeren et al 2021, Loda et al 2022).

      We have clarified in the text that our definition of “distal” is all REs which do not overlap with promoter regions (TSS+/-500bp). We have also made it clearer that our definition of “nearest” gene refers to linear genomic proximity in both the Results and Methods sections.
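
      The two definitions above can be expressed compactly. The sketch below is illustrative only: the function names, the closed-interval coordinate convention, and the choice to measure distance from the peak midpoint to each gene's TSS are assumptions made here, not details confirmed by the manuscript. Only the TSS +/- 500 bp promoter window comes from the response.

```python
PROMOTER_HALF_WIDTH = 500  # TSS +/- 500 bp, as stated in the response


def is_distal(peak_start, peak_end, tss_positions, half_width=PROMOTER_HALF_WIDTH):
    """True if the peak overlaps no promoter window (closed intervals assumed)."""
    for tss in tss_positions:
        if peak_start <= tss + half_width and peak_end >= tss - half_width:
            return False
    return True


def nearest_gene(peak_start, peak_end, gene_tss):
    """Nearest gene by linear genomic distance.

    gene_tss maps gene name -> TSS position. Distance is taken from the peak
    midpoint to the TSS, which is an assumption for illustration.
    """
    midpoint = (peak_start + peak_end) // 2
    return min(gene_tss, key=lambda name: abs(gene_tss[name] - midpoint))


# Hypothetical gene names and coordinates, for demonstration only.
genes = {"GeneA": 10_000, "GeneB": 50_000}
print(is_distal(9_800, 10_100, genes.values()))
print(is_distal(20_000, 20_400, genes.values()))
print(nearest_gene(20_000, 20_400, genes))
```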

      Figure S1 - there are 6-8 other regions that fail to become monoallelic - what are they?

      The regions which stand out most by the colour scheme of the heatmap in Figure S1 are those where accessibility increases on Xi, most notably the loci of Firre, Dxz4 and Xist, which are known to have unique features related to the 3D superstructure of the inactive X chromosome. A few other regions which do not become monoallelic harbour classic “escapee” genes. We have now labelled the locations of escapees Ddx3x, Slc25a5 and Eif2s3x in FigS1.

      The other regions noticeable in the heatmap have no obvious features which explain why they fail to become monoallelic. We have highlighted a region containing intragenic peaks within Bcor (a gene which is silenced in iXist-ChrX mESCs), but many other regions are not in the vicinity of genes. Some of the persistently Xi-accessible peaks within these regions contain strong YY1 or CTCF sites, although many others do not.

      It is also possible that some Xi-accessible peaks are artefacts of mismatches between the Castaneus or Domesticus/129Sv strain SNP databases and the ground truth iXist-ChrX genome sequence. The number of these cases is small, and if a misannotated SNP is the only SNP present in a single peak, the peak is discarded by our allelic filtering criteria as it will appear monoallelic in uninduced mESCs.

      Is there any correlation between silencing speed and expression (as previously reported)? If yes, then is there also a correlation with YY1 presence - and is this correlation greater than or less than seen on autosomes?

      The data we present here pertaining to gene silencing kinetics is reused from our previous study. In that work we did indeed observe a significant association between silencing rate and initial gene expression levels (10.1016/j.celrep.2022.110830, Supplemental Information Figure S5F), which has also been reported by multiple groups previously.

      To correlate YY1 binding with gene expression levels, we calculated transcripts per million (TPM) for all genes from our genome-wide mRNA-seq data of uninduced iXist-ChrX-Dom cells (GSE185869). It is indeed true that, on average, X-linked genes classified as “direct” YY1-targets in our analysis have higher levels of initial expression (median TPM 70.8, n=64) compared to non-target genes (median TPM 30.7, n=346). Autosomal YY1 targets are also relatively higher expressed (median TPM 29.6, n=1882) than non-YY1 genes (median TPM 8.0, n=9983). Within the list of YY1-targets, there is no additional correlation between quantitative levels of YY1 ChIP enrichment (calculated in this study using BAMscale (Pongor et al, 2020)) and gene expression (R=-0.05, Spearman correlation).
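
      For readers unfamiliar with the TPM normalisation used above, a minimal sketch of the standard calculation is given here. The function name and the input structure (per-gene read counts and gene lengths in bp) are assumptions for illustration and do not reproduce the study's pipeline.

```python
from statistics import median


def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths.

    counts, lengths_bp: dicts keyed by gene name (hypothetical inputs).
    Each count is first divided by gene length in kb, then the resulting
    rates are rescaled so that they sum to one million.
    """
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: rate / total * 1e6 for g, rate in rates.items()}


def median_tpm(tpm_values, gene_set):
    """Median TPM over a set of genes, e.g. YY1 targets or non-targets."""
    return median(tpm_values[g] for g in gene_set)
```

      Comparing `median_tpm` for two gene sets mirrors the target versus non-target medians quoted above, although the values in the response come from the authors' own analysis and not from this sketch.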

      Therefore, we appreciate that this correlation between YY1-binding and gene expression levels may be a covariate in the correlation we report in this study between YY1-target genes and slow-silencing. This does not invalidate a potential functional role for YY1 in impeding silencing, as it could affect both variables via common or distinct mechanisms. Nevertheless, in an attempt to account for initial expression level as a covariate, we compared the silencing halftimes of YY1-targets versus non-targets within genes grouped by similar expression levels (low, medium and high-expressed genes). YY1-targets have slower halftimes in each comparison, and this difference is highly significant (p=1.9e-05, Wilcoxon test) for the “medium-expressed” gene group. This implies that YY1 contributes towards slower gene silencing kinetics independently of initial gene expression levels. We have added this panel to Fig2 with an associated sentence in the Results section.
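
      The stratified comparison described above can be sketched as follows. This is a hedged illustration only: it assumes equal-sized expression groups (tertiles), whereas the thresholds actually used for the low, medium and high groups are not stated here, and it reports group medians rather than performing the Wilcoxon test. A rank-sum test such as `scipy.stats.mannwhitneyu` could be applied to each group's halftimes separately.

```python
from statistics import median


def stratified_medians(genes, n_bins=3):
    """Compare silencing halftimes of targets vs non-targets within expression groups.

    genes: list of (expression, halftime, is_target) tuples (hypothetical input).
    Genes are ranked by expression and split into n_bins equal-sized groups.
    Returns one (median_target_halftime, median_nontarget_halftime) pair
    per group, ordered from lowest to highest expression; a value is None
    when a group contains no genes of that class.
    """
    ranked = sorted(genes, key=lambda g: g[0])
    size = len(ranked)
    results = []
    for b in range(n_bins):
        group = ranked[b * size // n_bins:(b + 1) * size // n_bins]
        targets = [h for _, h, is_target in group if is_target]
        others = [h for _, h, is_target in group if not is_target]
        results.append((
            median(targets) if targets else None,
            median(others) if others else None,
        ))
    return results
```

      If targets show a larger median halftime than non-targets in every group, expression level alone is unlikely to account for the difference, which is the logic of the argument made above.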

      These new analyses are also appended to the documentation of the R scripts used to generate the main figures in this study (Figure2_YY1association.Rmd), which will all be published to Github.

      It is also important to note that this analysis approach is complicated by the methodology we use to classify YY1 target genes. In this study, we define YY1 targets based on the presence of ChIP-seq peaks overlapping the gene promoters, which is reasonable and widely accepted practice when defining targets of transcription factors. However, as briefly discussed in the Methods, in YY1 ChIP-seq data samples with very high signal:noise (eg Fig3), minor peaks of YY1 enrichment can be detected at almost every active promoter. As enrichment at these peaks is typically much less than at peaks with occurrences of the YY1 consensus DNA motif, we hypothesise that these small peaks result from secondary YY1 cofactors enriched at promoters (eg P300, BAF, Mediator) rather than direct sites of binding to DNA/chromatin. Therefore, for annotating genes as “direct” YY1 targets, we chose to use the YY1 peak set defined from lower signal:noise ChIP-seq data in iXist-ChrX produced with the endogenous YY1 Ab. Nevertheless, this behaviour is likely to confound any analysis correlating YY1 ChIP binding with gene expression.

      Figure 2: Have the authors considered using quartiles rather than an arbitrary division into depleted and persistent?

      We primarily chose this binary classification of REs as either Xi-“persistent” or Xi-“depleted” to maximise the numbers of sequences that could be used in each group as input for the HOMER motif enrichment software.

      It is also not trivial to separate REs into quartiles because our “Xi-persistent” classification includes peaks defined as “biallelically accessible in NPCs”, as well as peaks with slow accessibility halftimes. This is explained in both the Results and Methods but we now have edited Fig2A to make it clearer. Instead of quartiles, we have performed an analysis which keeps “biallelically accessible REs” as a separate category and subdivides the remaining peaks into three groups by halftimes (slow, intermediate and fast accessibility loss). The same trends are evident with this four-category approach as with the two-category approach.

      Importantly, our follow-up analyses which confirm the association between YY1 binding and slow Xi accessibility loss (Fig2E) and slow silencing (Fig 2F-H) are independent from categorisations of REs which rely on arbitrary thresholds.

      1. Could simplify secondary labels to solely YY1 and CTCF. D & F do not print in black and white. Overall the mESC versus NPC can be confusing, perhaps mESC (no diff) would be helpful?

      We have simplified the secondary labels in Fig2B and modified the colour scheme of Fig2D and Fig2F as suggested. “mESC” is now modified to “mESC no diff” in Fig2H, FigS2B, Fig3C and Fig3E to reduce the potential for confusion.

      The numbers appear to suggest YY1 is generally enriched on X, but not at promoters?? Is this true?

      The explanation for this is that clear peaks of YY1 ChIP are found at young LINE1 elements in iXist-ChrX mESCs (specifically over L1Md_T subfamilies). These elements are highly enriched (>2-fold) on the mouse X chromosome compared to autosomes (Waterston 2002), and the majority are not promoter-associated. We chose not to include a discussion of YY1 enrichment at repetitive LINE1 elements in this study primarily because of a) issues related to multiple-mapping reads, such as difficulties distinguishing ChrX vs autosomal reads, and b) the absence of strain-specific SNPs within annotated ChrX L1Md_Ts means that none of these elements are amenable to allelic analysis so we cannot compare Xi versus Xa. However, these LINE1 peaks are a significant fraction (262/521) of the numbers of YY1 ChIP-seq peaks in Fig2C.

      For Figure 2f, it might be helpful to show autosomal genes - are Fast depleted or Slow enriched for YY1 relative to autosomes?

      We have calculated these numbers as part of the analysis of gene expression on ChrX and autosomes above. Overall, the fraction of genes defined as YY1-targets is the same on ChrX as on autosomes (~0.16). Accordingly, fast-silencing genes are depleted for YY1 compared to autosomes, whereas slow-silencing genes are enriched for YY1 compared to autosomes. Fig2F is now redesigned to include the total numbers of YY1-target genes on ChrX and autosomes.

      More generally, is YY1 binding on the X lost more slowly than YY1 binding on autosomes, or is the slow loss a feature of YY1. While I agree YY1 could have direct up or down-regulatory roles, Figure S3 could also be reflecting a secondary impact.

      We agree that many of the differentially regulated genes after 52 hours of YY1 degradation could be secondary effects and have added a sentence on this to the relevant paragraph in the text.

      Figure 3, 4 and supplementary - the chromosome cartoon introduces the LOH in iXist, but this needs to be described in text. Describing the reciprocal as a biological replicate seems challenging given this LOH.

      It is true that the reciprocal lines iXist-ChrX-Dom and iXist-ChrX-Cast are not true biological replicates, and we try to avoid referring to them as such. Writing this in the legend of Fig3 was an error which we have corrected. We have now also mentioned the recombination event in the iXist-ChrX-Dom cell line at the point where data from this line is first discussed (paragraph 1 of Results section).

      For the latter parts of this work (Figs 3 and 4), we made the conscious decision to proceed with two YY1-FKBP12F36V cell lines from different reciprocal iXist-ChrX backgrounds (aF1 in iXist-ChrX-Dom, cC3 in iXist-ChrX-Cast), rather than “biological replicate” clones from either iXist-ChrX-Dom or iXist-ChrX-Cast. Our reasoning was to control against potential confounding effects of strain background on our experiments related to the role of YY1. Although there were some minor differences between the clones, aF1 and cC3 demonstrated essentially equivalent phenotypes in all analyses we performed.

      Could a panel of TFs be used rather than OCT4 which has its own unique properties to emphasize that YY1 is unique?

      This would indeed be worthwhile, and we did consider attempting to perform ChIP-seq for additional TFs other than OCT4 in order to collect more points of comparison for the slow rate of loss of YY1 binding to Xi. However, it is admittedly hard to identify appropriate candidate TFs in mESCs which a) have similar numbers of discrete peaks of binding in promoters and distal elements on ChrX and b) it is possible to reliably perform ChIP-seq for at sufficiently high signal:noise to allow for quantitative allelic analysis.

      We have changed the text to acknowledge that our comparison only to OCT4 limits the scope of the statements we can make about unique properties of YY1 binding.

      Figure 4 - by examining 'late' genes, a change in allelic ratio is observed, but what about escape genes (e.g. Kdm5c, Kdm6a)? Do they now become silent? It would be helpful to have all this data as a supplementary table so people could query their 'favourite' gene.

      YY1 degradation experiments for Figure 4 were performed on mESCs without cellular differentiation (YY1-ablated cells do not survive in our mESC to NPC differentiation protocol). In undifferentiated mESCs, silencing of the inactive X does not reach completion, and in fact all X-linked genes are residually expressed at a higher level than in equivalent timepoints of Xist induction with NPC differentiation (see Figure 4D, Bowness et al 2022). We write in the text “slow-silencing genes are residually expressed from Xi” because genes of this category account for the majority of expression under these conditions, and indeed almost all slow genes would be classed as “escape genes” in this setting by a conventional definition of >10% residual expression from Xi (see also Figure 4D, Bowness et al 2022). Our analysis in Fig4D (of this study) includes all genes, and we share processed .txt files of allelic ratio and allelic fold changes in GEO, so querying the behaviour of a favourite gene would be easy (GSE240680).

      Incidentally, when we do perform NPC differentiation of iXist-ChrX cells, at late stages very few genes show any expression from Xi (Ddx3x, Slc25a5, Eif2s3x and Kdm5c clearly escape, but even Kdm6a is entirely silenced). Unfortunately, with such a small number of “super” escapees it is hard to make any general conclusions, so in this study we can only make inferences about escape via the transitive property that many “slow-silencing” genes are facultative escapees in other settings without induced Xist overexpression. We now write about this consideration in the introduction and final paragraph of the main text.

      It seems surprising that loss of YY1 has no demonstrative impact on the Xa. Figure S3B suggests that over 1000 genes are significantly impacted - primarily down regulated. How many of those are X-linked? Perhaps they could be colored differently?

      For the broad-brush differential expression testing in FigS3B, we use all the ChrRNA-seq samples (6 x untreated, 6 x dTAG) as “pseudo-replicates”, disregarding any confounding effects related to induced Xist-silencing as affecting untreated and dTAG sample groups equivalently. We did specifically investigate the behaviour of X-linked genes in this volcano plot; however, only a very small number of genes were differentially expressed (n=22 X-linked genes appeared significantly downregulated compared to n=4 genes upregulated). This can be seen in our analysis records uploaded to Github.

      Additionally, there is actually a minor effect of YY1 loss on expression of YY1-target genes on Xa. This can be seen in Fig4F, where the median lines of YY1-target boxes lie below the horizontal line of 0-fold change.

      Since XIST+/undifferentiated cells retain YY1, is YY1 binding sensitive to DNAme? Indeed, are X chromosome bound sites in islands that become methylated? Figure S4 shows YY1-targetted X genes in SMCHD1 knockout; can CTCF targets also be shown? While identified in Figure 2, CTCF was not examined the way YY1 was, although it has also been identified in somatic studies of genes that escape X inactivation.

      Binding of YY1 is indeed sensitive to DNA methylation; specifically it is reported to be blocked by CpG methylation (see Kim et al, 2003; Makhlouf et al, 2014; Fang et al, 2019). Thus, crosstalk with the DNA methylation pathways, which deposit de novo CpG island methylation as a late event of XCI (Lock 1987, Gendrel 2012), did appeal to us as a potential mechanism of YY1 “eviction”. However, preliminary analysis we performed to investigate this revealed limited overlap between YY1 binding sites and de novo methylated CpG islands in the iXist-ChrX model cell line.

      FigS4 presents ATAC-seq data from two iXist-ChrX SmcHD1 KO clonal cell lines, comparing the accessibility loss kinetics between YY1-binding and non-YY1 REs in these cells.

      Although FigS4 in this paper does not show genes, we have previously published ChrRNA-seq data from these SmcHD1 KO lines over a similar Xist induction + NPC differentiation time course (Figure 6, Bowness et al, 2022). A reanalysis of this ChrRNA-seq data by YY1-target vs non-target genes shows a similar trend to the accessibility data, although this is expected from the strong overlap of both “YY1-target” and “SmcHD1-dependent” genes with slow-silencing genes in our model.

      With respect to CTCF, we have performed a similar analysis of this data separating ATAC-seq peaks by CTCF-binding rather than YY1-binding. This shows a similar trend to YY1, but is overall less pronounced, and is now included in our analysis records. We have reported previously that loss of CTCF from many binding sites on Xi requires SmcHD1 (Gdula et al, 2019).

      When the authors use cf. do they simply mean see also, or as wikipedia suggests: "the cited source supports a different claim (proposition) than the one just made, that it is worthwhile to compare the two claims and assess the difference". Perhaps it would be worth spelling out to clarify for the audience.

      We used “cf.” in the text to mean “compare with”, when referring to a plot/observation/piece of data outside of the figure being immediately discussed (either in another study or different section of the paper). We were not aware of the recommendation to only use the cf. abbreviation when the two items are intended to be contrasted. We do not believe this to be a universal grammatical convention, but nevertheless have changed instances of cf. to “see also”.

      Reviewer #1 (Significance (Required)):

      General assessment: An important question in human biology is how much the sex chromosome contributes to sex differences in disease frequency. Genes that escape X inactivation in humans seem to have considerable impact on gene expression genome-wide. While there are not as many genes in mouse that escape inactivation, the use of the mESC cell differentiation approach allows detailed assessment of the timing of silencing during inactivation. The authors utilize an inter-specific cross and it would be interesting to know the limitations of such a system (in terms of informative DHS/genes that are informative).

      Advance: As the authors note, there are multiple studies of similar systems that have revealed differences in the speeds of silencing of genes. However, this is the first study to my knowledge that has then tried to assess timing with gene-specific factors. There are multiple studies in humans comparing escape and subject genes for TFs, but lacking the developmental timing that this study incorporates.

      Audience: While generally applicable to a basic research audience interested in gene regulation, the applicability to human genes that escape inactivation may interest cancer researchers or clinical audiences interested in sex differences.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors studied the molecular basis of a variation in the rate of individual gene silencing on the X undergoing inactivation. They took advantage of ATAC-seq to observe the kinetics of chromatin accessibility along the inactive X upon induction of Xist expression in mESCs. They demonstrated a clear correspondence between the decrease in chromatin accessibility and the silencing of nearby genes. Furthermore, they found that persistently accessible regulatory elements and slow-silencing were associated with binding of YY1. YY1 tended to associate longer with genes that required more time to be silenced than those that became silenced fast on the inactive X during XCI. The acute loss of YY1 facilitated silencing of slow genes in a shorter period. They suggest that whether or not the transcription factors stay associated longer is another factor that impacts the variation in the rate of gene silencing on the Xi.

      Reviewer #2 (Significance (Required)):

      It has been suggested that the rate of gene silencing during XCI varies depending on the distance of individual genes from the Xist locus or the entry site of Xist RNA on the X, as well as their initial expression levels before silencing. This study provides another perspective on this issue. The persistent association of transcription factors during XCI affects the rate of gene silencing. Although the issue addressed here might draw attention from only the limited fields of specialists, their finding advances our understanding of how the efficiency of silencing is controlled during the process of XCI. The experimental data essentially support their conclusion, and the manuscript was easy to follow. However, I still have some comments, which I would like the authors to consider before further consideration.

      Major concerns 1. Based on the results shown in Figure 3E and F, the authors concluded that YY1 was more resistant than other TFs against the eviction from the X upon Xist induction. I am still not convinced of this. YY1 binds DNA via the zinc finger domain, while Oct4 binds DNA via the homeodomain. The difference in the binding module between them might affect their dissociation or the response to Xist RNA-mediated chromatin changes. In addition, given that YY1 has been reported to bind RNA, including Xist, as well, Oct4 might not be a good TF to compare.

      We acknowledge and agree that our singular comparison between YY1 and OCT4 is insufficient to support a general conclusion that YY1 is unique with respect to its binding properties on Xi. This was also alluded to by Reviewer #1 (see 10.), where in response we write about the difficulties of selecting other appropriate/feasible candidate TFs for ChIP-seq in order to widen the comparison beyond OCT4. In consideration of this concern, we have re-phrased our conclusions regarding this point in the text, both at the point where it is first presented (Fig3F) and in the first discussion paragraph.

      Furthermore, the difference in allelic ratio change between YY1 and OCT4 is admittedly not dramatic, and this metric can be influenced somewhat by the properties of the sets of peaks used (which is also why we have not tried to add statistical significance to this comparison in Fig3F). In order to make the comparison with OCT4 (a classic pluripotency factor), we were also limited to using mESC culture without differentiation conditions. It is possible that more pronounced differences between YY1 and other TFs would be observed under conditions where XCI is able to proceed further.

      Even so, we contend that our observation that YY1 binding is lost from the Xi relatively slowly likely stands without a requirement for a comparison with OCT4 or other transcription factors. The decrease in allelic ratio for YY1 ChIP occurs more slowly than overall loss of chromatin accessibility from REs, which is arguably a more general proxy for TF binding, and much slower than kinetics of gene silencing (Fig3D and FigS2C). In addition, no other TF motifs (except CTCF, which has its own unique properties) were found significantly enriched within persistently-accessible REs, which would be an expectation if a different factor had similar properties of late-retained Xi binding as YY1.

      Thus, overall we have tried to write the paper without overstating in isolation the importance of our claim that YY1 binding on Xi is relatively resistant to Xist-mediated inactivation, instead emphasising that it should be considered alongside the other pieces of data in the study.

      I don't think that the kinetics of YY1 eviction upon Xist induction in SmcHD1 KO cells during NSC differentiation fit the phenotype of Smchd1 mutant cells. Although their previous study by Bowness et al (2022) showed that Smchd1-KO cells fail to establish complete silencing of SmcHD1-dependent genes, their silencing still reached rather appreciable levels according to Figure 6 of Bowness et al (2022). This is, in fact, consistent with the idea that XCI initially takes place in the mutant embryos, at least to an extent that does not compromise early postimplantation development. On the other hand, a significant portion of YY1 appears to remain associated with the target genes on both active and inactive X (Figure S4), which I think suggests that the presence of YY1 is compatible with silencing of SmcHD1-dependent genes. This is contradictory to the proposed role of YY1 that sustains the expression of X-linked genes in this context.

      At any given timepoint of XCI, our data sets of gene silencing (ChrRNA-seq) consistently show a more pronounced allelic skew compared to chromatin accessibility (ATAC-seq). This behaviour is discussed in relation to Figure 1 in the text (see Results paragraph 2). We do not wish to overinterpret this quantitative difference because the assays are technically different and accessibility is not linearly correlated with gene expression. With this in consideration, we interpret the ATAC-seq data presented in Figure S4 to be fully consistent with the iXist-ChrX SmcHD1 KO ChrRNA-seq data in Figure 6 of our previous publication, i.e. a small increase in residual Xi gene expression from SmcHD1 KO NPCs is accompanied by a more appreciable increase in residual Xi chromatin accessibility. In line with this, it would not be contradictory for substantially increased Xi YY1 binding to sustain a quantitatively small (but nonetheless meaningful) increase in residual gene expression from Xi.

      Additionally, the context in which we include this SmcHD1 KO ATAC-seq data in the current paper is to hypothesise a potential role for SmcHD1 in contributing towards the eventual removal of YY1 binding from Xi. This hypothesis is essentially based on two observations: 1) there is substantially more residual YY1 binding to Xi in mESC no diff conditions (Figure 3), and 2) one difference between no diff and diff conditions is the absence of SmcHD1 recruitment in the former (Figure 5 in our previous study). The new SmcHD1 KO ATAC-seq data adds a third observation which supports the hypothesis: that YY1-bound REs are appreciably more accessible from Xi in SmcHD1 KO. However, none of these observations are direct evidence of a link between SmcHD1 and YY1, and more experiments would be required to substantiate this potential mechanism. If confirmed, it would be logically reasonable to suggest a role for YY1 in contributing towards the residual expression of X-linked genes in the context of SmcHD1 KO, but we do not yet claim this, and a potential link with SmcHD1 KO is not the main focus of the paper.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Bowness and colleagues describe the interesting finding that the transcription factor YY1 is associated with slow silencing genes in induced X-Chromosome Inactivation (XCI). The authors have conducted a comprehensive characterization of X-linked gene silencing and the loss of chromatin accessibility of regulatory elements in induced XCI in ESCs and during NPC differentiation. X-linked gene silencing was classified into four categories, ranging from fast-silenced genes to genes that escape silencing. Motif enrichment analysis of regulatory elements associated with slowly silenced genes identified YY1 as the transcription factor most significantly enriched. The separation of YY1-target and non-target genes confirmed that most genes bound by YY1 indeed exhibit slower silencing kinetics. A comparison of the binding kinetics of YY1 to another transcription factor, OCT4, during XCI revealed that YY1 is evicted more slowly compared to OCT4 on the inactive X, suggesting that slower eviction is a unique property of YY1. Conditional knock-outs of YY1 using protein degradation during induced XCI in mESCs demonstrated that the loss of YY1 at target genes enhances silencing. This supports the hypothesis that YY1 serves as a crucial barrier for slow-silenced genes during XCI. Finally, the authors propose a hypothesis regarding the mechanism of YY1 eviction, suggesting a potential connection to the role of SmcHD1 during XCI.

      The authors provide an in-depth analysis of the role of YY1 in gene silencing kinetics during induced XCI, and we believe this manuscript should be published if our comments are addressed.

      Major comment:

      Based on the allelic ratio in figure 3C only minor loss of YY1 binding occurs in induced XCI in mESCs on the Xi, while silencing is established properly as shown in figure 4C (left panel, red boxplots). This suggests that YY1 eviction is not necessarily required for these genes to be effectively silenced. Could the authors explain this discrepancy in the data regarding their manuscript conclusions? It seems this is true for XCI happening during differentiation towards NPCs, but not if cells are stuck in the pluripotency stage?

      Whilst indeed substantial, we do not consider the silencing seen for 6-day mESCs in Fig4C to be “established properly”. We refer to our previous publication (Figure 4 Bowness et al., 2022), which shows that silencing at equivalent timepoints under differentiation conditions (d5-d7) is significantly more pronounced (near-“complete”). Indeed, the level of silencing reached by YY1-FKBP mESCs (Xist induced but no dTAG treatment) aligns with the plateau of silencing in undifferentiated mESCs we describe in our previous study (median allelic ratio of approximately 0.1).

      We conclude that YY1 contributes somewhat to sustaining this residual expression in mESCs, because a) substantial YY1 binding remains on Xi at these timepoints in mESCs and b) silencing increases with degradation of YY1 (the latter is more direct evidence). Notably, silencing does not progress to completion (allelic ratio of 0) in the absence of YY1, so we do not claim that YY1 is the only factor sustaining residual Xi gene expression in mESCs.

      We interpret this comment to be a fundamentally similar concern to that raised by Reviewer #2 (2.), but in the context of undifferentiated mESCs rather than SmcHD1 KO. As stated above, we do not think it inherently contradictory for substantially increased Xi YY1 binding to sustain a quantitatively small (but nonetheless meaningful) increase in residual gene expression from Xi.

      Minor comments:

      1. In the abstract lines 7-8, the authors state that the experiments were performed in mouse embryonic stem cell lines, but much of the data shown is acquired in NPC differentiations. Please adjust abstract.

      We have adjusted this sentence in the abstract to include that many of the experiments in the paper involved differentiation of iXist-ChrX mESCs.

      The last sentence of the abstract states that YY1 acts as a barrier to silencing but as stated in my major comment, that does seem to be the case in ESC differentiation towards NPCs, but not in ESCs themselves. Please tone down this sentence. Moreover, we do not fully understand where the 'is removed only at late stages' comes from? Is this because of the Smchd1 link? We find this link quite weak with the data presented. We would tone down that last abstract sentence.

      We have toned down the final sentence of the abstract accordingly. We agree that “removed only at late stages” is unsubstantiated since YY1 binding on Xi decreases over the entire time course (albeit slowly). However, we maintain that a connection between YY1 and late stages of the XCI process is reasonable to infer from the various pieces of evidence we provide in the study (e.g. YY1 is persistently enriched in accessible REs, it is associated with slow-silencing genes, and it remains bound to Xi in undifferentiated mESCs).

      Several comparisons to human XCI have been made in the article. We do agree that there are similarities between mouse and human XCI. However, there is insufficient data that substantiates that these genes are regulated in a similar manner in humans. We believe the comparisons should be removed altogether or attenuated.

      We agree that there is nothing in our data that directly pertains to human XCI. Comparisons to human are only made twice in the paper: Initially in the introduction to make a broad statement that many mechanisms of Xist function are conserved between species, and finally as speculation in the last discussion paragraph. We think it is relevant to acknowledge the parallels between our study, which links YY1 binding with resistance to Xist-silencing in a mouse ESC model, and literature describing a similar association between YY1 and XCI escape in humans.

      At the bottom of page 4, the authors say that for any given gene, the allelic ratio of accessibility at its promoter decreased more slowly than it silenced and then write Fig 1B. They probably mean S1C? Since 1B only shows 4 genes.

      The phrase “any given” was used colloquially (ie imprecisely), so we have replaced it with “individual”.

      Figure 1B shows the average allelic ratio of multiple clones for genes representing different silencing speeds. Each data point is the average of multiple clones for these representative genes, could the authors show the individual data points or the standard deviation?

      Fig1B predominantly shows the averages of only two replicate time-courses of Xist induction with NPC differentiation using the same parental clonal cell line, iXist-ChrX-Dom, but performed on different dates and passages. We regenerated the panel without merging the replicate data points, but this has little effect on the plot (see the Rmarkdown html file of Figure 1 on Github).

      Figure 1B. Loss of promoter accessibility lags behind loss of chromatin-associated RNA expression for these 4 genes. What about distal REs? Do the allelic ratios for the distal REs more closely follow chromatin-associated RNA expression? Could the authors show this in a supplemental figure?

      We comment from FigS1C on the general trend that the decrease in accessibility from Xi occurs more slowly than gene silencing (measured by ChrRNA-seq). We then find in FigS1D that distal elements lose accessibility slightly faster than promoters. Although overall the allelic ratio decrease of distal (non-CTCF) RE accessibility is slightly closer to the trajectory of gene silencing, it remains substantially slower (see again the Rmarkdown .html file of Figure 1 on Github).

      An equivalent plot to Fig1B showing distal REs would rely on our simplistic assignment of distal elements to their nearest genes. We believe this is a reasonable generalisation for investigating chromosome-wide trends but unlikely to be sufficiently accurate at the level of specific genes.

      Figure 1B: gene silencing trajectory is depicted left while the legend says right. Same for promoter accessibility.

      The legend is now corrected.

      Figure S1A shows only part of the X chromosome. The area downstream of Xist is missing. Is this because the iXist-ChrXDom cell line is missing allelic resolution as shown in figure S2A? Could the authors explain in the figure legend that part of the X-Chromosome is missing?

      We have now included a reference to the recombination event in the iXist-ChrXDom cell line both when we present data from this background in the first paragraph of the Results section, and in the legend of FigS1A.

      Figure 2C shows that 94 TSSs bear a YY1 peak, yet Fig 2F shows 62 are targets of YY1. Is this because the rest are not properly silenced or are escapees?

      Fig2C shows the numbers of ChrX YY1 ATAC-seq peaks which overlap with “promoters” (ie regions +/- 500bp of a TSS). By contrast, Fig2F shows ChrX genes classified as direct YY1-targets for allelic silencing analysis. The discrepancy between these numbers has several causes:

      1. It is possible for multiple YY1 peaks to overlap the same promoter (eg one peak overlaps 500bp upstream, a separate peak overlaps 500bp downstream).
      2. The count in Fig2C is not restricted to one TSS per gene in cases where there are multiple transcript isoforms in the gene annotation, thus multiple YY1 peaks can overlap different promoters for the same gene.
      3. A few genes do not pass our filters for allelic silencing analysis (eg they are too lowly expressed). Some YY1 peaks may overlap these genes.

      We hope the revised version of Fig2F, which includes numbers of direct YY1 target genes on autosomes and ChrX, makes the distinction between these two numbers clearer.
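The three reasons above can be illustrated with a toy count. The following is a minimal sketch in Python, not the analysis code used in the study: the peak coordinates, gene names, and the `analysable_genes` filter are all hypothetical, and only the ±500 bp promoter definition comes from the response.

```python
PROMOTER_HALF_WIDTH = 500  # bp either side of a TSS, as stated in the response


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def count_promoter_peaks_and_genes(peaks, tss_by_gene, analysable_genes):
    """Count peaks overlapping any promoter, and analysable genes with a peak.

    peaks            -- list of (start, end) tuples (hypothetical coordinates)
    tss_by_gene      -- dict mapping gene name to a list of TSS positions
    analysable_genes -- set of genes passing the (hypothetical) allelic filters
    """
    promoter_peaks = set()
    target_genes = set()
    for gene, tss_list in tss_by_gene.items():
        for tss in tss_list:
            p_start = tss - PROMOTER_HALF_WIDTH
            p_end = tss + PROMOTER_HALF_WIDTH
            for index, (start, end) in enumerate(peaks):
                if overlaps(start, end, p_start, p_end):
                    promoter_peaks.add(index)
                    if gene in analysable_genes:
                        target_genes.add(gene)
    return len(promoter_peaks), len(target_genes)


# Hypothetical toy data, one gene per reason listed above.
peaks = [(600, 700), (1400, 1500), (5100, 5200), (5900, 6000), (9000, 9100)]
tss_by_gene = {
    "geneA": [1000],        # reason 1: two peaks flank a single TSS
    "geneB": [5000, 5600],  # reason 2: two isoform TSSs, each with its own peak
    "geneC": [9050],        # reason 3: has a peak but fails the filters
}
analysable_genes = {"geneA", "geneB"}

n_peaks, n_genes = count_promoter_peaks_and_genes(peaks, tss_by_gene, analysable_genes)
print(n_peaks, n_genes)
```

In this toy case five promoter-overlapping peaks collapse to two target genes, which is the same kind of gap as between the peak count in Fig2C and the gene count in Fig2F.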

      Moreover, YY1 has ~4-fold more peaks on the X chromosome on distal elements compared to promoters. Yet figure 2F exclusively shows the proportion of YY1 binding sites on TSSs. Would distal REs show similar proportions for the silencing categories? Could the authors show the differences in a Supplemental figure?

      As discussed in the response to Reviewer #1 (point 8.), a large fraction of distal YY1 peaks on ChrX are at LINE1 elements, which are not amenable to allelic analysis. Excluding these peaks results in a smaller number of distal elements bound by YY1. The application of our filters for allelic analysis reduces the number of distal YY1-bound REs even more, and our assignment of distal REs to their nearest gene is imprecise. For these reasons, we do not think a comparison of genes classified by whether they are putative targets of distal YY1-bound enhancers is informative.

      The authors switch between different model systems in the figures, which makes it quite confusing which type of XCI is being discussed. We would like to see clearly stated above all panels which cell culture condition is being studied (mESCs or NPCs).

      We have tried to improve this potential source of confusion by modifying “mESC” to “mESC no diff” in the relevant figure panels (see response to Reviewer #1 comment 7B), and adding “in mESCs without differentiation” to the title of Figure 4.

      In Figure 3E and 3F the authors look at the binding retention of OCT4 during XCI in ESCs. However, it is not clear why the authors choose OCT4. Could the authors explain why specifically OCT4 was chosen for these analyses?

      In our responses to the other reviewers, we discuss the limitations of only having one other TF to compare to YY1. The choice of OCT4 was primarily dictated by our experience and confidence in being able to generate high quality ChIP-seq data of this factor.

      As it was essentially arbitrary for the purposes of this paper, we have added a comment to this effect in the text (“with that of a different arbitrary TF, OCT4”).

      What is the expression level of YY1 in NPCs compared to mESCs? In Supplemental S2A, it seems that YY1 protein levels decrease over time during NPC differentiation. Is part of the increased eviction a result of lower protein levels of YY1? Probably not since you calculate ratios between Xi and Xa. Can you please comment on this?

      We were similarly intrigued by this apparent decrease in YY1 protein levels in NPCs (there is no decrease on the RNA level) and initially considered if it could contribute to the relative.

      In FigS2A specifically, the d18 NPC band is probably just a poor-quality sample extraction. Our ChIP-seq data generated from the same sample is similarly poor compared to the others (FigS2B). In other YY1-FKBP12F36V clones we derived and characterised by Western (not described further in this study, but will likely be published as raw source data for the cropped blots we show in FigS2A), the apparent difference in YY1 protein levels in NPCs is less pronounced. Although a minor decrease in YY1 protein in NPCs seems to be robust, we do not think it relevant in the context of our analysis of YY1 and XCI, as we almost always use Xa as internal comparison for any observations made about Xi.

      On page 7 the authors state that degrading YY1 does not affect Xist spreading and/or localisation. Indeed, it has been previously shown by other groups that YY1 is required for Xist localisation during XCI. Could the authors elaborate further on why their cells behave differently compared to the Jeon 2011 paper?

      We are working with a mouse ESC model of inducible Xist from its endogenous locus on ChrX and using the dTAG system to degrade YY1 protein. By contrast, Jeon 2011 worked with an Xist transgene integrated at random in the genome of mouse embryonic fibroblasts (MEFs) and siRNA knockdown of YY1. The difference in our observations could be linked to any of these 4 differences (ie cellular context, Xist genomic location, Xist introns, knockdown strategy), but we cannot identify a specific explanation.

      In figure 4G and figure S3D elevated levels of Xist are observed in the dTAG conditions. As the authors point out, this could then result in accelerated silencing of the X seen upon YY1 loss. Are these elevated Xist levels that result in enhanced silencing in figure 4 relevant for the kinetics of silencing? Moreover, YY1 could act as transcriptional regulator of those genes in the X and by removing YY1, one would expect decreased transcription, which would be read as accelerated silencing. The authors could see whether the genes that show accelerated silencing are regulated by YY1 in ESCs (+ dTAG, - Dox).

      We agree that these points are important to consider when interpreting the results of the YY1-FKBP12F36V ChrRNA-seq we present in Figure 4. However, we believe they are covered in the text during our discussion of the data.

      In relation to the final suggestion, the silencing of almost all X-linked genes is increased upon YY1 removal so separating a specific set of genes which show accelerated silencing would be difficult. Nevertheless, in Fig4F we report that the increases in Xi silencing are strongest for direct YY1 target genes. In fact, these genes also show a minor decrease in expression in the + dTAG - Dox condition (see response to Reviewer #1 point 12.). However, by-and-large the differences in Xa log2FCs between YY1-target and non-target genes are less statistically significant. Non-significant p-values are not shown on Fig4F, but can be found in our Rmarkdown analysis records.

      Can the authors explain why they decided to put the Smchd1 part after the conclusion? Before the conclusion would have been better? The probable link between YY1 and SmcHD1 is definitely something important to investigate.

      Supplemental FigS4 relating to SmcHD1 is more speculative and we lack direct mechanistic evidence linking YY1 and SmcHD1. It would require more experiments to substantiate this as a mechanism. We think these experiments could potentially be very interesting, but are beyond the scope of this study.

      In the paper the authors cite Bowness et al., 2022. In it, Figure 5F studies silencing times with respect to silencing dependency on SmcHD1. What is the overlap between SmcHD1 target genes and YY1 target genes? This would provide more data about the correlation between YY1 and SmcHD1.

      There is an association between YY1 target genes and our previous categories of genes based on SmcHD1 dependence (13/56 SmcHD1_dependent genes are YY1 targets compared to only 8/101 of SmchD1_not_dependent genes). However, this enrichment of YY1 targets in SmcHD1 dependent genes is not so striking as to warrant inclusion into the (very short) discussion of SmcHD1 in this paper. This association is also expected from the fact that both YY1-target genes and SmcHD1-dependent genes associate with the set of slow-silencing genes.
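As an illustration of how the quoted counts (13/56 versus 8/101) could be compared, the sketch below computes the two proportions, the odds ratio, and a one-sided Fisher's exact p-value. It is written in Python using only the standard library. It is not a test reported in the response, and the choice of Fisher's exact test is an assumption made here for illustration.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the count
    in the top-left cell with all row and column totals held fixed.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    denominator = comb(total, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator


# Counts taken from the response: YY1 targets among each SmcHD1 category.
dependent_targets, dependent_total = 13, 56
not_dependent_targets, not_dependent_total = 8, 101

a = dependent_targets
b = dependent_total - dependent_targets          # 43 non-targets
c = not_dependent_targets
d = not_dependent_total - not_dependent_targets  # 93 non-targets

frac_dependent = a / (a + b)
frac_not_dependent = c / (c + d)
odds_ratio = (a * d) / (b * c)
p_value = fisher_one_sided(a, b, c, d)

print(f"{frac_dependent:.3f} vs {frac_not_dependent:.3f}")
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.4f}")
```

Any such test treats genes as independent observations. As the response notes, both gene sets are associated with slow-silencing genes, so an enrichment of this kind does not by itself imply a direct link between YY1 and SmcHD1.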

      Of note, our categories of SmcHD1 dependency were in fact defined in a previous study (Gdula et al., 2019) from a different cellular model (SmcHD1 KO MEFs).

      The authors hypothesise that SmcHD1 might play a role in the eviction of YY1 in NPC differentiation. The current data shows impaired silencing of slow silencing genes and YY1-dependent genes in the SmcHD1 knock-out. However, it doesn't show SmcHD1 is required for YY1 eviction. Could the authors provide direct evidence for their hypothesis by performing NPC differentiation in wild type and SmcHD1 knock-out cells and investigate YY1 binding using ChIP-seq?

      The data we show in FigS4 is ATAC-seq data. It shows that YY1 target REs are particularly more accessible from the Xi in SmcHD1 KO, which is not direct evidence but does align with a potential role for SmcHD1 in mediating removal of YY1 binding from Xi (see our response to Reviewer #2’s comment 2.). We agree that YY1 ChIP-seq over the same time course would be an interesting experiment, but arguably this would also only be indirect evidence (ie increased Xi YY1 enrichment may be due to a confounding consequence of SmcHD1 KO). We therefore believe the full suite of experiments needed to rigorously test the hypothesis are beyond the scope of this paper.

      In figure S4A and S4B no significance is indicated among the different conditions across the different differentiation days. Could the authors add this?

      At all timepoints, differences of Xi accessibility between YY1-binding vs non-YY1 REs are significant. P values are now added to FigS4 and the statistical test is described in the legend.

      Finally, we would like the authors to elaborate in the conclusion about the order of events. As they correctly state at the top of page 5 (and we agree), delayed loss of promoter accessibility compared to gene silencing does not automatically mean that it is downstream of gene silencing. Can you elaborate on this? Also, in light of Fig S2C where loss of YY1 binding seems to happen after gene silencing.

      We mention in the text and in the above response to Reviewer #2 (point 2.) that we do not wish to overinterpret this quantitative difference because the assays are technically different and accessibility is not linearly correlated with gene expression.

      It is possible to speculate on plausible biological explanations for this discrepancy in kinetics between accessibility loss, TF binding and gene silencing. For example, a change in the landscape of histone modifications at a promoter may have little effect on its accessibility to TFs but directly hinder RNA Polymerase II in initiation and/or elongation of transcription of the gene. However, we prefer to keep this speculation out of the main text of the paper.

      Reviewer #3 (Significance (Required)):

      This manuscript highlights a novel role for YY1 in XCI. The manuscript provides an analysis of the correlation and causation of YY1 in gene silencing during XCI. There is a clear correlation between YY1 and delayed silencing of genes on the Xi. To our knowledge, this is the first time such an analysis has been performed for YY1. It advances our conceptual and mechanistic understanding of gene silencing kinetics and what the factors involved in it are. We believe it is an important contribution to the XCI field and will be of great value to the XCI community.

      Strength:

      This study presents a comprehensive and in-depth characterization of X-linked gene silencing during XCI.

      Two different types of inducible XCI are studied and compared (ESCs vs differentiation towards NPCs), which we are grateful for.

      Systematic and stepwise analysis of the data is very strong.

      Many data points have been collected which provide stronger conclusions.

      Weakness:

      Some sentences in the abstract should be toned down.

      YY1 eviction on the inactive X doesn't seem crucial to establish X-linked gene silencing in mESCs.

      The mechanistic approach at the end of the manuscript with relation to SmcHD1 could be studied further.

      This paper will be suited for a specialised audience in XCI and transcription factor control of gene expression, i.e. basic research.

      Field of expertise: XCI, epigenetics, Xist, gene silencing, X chromosome biology.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      General Statement

      We very much appreciate the reviewers' thorough comments and are sincerely grateful for their kind remarks on the novelty and interest of our manuscript. We are confident that we have addressed all the points that they have raised, including with new data as well as revised figures and text.

      Point-by-point description of revisions

      All the revisions have already been carried out and included in the transferred manuscript.

      Reviewer #1

      Major comments:

      > The number of the replicates/animals for the experiments described in Figures 1 and 2 should be reported either in the figure legends or in the methods (statistical analysis).

      We have added the required numbers to the corresponding revised figures, as requested.

      > A relevant part of the discussion repeats what the authors have already said in the results. I would recommend to reorganize this section, emphasizing the importance of these results in the context of human brain tumors.

      Following our own style, we have written a very short (46 lines in length!) Discussion. We dedicate a few lines to highlighting two points: (1) the suggestion, derived from our allograft experiments, that the initial stages of tumour development and long-term tumour growth may be molecularly distinct events, and (2) the unique effect of the combined loss of TrxT and dhd on mbt tumour transcriptomics, unique because none of the suppressors of mbt reported before are as effective in erasing both the MBTS and SDS mbt signatures. Neither of these points is raised in Results. In the remaining few lines we put our results in the context of human Cancer/Testis and elaborate on the fact that the TrxT and dhd pair qualify as head-to-head, CT-X genes, like those reported in human oncology. This is as far as we are willing to go at this stage in emphasizing the importance of our results in the context of human tumours.

      Reviewer #2

      > 1. Figures should include information regarding the sex of the larvae, particularly as there has been a previously reported sex-linked effect in the phenotypes analysed. (e.g. in Figure 2 and Figure S1, where indication of the sex of the animals should be provided in the figure and not just in the figure legend).

      We fully agree. Sex must always be taken into account as a biological variable. All the experiments reported in the manuscript were carried out with sexed samples, and were annotated accordingly in the original text. In compliance with the reviewer's request we have added this information also to the revised figure.

      > 2. Data regarding fertility. Can this be shown in a table format? Are dhdKO females fully sterile? What are the fertility levels of Df(1)J5?

      Please note that we are not discovering anything here but merely corroborating what has been published before: the lack of TrxT does not affect fertility in either sex; the lack of Dhd results in female sterility (Torres-Campana et al., 2022, Tirmarche et al., 2016, Svensson et al., 2003, Pellicena-Palle et al., 1997). Adding a table would not be justified. Moreover, it would be a rather simple table: all single-pair mating tests (n=10 for each genotype) with TrxT KO and Dhd KO males, and TrxT KO females were as fertile as control flies, while all single-pair mating tests (n=10) with Dhd KO females were sterile.

      > 3. Are dhd and TrxT the only genes affected by Df(1)J5? Is there transcriptional data from Df(1)J5 animals to suggest that nearby genes are not affected by the deficiency? Of particular interest would be to assess if snf is affected or not as it is a known regulator of gene expression and splicing.

      Yes, dhd and TrxT are the only genes affected by Df(1)J5. That is the case according to Flybase (citing Svensson et al., 2003, and Salz et al., 1994) and confirmed by our own RNAseq data. No other transcripts, including snf, are affected by Df(1)J5.

      > 4. In Figure 1C, statistical test plus indication of significance is not presented.

      The requested statistical test and significance data have been added as required to the revised figure and figure legend.

      > 5. Related to Figure 1D. Additional neural markers could be assessed in dhdKO and TrxTKO flies. Whilst the gross morphology of the brain does not seem to be affected, there is a possibility that cell specification is affected. Specific markers for the NE, MED and CB could be used to assess this in more detail, particularly as the DE-cad images shown for dhdKO and TrxTKO flies seem to differ slightly from the control.

      We believe that there may be a small misunderstanding here. We have made this point clear in the revised version by referring to substantial published data showing that expression of these two genes is restricted to the germline and that, female fertility aside, TrxT and dhd deficient flies' development and life span are perfectly normal. If anything, Figure 1D is redundant. However, we would rather keep it as a control that our CRISPR KO mutants behave as expected.

      > 6. Related to Figure 2A, images from TrxTKO; l(3)mbtts1, dhdKO and l(3)mbtts1 should be added at the very least in a supplementary figure. Additionally, data for NE/BL ratio should be provided for dhdKO, TrxTKO and Df(1)J5 in the absence of l(3)mbtts1 tumours. Related to Figure S1, quantification of NE/BL ratio for female lobes should be added to the figure.

      All the requested images and data have been included in the revised version in new Figures S2B, S2A, and S1A.

      > 7. Related to Figure 2B and Figure S1, three rows of images are presented for each genotype. It is unclear whether these correspond to brain lobes from different larvae or different confocal planes from the same animal. This should be clarified in the figure and/or figure legend.

      This point has been clarified as requested in the revised figure legend. Each group of three rows corresponds to brain lobes from different larvae of the same genotype.

      > 7 cont. Related to this, in addition to the anti-DE-cadherin data, it would be informative to include immunofluorescence data using antibodies such as anti-Dachshund (lamina), anti-Elav (medulla cortex) and anti-Prospero (central brain and boundary between central brain and medulla cortex) (as assessed in e.g. Zhou and Luo, J Neurosci 2013) in the mbt tumour situation to accurately describe regions disrupted by the tumours.

      There is no denying that taking advantage of the many cell-type specific markers that are readily available in Drosophila could be of interest. The same applies to cell cycle markers like PH3, FUCCI, and many others. However, we believe that, interesting as they may be, none of these markers will give us a clue to the molecular basis of TrxT and Dhd tumour function, which is, of course, the burning open question that we are trying to address now.

      > 8. Authors should clarify how the NE was defined when mbt tumours are generated, as it is severely affected. From the images provided, it is unclear which region corresponds to NE or how the NE/BL ratio was measured. It would be helpful to outline these regions in the images or, as mentioned above, use antibodies to define them.

      The figure has been modified to include the requested outlines defining the NE, which indeed corresponds to the channel showing DE-Cadh staining.

      > 9. Figure 2C does not have indication of statistical significance for the comparisons stated in the text. Potential explanations for the different roles of Dhd and TrxT in long-term tumour development should be explored in the discussion.

      The requested statistical significance data for these comparisons were stated in the second-to-last paragraph of that section. To make these data more prominent we have also added this information to revised Figure 2C.

      > 9 cont. Related to this, does the analysis of the RNA-seq data from TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1 animals reveal why they have similar effect on mbt tumour development but do not synergistically contribute to long-term growth?

      Unfortunately, our analysis of the RNA-seq data from TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1 animals does not give us any clue that could help us understand why they have a similar effect on mbt tumour development, but not on long-term growth (allografts). To further explore this point, we have added new Figure S3, which includes a Venn diagram showing the overlap between the affected mMBTS genes in TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1, together with the lists of enriched GOs among overlapping and non-overlapping genes. The GO differences are indeed tantalising. However, they do not immediately suggest any direct explanation for the different roles of Dhd and TrxT in long-term tumour development.

      > 10. Authors should clarify if there is any overlap between the affected M-tSDS and F-tSDS in the TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1 conditions. Would the limited overlap suggest that TrxT and dhd act in parallel rather than synergistically? This might also explain the differential effects on long-term tumour development. Additionally, the stronger effect observed in Df(1)J5 animals may be due to TrxT and dhd functional redundancy. Currently, there is limited evidence to suggest that TrxT and dhd act synergistically to regulate mbt tumour growth based on the presented data.

      See below.

      > 11. Authors should include a Venn diagram depicting affected genes (M-tSDS and F-tSDS) in the TrxTKO; l(3)mbtts1, dhdKO; l(3)mbtts1 and Df(1)J5; l(3)mbtts1 genotypes as this could clarify the percentage of overlap of gene signatures in these different conditions. Related to this point, authors could provide results from GO analysis to investigate whether specific functional clusters are altered in the different conditions.

      We have taken the liberty of fusing points 10 and 11, which are conceptually similar. The requested Venn diagrams showing the overlap between the affected M-tSDS and F-tSDS genes in the TrxTKO; l(3)mbtts1, dhdKO; l(3)mbtts1, and Df(1)J5; l(3)mbtts1 conditions, and GO analysis are now shown in new Figure S5. Unfortunately, these new data do not suggest any obvious explanation for the differential effects of these two genes, nor do they allow us to derive any further conclusions regarding the nature of the pathways through which TrxT and dhd cooperate to sustain mbt tumour growth. However, our analyses demonstrate that efficient suppression of mbt phenotypic traits (in larval brains) and transcriptome requires the combined elimination of both germline thioredoxins, while the effect of individual removal of either of them is only partial. These data demonstrate the synergistic nature of TrxT and dhd function in mbt tumour growth.

      > 12. In Figure 3E, authors should indicate more explicitly in the figure panel and/or figure legend which genes display significant differences in expression in the different samples.

      We apologise for not having made this point clear in the original version: All (21) genes shown in this Table are significantly downregulated in DfJ5;ts1 vs ts1. From these, nanos and Ocho are also significantly downregulated in TrxTKO;ts1 vs ts1, and Ocho, HP1D3csd, hlk, fj, Lcp9, CG43394, and CG14968 are significantly downregulated in dhdKO;ts1 vs ts1. These data have been included in the revised figure legend. Data on all other comparisons are included in Table S1.

      > 13. In Figure S2C-F it is not clear if the graphs represent data from all tissues or data from male and female tissues separately, as shown in Figure 4.

      Apologies for the confusion. All samples were from male tissues as indicated in the original figure legend. To make this clearer, we have labelled all four panels in the revised figure.

      > 14. Are TrxT and dhd also deregulated in other tumour types? Or is this specific for mbt tumours? This information could be provided to enhance the scope of the manuscript.

      Thank you for raising this point. TrxT and dhd are not dysregulated in the other tumour types that were analysed in Janic et al., 2010 (i.e. pros, mira, brat, lgl and pins).

      > 15. Authors conclude that TrxT and dhd cooperate in controlling gene expression between wild-type and tumour samples and that they act synergistically in the regulation of sex-linked gene expression in male tumour tissue. However, the link between the two observations (if indeed there is a link) has not been well explained. Is the effect on gene expression in tumours simply a result of the regulation of sex-linked transcription?

      Our data show that TrxT and dhd synergistically contribute to the emergence of both the MBTS (i.e. tumour versus wild type) and SDS (i.e. male tumour versus female tumour). The only certainty at this time regarding the interconnection between both signatures is that they overlap, but only partially, which answers one of the questions raised by the reviewer: the effect on gene expression in tumours is not simply a result of the regulation of sex-linked transcription. Beyond that, the link (if indeed there is a link) between these two signatures has not been investigated. The lack of insight on this issue is not surprising taking into account that, in contrast to classical tumour signatures (tumour versus healthy tissue), the concept of sex-linked tumour signatures is relatively new and only a handful of such signatures have been published. Moreover, the vast majority of classical tumour signatures have not been worked out in a sex-dependent manner.

      Reviewer #3

      Comments:

      > - In the first section of the results, as a first step to study the role of TrxT and dhd genes on mbt tumors the authors generate CRISPR knock outs of these genes and correctly validate them. However, afterwards, the experiment where the authors test the KO of these genes in a wild-type larva brain is not contextualized with the rest of the section. It might be best to first address the role of these genes in a tumor context and only then complement with the experiments in wild-type (in supplementary material).

      We do appreciate the reviewer's view, but respectfully disagree. In our opinion, the manuscript flows better by presenting the tools that we have generated in Figure 1. By corroborating published data showing that these two germline genes do not affect soma development (Torres-Campana et al., 2022, Tirmarche et al., 2016, Svensson et al., 2003, Pellicena-Palle et al., 1997), this first figure not only validates our CRISPR KO mutants, but also sets the stage to highlight their significant effect on a somatic tumour like mbt.

      > - Fig 2 B - To back up the quantifications in Fig 2A the authors could include images of l(3)mbt ts1 tumors with TrxT KO and dhd KO also.

      The requested images are shown in new Figure S2B.

      > Fig 2 B and C - Indeed, the results suggest that TrxT seems to be responsible for most tumor lethality upon l(3)mbt allografts, but not dhd. This is curious since l(3)mbt; dhd KO brain tumors have the same partial phenotype as l(3)mbt; TrxT KO (fig 1A). It would be interesting to further explore these phenotypes by staining l(3)mbt; TrxT KO and l(3)mbt; dhd KO brains with, for instance, PH3 to understand if the number of dividing cells of these tumors could be different. In addition, to back up this information, the authors could look at what happens to l(3)mbt tumors with TrxT KO and dhd KO at a later stage of development (or to larva or pupa lethality if that is the case) and compare it with l(3)mbt brains.

      We did explore the possibility of looking at later stages. Unfortunately, the onset of the lethality phase, compounded by major tissue reshaping from larval to adult brain, makes these stages unsuitable for reaching any meaningful conclusion. With regard to staining for PH3, we think that, like FUCCI and a long list of other useful labels that could be explored, it is potentially interesting, but hardly likely to give us a clue to the molecular basis of TrxT and Dhd tumour function, which is of course the one important question that we are addressing now.

      > - Fig 2 B - What happens to the medulla in a l(3)mbt brain tumor? Although the ratio of NE/BL is the same for wild-type and D(1)J5; l(3)mbt, it still seems that the medulla in D(1)J5; l(3)mbt brains is substantially bigger, although quantifications would be required. Do the authors know if the NE in D(1)J5; l(3)mbt brains is either proliferating less or differentiating more?

      There are no significant differences in medulla/BL or CB/BL ratios. The corresponding quantifications have been added to the revised version. As for the question on proliferation versus differentiation, the simple answer is that we do not know.

      > Figure S1 - Although the effects of TrxT KO and dhd KO in male mbt tumors seem to be enhanced in relation to female tumors, the authors should include some form of tumor quantification for female tumors like in Fig 2 A.

      We have carried out the requested quantifications and added the results in a new panel in revised Figure S1A.

      > Moreover in the 2nd section of the results, relative to Fig 1S in "...Df(1)J5; l(3)mbtts1 female larvae although given the much less severe phenotype of female mbt tumours, the effect caused by Df(1)J5 is quantitatively minor." to say "quantitatively" minor, the authors should include not only quantifications, but a form of comparison between female tumors vs. male tumors.

      The requested quantification was published in Molnar et al., 2019. However, we agree on the convenience of doing it again with our new samples. The new data, which confirm published results, are now shown as a new panel in revised Figure S1C.

      > - Fig 3D - The hierarchical clustering was done according to which parameters? A brief explanation could help a better interpretation of this results section. The requested information has been added to the Methods section. Hierarchical clustering was done using the function heatmap.2 in R to generate a plot in which samples (columns) are clustered (dendrogram); genes (rows) are scaled by "rows"; distance = Euclidean; and hclust method = complete linkage. Expression levels are reported as Row Z-score.
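
      The clustering settings named above can be illustrated with a minimal, self-contained sketch. It is not the authors' R code: the expression matrix and sample names are hypothetical, and the sketch assumes that samples are clustered on the unscaled values while the row Z-score is applied only for display, which is our understanding of the default order of operations in heatmap.2.

```python
import math
from itertools import combinations

# Hypothetical expression matrix: rows are genes, columns are samples.
samples = ["s1", "s2", "s3", "s4"]
expr = [
    [10.0, 11.0, 2.0, 3.0],
    [5.0, 6.0, 1.0, 1.5],
    [8.0, 7.5, 8.2, 7.9],
]

def row_zscores(matrix):
    """Scale each gene (row) to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
        scaled.append([(x - mean) / sd for x in row])
    return scaled

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(points):
    """Naive agglomerative clustering with complete linkage.

    Returns the merges in order as (members_a, members_b, height), where
    height is the largest pairwise distance between the two clusters.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = max(euclidean(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

# Cluster samples (columns); Z-score genes (rows) for the heatmap colours.
columns = [list(col) for col in zip(*expr)]
merges = complete_linkage(columns)
zscores = row_zscores(expr)

for a, b, height in merges:
    print([samples[i] for i in a], [samples[i] for i in b], round(height, 3))
```

      In this toy matrix the two pairs of similar samples merge first and the two pairs are joined last, which is the dendrogram shape one would expect when samples fall into two groups.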

      > - Fig 3D - It could be beneficial for the authors to include an analysis of the downregulated genes shared between TrxT KO mbt tumors and dhd KO mbt tumors, as well as the genes that are not shared (besides MBTS genes). Could be something like a Venn diagram. Thanks for pointing this out. New Figure S3 shows the requested Venn diagram, as well as the list of enriched GOs for each group. There are no enriched GOs in the list of overlapping genes. TrxTKO; l(3)mbtts1-specific genes are enriched for GOs related to gamete generation, sexual reproduction, germ cell development and similar GOs. dhdKO; l(3)mbtts1-specific genes are enriched for GOs related to chitin, molting and cuticle development. Tantalising as they are, these observations do not immediately suggest any direct explanation for the different roles of Dhd and TrxT in long-term tumour development. We are happy to add this supplemental information, but we do not deem it worthy of any further discussion at this point.

      > - Results section 3 - "Expression of nanos is also significantly down-regulated upon TrxT loss, but remains unaffected by loss of dhd" - to corroborate the idea that TrxT and dhd work as a pair, but contribute to different functions within the tumor, it would be interesting for the authors to do an allograft experiment of dhd KO; l(3)mbt male tissue with nanos knock down in the brain, if genetically possible. The suggested experiment is published. The gene in question (nanos) is a suppressor of mbt tumour growth: In a nanos knock down background, l(3)mbt allografts do not grow (Janic 2010).

      Minor comments:

      > - In the first section of the results, the authors claim that "Consistent with the reported phenotypes of Df(1)J5...", but then the study is not mentioned. The corresponding references (Salz et al., 1994; Svensson et al., 2003; Tirmarche et al., 2016) have been added.

      > - Fig 1 B - It is a bit confusing to follow where TrxT and dhd are in the Genome browser view. I am guessing we should follow the TrxT-dhd locus from A, but the authors could make it clearer. Figure 1 has been changed to make this point more clear.

      > - In the same section, in the next sentence, the homozygous and hemizygous is a bit confusing. "...homozygous TrxTKO females, dhdKO males, and TrxTKO males", should be corrected. We appreciate the suggestion, but would rather stick to classical terminology and refer to KO/KO females as homozygous and to KO/Y males as hemizygous.

      >- In the same section (Fig 1C): "RNA-seq data also shows that TrxT is significantly upregulated in l(3)mbtts1 males compared to females (FC=7.06; FDR=1.10E-44) while dhd is not (FC=1.89; FDR=2.00E-14)." - But dhd is nevertheless upregulated, although less, in l3mbt males, right? The authors might need to rephrase. We refer to comparing males versus females, not wild type versus tumours. The text has been rephrased in the revised version to make this point clear.

      > - Fig 2 A (quantifications), should be after the confocal images (Fig 2 B). We respectfully disagree on this minor point. We initially organised this figure in the order recommended by the reviewer, but we eventually found it easier to write the article using the order shown in the submitted figure. We would rather stick to this version.

      > - Fig 2 B and Fig S1 - Please include an outline of at least neuroepithelia and, if possible, Central brain or medulla so that these regions can be more clearly identified. Moreover, these results will be easier to interpret if you add a male symbol in this image and a female symbol in Figure S1, otherwise, it might seem like the same figure. Outlines and symbols have been added to the revised figure, as required.

      > - In results, section 2, "Consequently, in spite of the strong sex dimorphism of mbt tumours, the phenotype of Df(1)J5; l(3)mbtts1 larval brains is not sexually dimorph" - to back this up, quantifications of Df(1)J5; l(3)mbtts1 female vs male tumor size, as well as statistical analysis are needed, like previously said. The requested new data are now shown in revised Figure S1C.

      > - In results section 2 - "For allografts derived from, female larvae, we found that differences in lethality rate caused by TrxTKO; l(3)mbtts1, dhdKO; l(3)mbtts1, Df(1)J5; l(3)mbtts1, and l(3)mbtts1 tissues (7-23%) were not significant (Figure 2C)" - there is no statistical analysis to conclude that the lethality rate is not significant, from 7% to 23% still seems like a difference. Thanks for pointing this out. We did of course generate the requested statistical analysis data, but failed to include it in the manuscript. Chi-square statistical test gives a p value=0.2346. These data have been added to the revised version.
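
      For readers unfamiliar with the test, the following is a minimal sketch of a chi-square test of independence on lethality counts. The counts and group labels are hypothetical and are not the data behind the reported p value; the sketch also assumes that all four genotypes are compared in a single 4 x 2 table, which gives 3 degrees of freedom.

```python
import math

# Hypothetical (dead, surviving) host counts for four allograft genotypes.
counts = {
    "group_A": (3, 37),
    "group_B": (5, 35),
    "group_C": (7, 33),
    "group_D": (9, 31),
}

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    rows = [list(r) for r in table]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat = chi_square_statistic(counts.values())
p_value = chi2_sf_df3(stat)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

      With these made-up counts the lethality rates range from 7.5% to 22.5%, yet the test does not reach significance at the 0.05 level, which illustrates how a spread of this size can be non-significant with modest group sizes.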

      > - Last paragraph of section 2 of results - very long and confusing sentence. Please rephrase text. We have rephrased this sentence to make it shorter and clearer.

      > - On section 3 of results: "The vas, piwi and CG15930 transcripts are not significantly down-regulated following either TrxT or dhd depletion alone." - in Fig 3E, not only these transcripts seem to suffer a slight downregulation, but there is also no statistical analysis supporting this. There seems to be a misunderstanding here. The requested statistical data for each gene were shown in Table S1.

      > - First paragraph of section 3 results - the first sentence is written in a confusing way. Moreover, more context is needed in the sentence afterwards: "we first focused on transcripts that are up-regulated in male mbt tumour samples compared to male wild-type larval brains (mMBTS)." but using which data? The RNA seq data? Agreed; this paragraph has been amended in the revised version.

      > - Brief conclusion missing on the second paragraph of the last section of results. As far as the results presented in this paragraph are concerned, we can only mention the two potentially interesting observations, which were pointed out in the original version: (i) the suggestion that nanos upregulation could be critical for sustained mbt tumour growth upon allograft, and (ii) the fact that three genes (vas, piwi and CG15930), also known to be required for mbt tumour growth, are downregulated in Df(1)J5; l(3)mbtts1, but remain unaffected following either TrxT or dhd depletion alone. We are unable to derive any other conclusion from these observations.

      > - In the end of 3rd paragraph of last section of results: "...M-tSDS and F-tSDS genes is partially reduced in l(3)mbtts1 brains lacking either TrxT or dhd, but it is completely suppressed upon the lack of both." - "completely" might not be a correct word to use in this case, as there are still some small differences. As requested, we have changed "completely" to "strongly".

      > - 4th paragraph of last section of results: Either mention the male results and then female (to be in order with the figure, as the female graphs come after the male graphs) or change the order in the figure. Also, this paragraph is not very clear, could benefit from a better explanation of the results and conclusions. Point taken. Figure 4 has been changed and female graphs come before male graphs. The paragraph is clearer now. The conclusion from this paragraph is included in the final paragraph of this section.

      > - Fig 4 C,D,E,F: to make it more clear, please write the name of the genotypes in question in the figure. At the reviewer's request, the genotypes in question are now written in each panel. Please note that we did not do so before because all four panels correspond to the same genotype: Df(1)J5; l(3)mbtts1 vs l(3)mbtts1, as we mentioned in the original figure legend.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      The paper by Yammine et al addresses a major problem in the peculiarities of genotype-to-phenotype manifestation in collagen II and chondrodysplasia. It is a lucid and comprehensive study detailing what they see as the fundamental mechanism of the Gly1170Ser-mutated Col2a1 gene.

      At the heart of the matter is the debunking of the results from a mouse model generated by Liang et al (PLoS One 2014), in which the authors suggested that the phenotype, seen only in homozygous mice (heterozygous mice appear normal), was related to an ER stress-UPR-apoptosis cascade resulting in the chondrodysplasia. The Yammine et al paper uses a different model, a robust human iPSC-based tissue with a CRISPR-edited variant, to show that the ability of the variant chondrocytes to deposit Gly1170Ser-substituted collagen II in both the hetero- and homozygous models is not accompanied by any substantive UPR. The authors of the current paper also argue that their model system most closely resembles the human context, where heterozygous individuals show pathology.

      We appreciate the Reviewer highlighting the significance of this manuscript addressing a major issue in the field.

      I have sifted through all the data related to this topic and have gone back to examine the data, and what struck me was the repeated use of the phrase "slow to fold" in the current paper. I wondered whether the element of "TIME" is important in the chondrogenesis of both models, and whether it is this element that generates the difference between the two results. While iPSC-based tissue takes up to 44 days, a female mouse would have had two litters in this time and made many more growth plates. Could it be that by "slowing" the chondrogenesis pathway, which is part of the procedure of differentiation of iPS cells into chondrocytes, the ER is not as "stressed" as in mouse development? I would like the authors to reflect and comment and put forward their view, given that UPR signaling pathways play a crucial role in chondrocytes in phases of high protein synthesis, e.g., during bone development by endochondral ossification (Journal of Bone Metabolism, vol. 24, no. 2, pp. 75-82, 2017).

      The Reviewer here emphasizes a likely benefit of the human model that we had not previously considered, as differentiation and growth in our model are indeed far more similar to humans than the rapid timeline in mice. We also note that the evidence that collagen is slow to fold comes from the gold standard assay in the field for collagen folding rate, a point discussed in greater detail in response to Reviewer 2’s query (see below).

      The literature evidence does indicate that transient UPR signaling is relevant for chondrogenesis. We selected UPR timepoints that do not interface with the differentiation process, but rather the tissue deposition process while chondrocytes are still actively depositing and maintaining the extracellular matrix, to be able to distinguish a physiological transient UPR during differentiation from a potential chronic and possibly pathologic one. We now clarify this point in the manuscript (see text below).

      “These timepoints were selected to reflect an early and a late stage of cartilage maturation, but with both timepoints harvested post-chondrogenesis so as not to interfere with the physiologic transient UPR activation that can be important in that process.”

      This is notwithstanding the good argument given by the authors in defending their robust results, namely that there is no evidence that the hydrophilic triple-helical domain of pro-collagen binds BiP, the main detector of accumulated misfolded proteins. What then do they make of the immunostaining and qPCR with ER stress-related genes in the Liang et al paper? I know that the data are not theirs, but a comment on the indisputable data would give the reader a better understanding.

      It is critical to note that the evidence for ER stress that induces the UPR is, at best, exceptionally weak for heterozygotes in the Liang et al paper, and arguably also weak for homozygotes. Liang et al observed, via quantitative PCR, that the mRNA levels of just Chop (which is also a marker of the integrated stress response and not a good readout for UPR activity) and ATF6 (whose RNA-level upregulation is not a standard marker of the UPR) were significantly upregulated in the disease-relevant heterozygous mice – given that other (more valid) UPR markers were not altered, this is not so different from our observation of a lack of UPR in the disease-relevant heterozygotes.

      A somewhat more comprehensive set of UPR markers, including Chop, Xbp1(Total and Spliced), Grp78 (BiP), ATF4, and ATF6, was significantly upregulated only in homozygous mice compared to wild-type. PERK is one of many kinases upstream of ATF4 and Chop that can be activated by a variety of processes (the pathway is part of the integrated stress response, for example). Moreover, transcriptional upregulation of ATF4 (which is actually induced translationally, not transcriptionally) and ATF6 (which is actually induced proteolytically, not transcriptionally) are not normally used to read out UPR activation, so it is not so clear to us that a robust UPR was induced even in homozygotes. Moreover, there was not a substantial increase in Xbp1-S (S = spliced) relative to Xbp1-T (T= total) in the study of homozygous mice, which is the most appropriate measure of UPR activation – rather than change in Xbp1-T and Xbp1-S. The use of mostly non-standard genes to assess UPR induction, the weak upregulation of BiP (2.5-fold), and the unchanged ratio of Xbp1-S to Xbp1-T raise some questions regarding UPR induction even in the homozygotes. Regardless of these homozygote data, as noted above, the evidence for a UPR in heterozygotes is very weak, despite ER stress being the focus of the Liang et al paper.
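
      The point about the Xbp1-S to Xbp1-T ratio can be made concrete with a small sketch. All expression values below are hypothetical and chosen only for illustration; the function names are ours.

```python
def spliced_fraction(xbp1_s, xbp1_t):
    """Fraction of total Xbp1 transcript that is spliced."""
    return xbp1_s / xbp1_t

def ratio_fold_change(control, sample):
    """Fold change in the spliced fraction of a sample relative to control.

    Each argument is a (Xbp1-S, Xbp1-T) pair of expression values.
    """
    return spliced_fraction(*sample) / spliced_fraction(*control)

# Hypothetical relative expression values (Xbp1-S, Xbp1-T).
control = (1.0, 4.0)
both_doubled = (2.0, 8.0)      # S and T each rise 2-fold
splicing_induced = (3.0, 4.5)  # S rises far more than T

print(ratio_fold_change(control, both_doubled))
print(ratio_fold_change(control, splicing_induced))
```

      When Xbp1-S and Xbp1-T rise by the same factor, the spliced fraction is unchanged even though each transcript is individually "upregulated"; only a disproportionate rise in Xbp1-S shifts the ratio.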

      With respect to immunostaining, Liang et al observed that tissue from homozygous mice (but not heterozygotes) contained significantly more apoptotic cells. Apoptosis could be a result of chronic, unresolved UPR signaling, but it could also result from any number of other pathways and is certainly not direct evidence for UPR-inducing ER stress. Additionally, for the homozygote apoptosis assay, Liang et al do not note how many mice were analyzed for each genotype, a value they did report for their other assays. While examining multiple sections for each genotype is valuable (they state ≥10), the assessment of biological replicates (additional mice) seems critical to confidently reach a conclusion.

      Although I understand the choice of cell lines for overexpression, the transfection of HT-1080 cells with wild-type and Gly1170Ser COL2A1-encoding plasmids is not a match to the in vivo model (variation of efficiency, etc.). The appearance of BiP, even at a lower fold increase, does not negate ER stress, as the authors acknowledge. More important is the question of which other paracrine signals, not linked to BiP, might trigger the UPR signaling pathway, and which of them an iPS system may lack. Is anything else involved, not only ATF6α (activating transcription factor 6 alpha) but also IRE1α (inositol-requiring enzyme 1 alpha) and PERK (protein kinase RNA-like endoplasmic reticulum kinase)?

      Our finding that the UPR is not activated is based on comprehensive RNA-sequencing performed in the physiologically more relevant iPSC-derived chondrocyte, as opposed to the tumor cell line HT-1080. Our interactomic finding that BiP interacts to the same extent with wild-type and Gly1170Ser procollagen-II (in HT-1080 cells) strongly supports our proposal that the reason the UPR is not activated is that BiP fails to recognize unfolded triple-helical domains.

      We note that, although HT-1080 cells are not a perfect match, they are the most accessible option for interactome-based studies. Because there is no MS-grade antibody for collagen-II IP, we need to IP a transfected, tagged collagen. We cannot do this in chondronoids, or in isolated chondrocytes that transfect poorly and rapidly dedifferentiate. Critically, Prockop and co-workers extensively validated HT-1080 cells as a platform for fibrillar collagen biochemical studies in Matrix 1993, 13, 399. Our own lab further characterized their capacity to properly handle fibrillar collagen variants in great molecular detail in ACS Chem Biol 2016, 11, 1408.

      Since our chondronoid system contains only chondrocyte cells, as is the case in cartilage, the cells can receive paracrine signals from other chondrocytes, but not other cell types. In joints within a whole animal, it is true that paracrine crosstalk occurs between different cell types of different tissues, including inflammatory cells for example. The chondronoid is very useful for elucidating the defects that occur at the chondrocyte-level, without confounding secondary effects. At the chondrocyte-level, Gly1170Ser-substituted procollagen-II does not activate the UPR.

      The Reviewer’s comment regarding the absence of paracrine signals in an iPSC-based system is well-taken, and we added discussion as follows:

      “These observations indicate that the chondrocytes were not raising such stress responses, at least when examined in the absence of paracrine signals from other cell types in the joint.”

      The authors have given us plenty of alternatives that are relevant, and they prepared us for yet another paper on articular cartilage using iPS tissue model which I am looking forward to.

      We are also excited about the upcoming potential of this model system!

      Significance

      I think this paper is publishable and it is important in understanding the mechanism by which mutations in collagen type II affect chondrogenesis and therefore bone formation. This paper will appeal to musculoskeletal scientists, especially those who are interested in bone and its pathology. It would be important for the authors to respond to the critique of "TIME" and the speed of protein synthesis, which creates duress in the ER pathway.

      We greatly appreciate the Reviewer’s comment again on the significance of this work, and their scholarly input which has substantially improved the paper. We hope they will agree that the manuscript is now ready for publication.

      Reviewer #2

      Evidence, reproducibility and clarity

      System: The investigators have used a human iPSC chondrocyte model system to investigate the biochemistry of the chondrodysplasia caused by the p.Gly1170Ser mutation in the type II collagen gene (COL2A1). They studied presumably homogeneous chondronoids formed by 3 cell lines they previously reported in which the chondrocytes were either homozygous wild type for the gene, homozygous for the CRISPR-Cas-induced mutation, or heterozygous for the two alleles (their refs 42-45). In addition, they utilized cultured HT1080 human fibrosarcoma cells transfected with wild type and mutant Col2A1 to study differences in the interactomes of the two proteins.

      Analytic Parameters: They investigated the extracellular matrix formed by the three cells using collagen and proteoglycan staining and TEM and the transcriptional responses in chondronoids expressing the wild type and mutant genes.

      Observations: Matrix formation was defective in the two mutation-bearing cell populations, reflecting defective fibril formation proportional to the abnormal gene dose. They found increased accumulations of post-translational modifications (hydroxylation and O-glycosylation) on the mutant collagen extracted from the chondronoids and EM evidence of collagen retention in the ER. They studied the comparative transcriptional profiles in the three phenotypes and failed to find a profound UPR response late in culture and only a mild upregulation of UPR genes in the young cultures. They could not find evidence for activation of the ISR except in the homozygous mutant cells.

      Using transfected HT-1080 cells (previously shown by these investigators not to express endogenous pro-collagen II but able to synthesize transfected pro-collagen genes) they were able to study the comparative wt and mutant pro-collagen interactomes.

      Conclusions: They conclude that the p.Gly1170Ser mutation in Col2A1 results in abnormal folding, which results in trapping of the protein in the ER and some interaction with cellular elements of the proteostatic response. They concluded that the cellular proteostasis machinery can recognize slow-folding Gly1170Ser through increased interactions with certain ER network components but not in the same fashion that has been described for liver cells producing mutated versions of high volume secreted proteins.

      We appreciate this careful summary of our work.

      Major comments:

      Their first conclusion, stated in the abstract ("Biochemical characterization reveals that Gly1170Ser procollagen-II is notably slow to fold and secrete."), that the mutant polypeptide chain is slower folding than the wild type chain, is based on the premise that the longer the chains are in the ER the greater the degree of lysine hydroxylation and O-glycosylation. Although this may be true, they do not provide a reference and I could not find a definitive description of the phenomenon. Their reference 48 only discusses the occurrence of intracellular post-translational modification of the lysines and continuing modification extracellularly but does not relate these phenomena to the rate at which the peptides traverse the cell. I think the reader would benefit from seeing experiments in which the rates of folding and secretion of the wild type and mutant chains are measured and the degrees of post-translational modification are compared. Cabral WA et al showed differences in collagen folding and secretion rates in cyclophilin wt, knockout and heterozygous osteoblasts and fibroblasts by western blots (Cabral WA et al. (2014) Abnormal Type I Collagen Post-translational Modification and Crosslinking in a Cyclophilin B KO Mouse Model of Recessive Osteogenesis Imperfecta. PLoS Genet 10(6): e1004465. doi:10.1371/journal.pgen.1004465). Performing such experiments in their chondronoids would confirm the authors' interpretation that the increased post-translational modification portrayed in their figure 4 reflects slowed folding and secretion related to the mutation.

      We apologize for failing to provide essential background references and information to assess our assay for slow folding/secretion of procollagen. In fact, slow migration on SDS-PAGE is not only a widely used assay for comparing the rate of folding of procollagens, it has also remained the gold standard in the field for the past forty years. The studies cited below are some of the seminal papers in the field linking collagen’s rate of folding with its extent of posttranslational modifications and its electrophoretic mobility. We have now updated our citations accordingly.

      1. Bateman, J.F.; Mascara, T.; Chan, D.; Cole, W.G. “Abnormal type I collagen metabolism by cultured fibroblasts in lethal perinatal osteogenesis imperfecta” Biochem J 1984, 217, 103.
      2. Bonadio, J.; Holbrook, K.A.; Gelinas, R.E.; Jacob, J.; Byers, P.H. “Altered triple helical structure of type I procollagen in lethal perinatal osteogenesis imperfecta” J Biol Chem 1985, 260, 1734.
      3. Bateman, J.F.; Chan, D.; Mascara, T.; Rogers, J.G.; Cole, W.G. “Collagen defects in lethal perinatal osteogenesis imperfecta” Biochem J 1986, 240, 699.
      4. Godfrey, M.; Hollister, D.W. “Type II achondrogenesis-hypochondrogenesis: Identification of abnormal type II collagen” Am J Hum Genet 1988, 43, 904.

      The basis for this collagen-specific assay of folding rate is that the ER-localized procollagen proline and lysine hydroxylases require monomeric collagen strands as substrates, and cannot accommodate a folded triple helix in their active sites. Thus, accumulation of post-translational modifications on collagen depends on the procollagen triple-helical domain’s residence time as an unfolded monomeric region of the assembling triple-helical trimer within the ER. Some fraction of the hydroxylated lysines are later glycosylated, which slows migration on SDS-PAGE gels. We have now clarified our slow folding conclusion with more precise references and discussion in the manuscript.

      Pulse-chase experiments like those suggested by the Reviewer would indeed be beneficial if they were possible in this system, but they simply are not. Although it might be possible to soak in a radiolabeled amino acid over a short time period, the assay still relies on separating the cell fraction from the secreted fraction. This is possible in monolayer cultures, but in a chondronoid composed of complex cartilage and cells we have no way to do it. One could propose that we extract the chondrocytes and then do the pulse-chase in a monolayer culture, but this unfortunately is also not possible as chondrocytes do not behave well outside the tissue setting and rapidly differentiate into other cell types. Fortunately, the procollagen overmodification assay is a widely used and well-accepted measure of slow folding, and thus addresses the issue.

      I think Figure 4 needs more explanation for the reader. While, as expected, the homozygous mutant band is much slower than the homozygous wild type band, in the heterozygotes the band is intermediate rather than showing a discrete mixture of wild type and mutant proteins, reflecting different degrees of post-translational modification. Is this a function of mixed triple helices with heterogeneous degrees of post-translational modification? It deserves more comment, since the argument relating the degree of post-translational modification to the rate of folding is dependent on this observation. It would also be helpful to show the whole gel with collagen II markers.

      We modified Figure 4 to show the whole gel (in the SI, see Fig. S3) and molecular weight markers. It also shows the wild-type collagen-II band. Most of the procollagen produced by the heterozygote is heterotrimeric for the disease-causing substitution (>87% of trimers will contain at least one mutant chain and thus experience delayed folding) and, therefore, the diffuse banding structure is to be expected. Further, we would speculate that in these challenged ER, even the folding of wild-type only trimers is impaired. The Reviewer’s comment suggests there may be some basis for that speculation. We added a note to this effect.

      “The presence of a single broad, slow-migrating band as opposed to distinctive overmodified mutant versus normally modified wild-type strands is due to the fact that the vast majority of trimers formed in heterozygotes (>85%) contain at least one Gly1170Ser strand that delays triple-helix folding.”
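
      The fraction of heterozygote trimers carrying at least one mutant strand follows from simple arithmetic, sketched below. The sketch assumes that collagen-II is a homotrimer of three α1(II) chains, that the wild-type and Gly1170Ser alleles are expressed equally, and that chains associate at random; the last two are simplifying assumptions rather than measured values.

```python
from fractions import Fraction

def fraction_with_mutant(chains_per_trimer=3, mutant_share=Fraction(1, 2)):
    """Probability that a trimer contains at least one mutant chain,
    assuming chains are drawn independently from the chain pool."""
    all_wild_type = (1 - mutant_share) ** chains_per_trimer
    return 1 - all_wild_type

heterozygote = fraction_with_mutant()
print(heterozygote, float(heterozygote))
```

      Under these assumptions only one trimer in eight is built entirely from wild-type chains, so 7/8 (87.5%) of trimers contain at least one Gly1170Ser strand.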

      Another approach to the question of intracellular accumulation due to a slow rate of folding of the mutant collagen would be to perform pulse chase labeling of the three types of chondronoids with radiolabeled amino acids and sugars and processing the media and lysates with analysis using antibodies specific for the two collagen chain types. Given the authors' extensive experience in studying collagen biosynthesis (e.g. Chan et al J. Biochem. Biophys. Methods 36 (1997) 11-29), such a supporting study would firmly establish whether the rate of folding/secretion differs between the wt and the homozygous and heterozygous chondronoids. Until the slow folding can be directly demonstrated in a quantitative fashion rather than by monitoring the secondary phenomenon of post-translational modification, the hypothesis remains unproven.

      As discussed above in response to the Reviewer’s earlier suggestion of pulse-chase and question regarding the post-translational modification assay, the pulse-chase experiment is unfortunately infeasible. Fortunately, the modification-based assay is already the gold standard in the collagen field.

      Another issue that does not appear to be addressed is the consequence of having misfolded collagen chains in the dilated ER. Liang et al, using mice transgenic for one or two copies of the mutant human gene, showed apoptosis in the homozygotes but not in the hets, a finding similar to that of Kimura et al using transgenics carrying a different human COL2A1 mutation. Okada et al, using chondrocytes converted from human fibroblasts with clinical collagenopathy (heterozygous), although not the same mutation as in the present study, showed dilated ER and some level of apoptosis in the cultured cells. Hintze et al, examining chondrocytes expressing different mutants associated with different forms of spondyloepiphyseal dysplasia, suggested that the degree of stability of the mutations might determine whether apoptosis occurred, i.e. the thermolabile p.R989C was associated with apoptosis while cells expressing the more thermostable mutants p.275C, P.719C and p.G853E did not reveal any evidence for ongoing apoptosis R989. Is it possible that the smaller size of the homozygous chondronoids reflects fewer cells rather than less matrix (or both) as a result of apoptosis? This could be tested by examining the chondronoids with reagents for caspase 3 or TUNEL staining. One could also measure the Col/DNA ratio in wt, hets and homos. It might also have been useful for these experiments to have been more quantitative, i.e. by cell sorting rather than by eye. Would ImageJ software have been helpful?

      We greatly appreciate this suggestion. We now added results of TUNEL assays performed on sections of the chondronoids (see Fig. 8), including quantification of the results. Notably, we do not observe a significant difference in apoptosis between genotypes at the timepoint considered. This result is also supported by our transcriptional data, where we do not observe upregulation of apoptosis-related pathways, via the UPR or otherwise.

      It is also unclear as to the conformation of chains trapped in the ER. There are many examples in which the natural tendency of misfolded proteins is to aggregate. This is certainly true in the neurodegenerative diseases. While at the magnification used here in the TEM's the ER inclusions appear homogeneous and amorphous, perhaps at higher magnification/resolution a more discrete structure might be seen.

      From collagen-II immunohistochemistry confocal images, the intracellular collagen appears sometimes as aggregated puncta, and in other cases more diffuse and amorphous. Given this heterogeneity, we were not able to readily obtain clear additional structural characterization of the intracellular procollagen-II fraction.

      While the choice of time points for the transcriptional analysis, i.e. early and late seems well thought out, the lack of a significant response may be due to the timing and it might have been useful to do earlier or later time points or intermediate time points in case the response was transient, particularly since other laboratories have reported UPR activation and abnormalities in the context of the silencing of Xbp1, the spliced form of which is a major driver of at least one arm of the UPR.

      While our RNA-sequencing results at the specific timepoints we chose cannot rule out a transient activation of the UPR, they do indicate that chronic, unresolved UPR signaling is not the underlying cause of pathology, which is the main point we are making.

      The notion that pro-collagen is largely hydrophilic, without the potential for exposure of hydrophobic regions that might engage BiP, and thus is not sensitive to BiP sensing, is interesting. Is it possible that the tendency of the mutant polypeptides to form the triple helix in itself acts as a kind of self-chaperoning structure? Looking at the kinetics of assembly inside the cell, see suggestions above, might provide further insight into the process beyond that obtained by looking at the modified state of the lysines.

      We believe this notion is very strongly supported by the interactomic experiment showing that BiP fails to preferentially engage the poorly folding triple-helical variant. There are, however, many other chaperones and folding enzymes that assist collagen folding, including prolyl isomerases and Hsp47. Hence, it is not clear to us that substantial self-chaperoning occurs. Still, the self-chaperoning idea is intriguing, and we will note that prior work does indicate that triple-helical domains of individual procollagen polypeptides are strongly pre-organized for triple-helix formation (for a review, see Annu Rev Biochem 2009, 78, 929). That said, we hesitate to speculate here on the self-chaperoning idea without additional evidence.

      Minor comments:

      As I mentioned above, while the transcriptional interactome experiments are computationally sophisticated the cell biology and biochemistry would benefit from more and better quantitation.

      We have included quantitation of the extent of intracellular procollagen accumulation and the extent of apoptotic cells, which we hope helps to address this point.

      The paper is written in a style in which results and discussion are intermingled. Personally I prefer that the introductions are short, the results clearly and briefly presented and the discussion deals with the interpretation and conclusions. I thought that whole paragraphs could have been omitted. e.g. in the introduction *Omit paragraph "The fibrillar.........achondrogenesis type II" Omit paragraph "Conventional and... for example." Omit "Excitingly........in vitro and in vivo (36)." Results: First paragraph repeats last paragraph of introduction and not necessary in one place or the other, condense. *

      We appreciate this feedback and have accordingly edited the manuscript for clarity and brevity, which includes deleting or significantly shortening all the paragraphs indicated by the Reviewer. These improvements are indicated in the track-changes version of the manuscript we resubmitted.

      Figure 2: by eye, MGP (Matrix gla protein inhibits vascular calcification of type II collagen) seems highly over-expressed in the homozygous mutants. MGP is supposedly an inhibitor of calcification; does its over-expression here reflect something about the adequacy of the matrix?

      Overexpression of MGP could indeed reflect a defect in the matrix of the homozygous variants. It is also likely a reflection of the delayed hypertrophy and maturation observed in the homozygous variants, as matrix calcification is a step in the endochondral ossification process. We did not follow up on this particular observation, as it is exclusively observed in the less clinically relevant homozygous variant. We added a note to the manuscript to capture the Reviewer’s point about MGP, as below:

      “The upregulation in the homozygous system of Matrix Gla Protein (MGP) (Fig. 2A), which inhibits vascular calcification of the matrix in vivo, further supports the delay in hypertrophy, and could lead to differences in the biomechanical properties of the matrix.”

      Figure 5 is good but can it be confirmed by quantitative biochemistry?

      We have included quantitation of the extent of intracellular procollagen accumulation and the extent of apoptotic cells.

      Did you stain with antibodies to other ER resident chaperones other than calreticulin?

      Yes, we also stained the ER with PDI. However, the chondronoids require extensive optimization for immunostaining and we could obtain much better images using the ER marker for calreticulin, hence our choice of images to present in the manuscript.

      Do cells with large amounts of intracellular G1170S die?

      As indicated by the newly included TUNEL data, interestingly, even cells expressing exclusively the Gly1170Ser variant of procollagen-II do not seem to apoptose at a significantly higher rate than wild-type, at least at the timepoint considered. As mentioned above, we added these data as Fig. 8, and added discussion of these results and methodology in the relevant sections of the manuscript.

      Does higher magnification EM reveal any structure of the material within the dilated ER?

      We have so far not been able to use EM to obtain higher-resolution insight into intracellular procollagen structures, but we will work on this idea in future studies.

      Are there any inflammatory cells in the Chondronoids? To respond to aberrant proteins?

      There should not be any such cells present in the chondronoids, and we indeed do not observe any inflammatory response. As noted in the response to Reviewer 1, we added discussion regarding the absence of paracrine signals in this type of model system, which we do believe has major advantages for biochemical studies like those performed here.

      Paragraph

      * "Bypassing the UPR.......often do not" Is discussion not results*

      Corrected, thanks.

      Significance

      The experimental system described here is clearly the wave of the present. Generating human ipSC's of different lineages is now being exploited to study a variety of disorders, to achieve better understanding of pathogenesis at the molecular level to serve as appropriate models for drug development, particularly in the context of high throughput screening. In addition, as in this case, relatively rare autosomal dominant disorders with phenotypes that resemble more common sporadic disease, may allow the development of treatments that are relevant for the sporadic disorder. While it is likely that the osteoarthritis that develops in the carriers of the COL2A1 mutations is a function of the host response to the aberrant mechanics resulting from the defective extra-cellular matrix caused by the mutation, having a pure system in which the primary defect can be corrected and the predisposing matrix deficit reversed, could allow normal reparative processes to mitigate the functional joint disability. While the transgenic mice are useful as a disease model, they represent not only the expression of the primary defect but the host pathophysiologic response to that defect, i.e. in this case how the mouse responds to the defective matrix state and whether those responses add additional pathogenic factors to the disease course. Having a tool in which to relatively assess the pure chondrocyte effect should allow more granular analysis of the primary process.

      We appreciate the Reviewer’s careful and enthusiastic assessment of the significance of our work.

      Their findings reinforce the notion that involvement of the UPR as well as the other arms of the proteostatic response in chondrocytes expressing a variety of mutant collagens suggests a degree of heterogeneity, perhaps depending on the mutation involved. While I do not believe that their current data prove or rigorously test their proposed hypothesis, i.e. that "perhaps due to the pathologic substitution occurring within a triple-helical domain that lacks hydrophobic character, this ER protein accumulation is not recognized by cellular stress responses, such as the unfolded protein response", it is worth considering.

      We provide that hypothesis as a reasonable explanation for the absence of a UPR, and it is strongly supported by our interactomic studies. Furthermore, neither we nor others have found evidence for BiP binding the triple helical domain of procollagen in any other studies. Still, that hypothesis is not the core point of the paper and we do appreciate the Reviewer’s perspective.

      Given the fact that this is a relatively small field with a variety of observations concerning the role of proteostasis and the UPR in particular which seem to vary depending on the system, i.e. transgenic mice, transfected fibroblasts, the chondroidomas, these observations particularly with additional biochemistry to confirm their notions regarding folding rates etc, represent a useful technical addition to the field and should be interesting for people working on collagen biology, arthritis and protein folding.

      I am not a collagen biologist hence my knowledge of some of the nuances of collagen biology may not be extensive. My own areas of interest include the assembly of multi-peptide proteins (such as immunoglobulins) for secretion; the mechanisms that allow them to exit the cell and the aggregation of misfolded proteins as exemplified by the amyloidoses and other forms of clinically relevant protein aggregation. Hence, I am very familiar with tissue culture, transgenic animals as disease models, studies of protein aggregation, and as a former rheumatologist, osteoarthritis.

      We greatly appreciate the Reviewer providing such valuable and scholarly input from the perspective of a scientist with deep expertise in the secretory pathway and other diseases of protein misfolding, as well as from rheumatology. Specifically from the perspective of expertise in collagen biology/biochemistry, we hope that our detailed explanations of assays that are possible versus not possible with collagen in this system, the additional context for why our assessment of the modification of procollagen is correlated with folding/secretion rate, and the further analyses added to the paper, now make a convincing case that the improved manuscript is of high significance and is ready for publication.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity (Required)):

      1. In this manuscript, Imoto et al. analyze the specific role of the Dynamin1 splice variant Dyn1xA in so-called ultrafast endocytosis, an important mechanism of synaptic vesicle recycling at synapses. In a previous publication (Imoto et al. Neuron 2022), some of the authors had shown that Dyn1xA, and not the other splice variant Dyn1xB, is essential for ultrafast endocytosis. Moreover, Dyn1xA forms clusters around the active zone for exocytosis and interacts with Syndapin 1 in a phosphorylation dependent manner. However, it was unclear which molecular interactions underlie the specific role of Dyn1xA. Here, the authors provide convincing evidence with pull down assays and CSP that Dyn1xA PRR interacts with EndophilinA1/2 with two binding sites. The first binding site lies in the part common to xA and xB, was previously characterized. The second site was previously uncharacterized, is specific for Dyn1xA, and is regulated by phosphorylation (phosphobox 2). The location of these splice variants and mutated forms at presynaptic sites correlate with the prediction made by the biochemical assays. Finally, the authors perform rescue experiments ('flash and freeze' and VGLUT1-pHluorin imaging experiments) to show that Dyn1xA-EndophilinA1/2 binding is important for ultrafast endocytosis. I find the results interesting, providing an important step in the understanding of the interplay between dynamin and the endocytic proteins interacting with it (endophilin, syndapin, amphiphysin) in the context of synaptic vesicle recycling. The manuscript is clearly written and for the most part the data supports the authors' conclusions (see specific comments below). However, there are some issues which need to be clarified before this manuscript is fully suitable for publication.

      We thank the reviewer for noting the importance of our study. Indeed, our previous study has raised the question as to why only the Dyn1xA splice variant mediates ultrafast endocytosis, and our current manuscript now resolves this issue.

      Introduction: the Dyn1xB Calcineurin binding motif is written PxIxIT consensus but actual sequence is PRITISDP. Is this a typo?

      The sequence is correct. One thing we failed to mention is that the last amino acid in this motif can be either threonine or serine for calcineurin binding, as we demonstrated previously [Jing, et al., 2011 JBC; PMC3162388]. We have amended the text as follows.

      1. calcineurin-binding motif (PxIxI[T/S]) 19.

      Figure 1: the difference between the constructs used in panels C and D is not clear. In D, is it a truncation without residues 796 and 845? If so, it should be labelled clearly in the Western blots. In Panel E, Dyn1xA 746-798 should be labeled Dyn1x 746-798 because it is common to both splice variants.

      We thank the reviewer for pointing this out. Both C and D used the full-length PRRs, Dyn1xA (746-864) and Dyn1xB (746-851). To make the labeling clear, we changed Dyn1xA PRR to “Dyn1xA PRR (746-864)” and Dyn1xB PRR to “Dyn1xB PRR 746-851” in Figure 1. In the main text, we made the following changes.

      1. 4: “To identify the potential isoform-selective binding partners, the full-length PRRs of Dyn1xA746-864 and xB746-851 (hereafter, Dyn1xA-PRR and Dyn1xB-PRR, respectively).”

      Figure 1: For amphiphysin binding the authors write that "No difference in binding to Amphiphysin 1 was observed among these peptides (Figure1D-F)." They should write that Dyn1x 746-798 does not bind Amphiphysin1 SH3 domain, confirming the specificity of binding to the 833-838 motif.

      We edited the sentence as suggested.

      1. “Dyn1x 746-798 does not bind Amphiphysin1 SH3 domain (Figure 1G), confirming the specificity of binding to the 833-838 motif as reported in previous studies 29,30. (Figure 1D-F).”

      Figure S2. The panels are way too small to see the shifts and the labelling. Please provide bigger panels

      As suggested, we have now provided bigger panels in Figure S2, and amended the text and Figure legend accordingly.

      We also removed Figure S2B as it was not referred to in the text in any way. (It was the reverse experiment: HSQCs of 15N-labelled SH3 titrated with unlabelled dynamin.)

      Figure 2 panel B. There is a typo in the connecting line between the sequence and the CSP peaks. It is 846 instead of 864 (after 839).

      Corrected.

      Figure 3 panel E. In the text, the authors write that "Western blotting of the bound proteins from the R838A pull-down experiment showed that R838A almost abolished both Endophilin and Amphiphysin binding in xA806-864 (Figure 3D), and reduced Endophilin binding to xA-PRR (Figure 3E)." I think they should write "only slightly reduced Endophilin binding..." it is more faithful to the result and consistent with the conclusion that Endophilin A1 has two binding sites on Dyn1xA PRR.

      We have now provided quantitative data for R838A and R846A (Fig. 3F and G). Endophilin binding is significantly reduced with R846A.

      It is unclear why the R846A mutant affects binding of Dyn1xA 806-864 but not Dyn1xA-PRR-.

      The reviewer asks why the R846A mutant affects binding of Dyn1xA 806-864, but not so much of Dyn1xA-PRR. The explanation is simply that there are two endophilin binding sites in Dyn1xA-PRR. The first is not present in the xA806-864 peptide, while both are present in Dyn1xA-PRR (the full-length tail). In pull-down experiments, the binding tends to saturate: even when the second site is blocked by R846A, the first site is still able to bind, and the binding appears normal. The same applies to the R838A mutant.
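      The saturation argument above can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not a model fitted to the data in this response: the two sites are treated as independent, the SH3 ligand is assumed to be in large excess, and the concentration and dissociation constants are hypothetical values in arbitrary units. A construct counts as "pulled down" if at least one of its sites is occupied.

```python
def site_occupancy(ligand, kd):
    """Fractional occupancy of one site for free ligand concentration `ligand`."""
    return ligand / (ligand + kd)


def fraction_with_any_site_bound(ligand, kds):
    """Fraction of molecules with at least one occupied site.

    `kds` lists the dissociation constant of every functional site.
    Sites are assumed independent (an assumption of this sketch).
    """
    fraction_unbound = 1.0
    for kd in kds:
        fraction_unbound *= 1.0 - site_occupancy(ligand, kd)
    return 1.0 - fraction_unbound


# Hypothetical numbers: ligand in 100-fold excess over Kd (arbitrary units).
LIGAND = 100.0
KD_SITE1 = 1.0  # site shared by both splice variants (hypothetical value)
KD_SITE2 = 1.0  # xA-specific site containing R846 (hypothetical value)

full_tail_wt = fraction_with_any_site_bound(LIGAND, [KD_SITE1, KD_SITE2])
full_tail_mutant = fraction_with_any_site_bound(LIGAND, [KD_SITE1])  # site 2 removed
short_peptide_mutant = fraction_with_any_site_bound(LIGAND, [])      # only site removed
```

      With these illustrative values the full-length tail remains almost fully bound when the second site is removed, whereas a peptide that carries only that site loses binding entirely. In the toy model this reproduces the qualitative pattern described above; treating the mutation as complete removal of a site is itself a simplification.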

      Moreover, it affects binding to endophilin as well as amphiphysin, and therefore it is not specific. It is thus not correct to write that "R846 is the only residue found to specifically regulate the Dyn1 interaction with Endophilin as a part of an SDE". In the Discussion (page 11), the authors refer to the R846A mutation as specifically affecting Endophilin binding. This should be toned down, as it also affects Amphiphysin binding. For this important point, the data on quantification of Endophilin binding should be presented.

      The reviewer’s concern is about our claims of specificity of Endophilin A binding in Dyn1xA R846 mutation experiments. The reviewer is correct, and we have now defined specific parameters for those claims. Specifically, we have added new quantitative data from the Western blots in Fig 3E (full-length Dyn1xA-PRR) as Fig 3F-G. We used full-length Dyn1xA-PRR rather than the xA806-864 peptide because the subsequent transfection experiments use full-length Dyn1xA. In the new Figures 3F and 3G, we quantified Endophilin A, Amphiphysin and Syndapin1 amounts from multiple Western blots such as Figure 3E (now n=14, 6 experiments, each with 2-4 replicates for Dyn1xA PRR). R846A mutated in Dyn1xA-PRR significantly reduces the binding to Endophilin A, but it does not significantly affect the binding to Amphiphysin 1 and Syndapin1 (Fig 3G). Therefore, this particular Dyn1xA-PRR mutation specifically affects Endophilin A binding, in the context of the full-length tail Dyn1xA-PRR. To make these results clear, we modified the text as below.

      P7. “R838A and R846A caused smaller reductions in Endophilin binding compared to wild-type Dyn1xA-PRR (Figure 3E, 3F, R838A, median 68.5 %; Figure 3G, R846A, median 59.3 %). R838A reduced the Dyn1/Amphiphysin interaction (Figure 3E, 3F, median 14.2 % binding compared to wild-type Dyn1xA-PRR). By contrast, R846A did not affect Amphiphysin and Syndapin binding to Dyn1xA-PRR (Figure 3E, 3G). Therefore, R846, being part of an SDE, is the only residue we found to specifically regulate the Dyn1 interaction with Endophilin in the context of the full length tail (Dyn1xA-PRR)”.

      Additionally, the reviewer notes that “the authors refer to the R846A mutation as specifically affecting Endophilin binding. This should be toned down, as it also affects Amphiphysin binding.” In light of the above data and new quantitative analysis (Fig 3F-G), we have clarified the conclusion. However, to be clear that this statement is only correct in the context of the full-length Dyn1xA-PRR, we amended the text as follows:

      P7. “By contrast, R846A did not affect Amphiphysin and Syndapin binding to Dyn1xA-PRR (Figure 3E, 3G). Therefore, R846, being part of an SDE, is the only residue we found to specifically regulate the Dyn1 interaction with Endophilin in the context of the full length tail (Dyn1xA-PRR)”.

      New legends for Figure 3F and G have now been added as follows.

      “(F) The binding of Endophilin A, Amphiphysin 1 and Syndapin1 to Dyn1xA-PRR (wild type) or R838A mutant quantified from Western blots in (E). n=14 (6 experiments with 2-4 replicates in each). Median and 95% confidence intervals are shown. Kruskal-Wallis with Dunn’s multiple comparisons test (**p

      (G) The binding of Endophilin A, Amphiphysin 1 and Syndapin1 to Dyn1xA-PRR (wild type) or R846A mutant quantified from Western blots in (E). n=14 (6 experiments with 2-4 replicates in each). Median and 95% confidence intervals are shown. Kruskal-Wallis with Dunn’s multiple comparisons test was applied (*p

      Figure 3F-G (which are now 3H and 3I in the revised text): what do the star symbols represent in the graphs? I guess the abscissa represents retention time. Please write it clearly instead of a second ordinate for molecular mass, which does not make much sense if this reflects the estimate for the 3 conditions.

      The “stars” are crosses (x) and represent individual data points. The figure legends have been updated for clarity. The reviewer is correct that the X-axis is retention time (min). The second Y-axis is needed to define the points in the curve marked with crosses (x’s). The legends for Figure 3H and I are now changed as follows.

      “(H) SEC-MALS profiles for Dyn1xA alone (in green), Endophilin A SH3 alone (in red) and the complex of the two (in black) are plotted. The x-axis shows retention time. The left axis is the corresponding UV absorbance (280 nm) signal, shown as solid lines, and the right axis shows the molar mass of each peak, shown as crosses. The molecular weight of the complex was determined and tabulated in comparison with the predicted molecular weight. Crosses (x) represent individual data points.

      (I) SEC-MALS profiles for a high concentration of Dyn1xA-PRR/Endophilin A SH3 complex (0.5 mg) (in dark blue) and a low concentration of Dyn1xA-PRR/endophilin A SH3 complex (0.167 mg) (in blue). The x-axis shows retention time. The left axis is the corresponding UV absorbance (280 nm) signal, shown as solid lines, and the right axis shows the molar mass of each peak, shown as crosses. The molecular weight of the complex was determined and tabulated in the table. Crosses (x) represent individual data points.”

      Figure 4: The statement that "By contrast [to Dyn1xA], Endophilin A1 or A2 formed multiple clusters (1-5 clusters)" is not at all clear on the presented pictures. The authors should provide views of portions of axons with several varicosities, for the reader to appreciate the cases where there are more EndoA clusters than Dyn1 clusters.

      In the revised Figure S4, we added additional STED images for a region of axons with more EndoA1/2 clusters than Dyn1xA clusters. The locations of Dyn1xA and EndoA1/2 clusters are annotated in each image based on the local maximum of intensity, which is determined using our custom Matlab analysis scripts (Imoto, et al., Neuron 2022; for the description of the methods, please refer to the Point #14 below). We also added Figure S3 to describe our analysis pipelines. In the Dyn1xA channel, outer contour indicates 50% of local maxima (boundary of Dyn1xA cluster) while inner contour indicates 70% of local maxima of the clusters. In the EndoA1/2 channel, local maxima of the clusters are indicated as points. To reflect these changes, we modified text as below.

      P 9. “By contrast, Endophilin A1 or A2 formed multiple clusters (1-5 clusters) (Figure S4)”

      The legends for Figure S4 are now as follows.

      “Figure S4. Additional STED images for Figure 4.

      (A) The top image shows an axon containing multiple boutons. Signals show overexpression of GFP-tagged Dyn1xA (Dyn1xA) and mCherry-tagged Endophilin A1 (EndoA1). The bottom images show magnifications of four boutons in the top image. Red hot look-up table (LUT) images on the right side of Dyn1xA and EndoA1 images are enhanced contrast images. Outer and inner contours represent 50% and 70% of local maxima of the Dyn1xA, respectively. Black circles represent local maxima of Endophilin A1. In these boutons, multiple EndophilinA1 puncta are present.

      (B) The top image shows an axon containing multiple boutons. Signals show overexpression of mCherry-tagged Dyn1xA (Dyn1xA) and GFP-tagged Endophilin A2 (EndoA2). The bottom images show magnifications of four boutons in the top image. Red hot LUT images on the right side of Dyn1xA and EndoA2 images are enhanced contrast images. Outer and inner contours represent 50% and 70% of local maxima of the Dyn1xA, respectively. Black circles represent local maxima of Endophilin A2. In these boutons, multiple EndophilinA2 puncta are present.

      (C) STED micrographs of the same synapses as in Figure 4E with an active zone marker Bassoon (magenta) visualized by antibody staining. GFP-tagged Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A (green) are additionally stained with GFP-antibodies. Local maxima of Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A signals and minimum distance to the active zone boundary are indicated by dark blue lines.”

      Moreover, overexpression of EndophilinA1/2-mCherry is not sufficient to assess its localization. Please consider either immunofluorescence or genome editing (e.g. Orange or TKIT techniques).

      We agree with the reviewer that overexpression obscures the endogenous localization of proteins. To address this point in our previous publication, we titrated the amount of plasmids for Dyn1xA-GFP and transfected neurons for just 20 hours. This protocol allowed us to uncover the endogenous localization of Dyn1xA despite the fact that it was overexpressed in wild-type neurons (Imoto, et al., 2022). We also confirmed the true endogenous localization of Dyn1xA by ORANGE-based CRISPR knock-in of a GFP tag in the endogenous locus of Dyn1 just after exon 23 (Imoto, et al., 2022). Similar approaches were taken by the Chapman lab to localize Synaptotagmin-1 and Synaptobrevin 2 in axons (Watson et al, 2023, eLife, PMID: 36729040). We did not emphasize this in the first submission, but we took the same approach for the EndoA1/2 localization. This does not guarantee that the protocol also unmasks the endogenous localization of EndoA1/2, and the reviewer is correct that additional evidence would strengthen the data here. Thus, as suggested, we have looked at the endogenous EndophilinA1 localization by antibody staining. As the reviewer is likely aware, EndophilinA1 also localizes to other places including dendrites and postsynaptic terminals, making it difficult to analyze the data. However, we observe colocalization of Dyn1xA with endogenous EndoA1. Thus, we believe that our major conclusion here, drawn from EndoA1/2-mCherry overexpression, is valid (Reviewer’s Figure 1). Since the Endophilin signals in neighboring processes obscure its localization in synapses-of-interest, repeating these localization experiments with ORANGE-based knock-in would be ideal. However, with the lead author starting his own group and many validations needed to confirm the knock-in results, this experiment would take us at least 4-6 months, and thus, it is beyond the scope of our current study. We will follow up on this localization in the near future, but given that endophilin is required for ultrafast endocytosis (Watanabe, et al., Neuron 2018, PMID: 29953872) and these proteins need to be in condensates at the endocytic sites for accelerating the kinetics of endocytosis (Imoto, et al., Neuron 2022, PMID: 35809574), we are confident that endogenous EndoA1/2 are localized with Dyn1xA.

      The analysis of the confocal microscopy data is not explained. How is the number of clusters determined? How far apart are they? Confocal microscopy may not have the resolution to distinguish clusters within a synapse.

      We apologize for the insufficient description of the method. We had provided a more thorough description of the methods in our previous publication (Imoto, et al., Neuron 2022, PMID: 35809574). To make the analysis more automated, we improved our custom Matlab scripts. Please note that all the analysis for the cluster location is performed on STED images, not on normal confocal images. To determine the clusters, first, presynaptic regions (based on Bassoon signals or Dyn1xA signals within boutons) in each STED image are cropped into 900 by 900 nm regions of interest (ROIs). Then, our Matlab scripts calculate the local maxima of fluorescence intensity within the ROIs. To determine the distance between the active zone and the Dyn1xA or EndoA1/2 clusters, the Matlab scripts perform the same local maxima calculations in both channels and make contours at 50% intensity of the local maxima. The minimum distance reflects the shortest distance between the active zone and Dyn1xA/EndoA1/2 contours. To make these points clearer, we modified the main text and the Methods section. In addition, we have added a workflow of this analysis as Figure S3.

      P9. Main. “Signals of these proteins are acquired by STED microscopy and analyzed by custom MATLAB scripts, similarly to our previous work23.”

      P20. Methods. “All the cluster distance measurements are performed on STED images. For the measurements, a custom MATLAB code package23 was modified using GPT-4 (OpenAI) to perform semi-automated image segmentation and analysis of the endocytic protein distribution relative to the active zone marked by Bassoon or relative to Dyn1xA cluster in STED images. First, the STED images were blurred with a Gaussian filter with radius of 1.2 pixels to reduce the Poisson noise and then deconvoluted twice using the built-in deconvblind function: the initial point spread function (PSF) input is measured from the unspecific antibodies in the STED images. The second PSF (enhanced PSF) input is chosen as the returned PSF from the initial run of blind deconvolution62. The enhanced PSF was used to deconvolute the STED images to be analyzed. Each time, 10 iterations were performed. All presynaptic boutons in each deconvoluted image were selected within 30 × 30-pixel (0.81 µm²) ROIs based on the varicosity shape and bassoon or Dyn1xA signals. The boundary of active zone or Dyn1xA puncta was identified as the contour that represents half of the intensity of each local maximum in the Bassoon channel. The Dyn1xA clusters and Endophilin A clusters were picked by calculating pixels of local maxima. The distances between the Dyn1xA cluster and active zone boundary or Endophilin A clusters were automatically calculated correspondingly. For the distance measurement, MATLAB distance2curve function (John D'Errico 2024, MATLAB Central File Exchange) first calculated the distance between the local maxima pixel and all the points on the contour of the active zone or Dyn1xA cluster boundary. Next, the shortest distance was selected as the minimum distance. Signals crossing the ROI boundaries and the Bassoon signals outside of the transfected neurons were excluded from the analysis. The MATLAB scripts are available by request.”
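      For readers who want a concrete picture of the distance measurement described above, the core steps (local maxima in one channel, a half-maximum boundary in the other, then the shortest distance between them) can be sketched as follows. This is a simplified illustration, not the authors' MATLAB package: it is written in plain Python, skips the Gaussian blur and blind deconvolution, thresholds at 50% of the ROI maximum rather than per local maximum, uses boundary pixels instead of an interpolated contour, and assumes a pixel size of 30 nm (inferred here from a 900 nm ROI spanning 30 pixels). All function names are hypothetical.

```python
import math


def local_maxima(img, min_value=0.0):
    """Return (row, col) of pixels strictly brighter than all their neighbours."""
    rows, cols = len(img), len(img[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = img[r][c]
            if value <= min_value:
                continue  # ignore background pixels
            neighbours = [
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def half_max_boundary(img, fraction=0.5):
    """Pixels at or above fraction * max that touch a dimmer pixel or the ROI edge."""
    rows, cols = len(img), len(img[0])
    threshold = fraction * max(max(row) for row in img)
    boundary = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] < threshold:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or img[rr][cc] < threshold:
                    boundary.append((r, c))
                    break
    return boundary


def min_distance_nm(peak, boundary, pixel_nm=30.0):
    """Shortest Euclidean distance from a peak pixel to the boundary, in nm."""
    return pixel_nm * min(math.hypot(peak[0] - r, peak[1] - c) for r, c in boundary)
```

      In use, `local_maxima` would be applied to the Dyn1xA or Endophilin A channel of one ROI and `half_max_boundary` to the Bassoon channel, with `min_distance_nm` called once per peak. A peak lying on the boundary itself returns a distance of zero in this sketch.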

      In the legend of Figure S3,

      “Protein localization in presynapses is determined by semi-automated MATLAB scripts (see Methods).

      (A) Series of deconvoluted STED images are segmented to obtain 50-100 presynapse ROIs in each condition.

      (B) Two representations of the MATLAB analysis interface are shown. The first channel (ch1, green) is processed to identify the pixels of local maxima within this channel. The second channel (ch2, magenta) is normally an active zone protein, Bassoon. Active zone boundary is determined by the contour generated at 50% intensity of the local maxima of ch2. The contours outside of the transfected neurons are manually selected on the interface and excluded from the analysis. Minimum distances from each pixel of the local maxima in ch1 to the contour in ch2 are calculated and shown in the composite image. The plot “Distance distribution” shows all the minimum distances identified in this presynapse ROI (unit of the y axis is nanometer). The plot “Accumulated distance distribution” shows the accumulated distance distribution from the initial to the current presynapse ROI. The plot “Histogram of total intensity” shows the intensity counts around individual local maxima pixels in ch1.”

      For the STED microscopy, a representation of the processed image (after deconvolution) and the localization of the peaks would be important to assess the measurement of distances. If Dyn1xA S851/857D is more diffuse, are there still peaks to measure for every synapse?

      We thank the reviewer for bringing up this important question. In Figure S4C, we have added the position of the local maxima of wild-type and mutant Dyn1xA shown in the main Figure 4E. As the reviewer pointed out, when a protein is more diffuse, it is difficult to find the peak intensity by STED. However, since these proteins are still found at a higher density within a very confined space of a presynapse and synapses are packed with organelles like synaptic vesicles and macromolecules, signals from even diffuse proteins can be detected as clusters, and local maxima can be detected in these images.

      To illustrate this point better, we added Reviewer’s Figure 2 below. In this experiment, we transfected neurons with a typical amount of plasmids (2.0 µg/well) or a ~10x lower amount (0.25 µg/well). When the density of cytosolic proteins is high (Reviewer’s Figure 2A), the depletion laser has to be strong enough to induce sufficient stimulated emission and resolve protein localization. Insufficient power would produce low resolution images, leading to inappropriate detection of the local maxima (Reviewer’s Figure 1A). Thus, we set our excitation and depletion laser powers to resolve the protein localization to ~40-80 nm at presynapses. Furthermore, to avoid mislocalization of proteins due to the overexpression, we use 0.25-0.5 µg/well (in 12-well plate) of plasmid DNA for transfection, which is around 10 times lower than the amount used in the typical lipofectamine neuronal transfection protocol (Imoto, et al., Neuron 2022). We also change the medium around 20 hours after the transfection instead of the typical 48 hours (Imoto, et al., Neuron 2022). With these modifications and settings, we can obtain the location of the local maxima of the diffuse signals (Reviewer’s Figure 1B and Figure 4E and Figure S4). We modified the Methods section to make these points clearer.

      P 17, “Briefly, plasmids were mixed well with 2 µl Lipofectamine in 100 µl Neurobasal media and incubated for 20 min. For Dyn1xA and Endophilin A expressions, 0.5 µg of constructs were used to reduce the overexpression artifacts23. The plasmid mixture was added to each well with 1 ml of fresh Neurobasal media supplemented with 2 mM GlutaMax and 2% B27. After 4 hours, the medium was replaced with the pre-warmed conditioned media. To prevent too much expression of proteins, neurons were transfected for less than 20 hours and fixed for imaging.”

      P 20, “Quality of the STED images are examined by comparing the confocal and STED images and measuring the size of signals at synapses and PSF (non-specific signals from antibodies).”

      Legends for Figure S4C,

      “(C) STED micrographs of the synapses shown in Figure 4F with an active zone marker Bassoon (magenta). GFP-tagged Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A are visualized by antibody staining of GFP (green). Local maxima of Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A signals and minimum distance to the active zone boundary are overlaid.”

      Figures 5 and 6: No specific comment. The data and its analysis are very nice and elegant. The comment on the lack of rescue of Dyn1xA on endosome maturation may be a bit overstated, because many "controls" (shRNA control Figure S5 or Dyn3 KO in Imoto et al. 2022) have a significant number of endosomes 10 s after stimulation.

      We thank the reviewer for noting the strength of our data and pointing out this issue on endosomal resolution. In particular, the reviewer is concerned about our interpretation of the ferritin-positive endosomes present at 10 s in time-resolved electron microscopy experiments. Indeed, the number of ferritin-positive endosomes in Dyn1 KO, Dyn1xA OEx neurons (0.1/profile) is similar to the control conditions: scramble shRNA control (0.1/profile, Figure S5) and Dyn3KO neurons (0.2/profile) in our previous study (Imoto et al. 2022). Although we do not consider Dyn3 KO as a control, given the presence of abnormal endosomal structures, we agree with the reviewer that the scramble shRNA control in Figure S5 does indicate that some ferritin-positive endosomes remain even at 10 s after stimulation. We would like to note that this result is in stark contrast to our previous studies, where we observed the number of ferritin-positive endosomes returning to the basal level in both wild-type neurons and many scramble shRNA controls (Watanabe et al. 2014, 2018, Imoto et al 2022). Thus, the majority of the data we have indicate that the number of ferritin-positive endosomes returns to the basal level by 10 s, suggesting that endosomes are typically resolved into synaptic vesicles by this time. However, given that we do not know the nature of the inconsistency here and we cannot exclude the possibility of an overexpression artifact of Dyn1xA as an alternative, we changed the following lines.

      P. 10, “Interestingly, the number of ferritin-positive endosomes did not return to the baseline (Figure 5E, F) as in previous studies3,35,36, suggesting that Dyn1xA may not fully rescue the knockout phenotypes or that overexpression of Dyn1xA causes abnormal endosomal morphology.”

      By the way, why did the authors use Dyn1 KO in this study, and not Dyn1,3 DKO as in Imoto et al. 2022?

      This is simply because Dyn3KO displayed an endosomal defect in our previous study (Imoto et al 2022), and we wanted to focus on endocytic phenotypes of Dyn1 KO and mutant rescues in this study.

      In the Discussion, the authors present the binding sites (for endophilin and amphiphysin SH3 domains) as independent. However, these proteins form dimers or even multimers as they cluster around the neck of a forming vesicle. Even though they provide evidence in vitro (Figure 3) that in these conditions of high concentration one dyn1xA-PRR binds one SH3 domain, in cells multiple binding sites on the PRR to these proteins may involve avidity effects, as discussed for example in Rosendale et al. 2019 doi 10.1038/s41467-019-12434-9. For example, the high affinity binding of Dyn1-PRR to amphiphysin cannot be explained only by the sequence 830-838.

      The reviewer suggests “In the Discussion, the authors present the binding sites (for endophilin and amphiphysin SH3 domains) as independent.” However, we do not claim these interactions are functionally independent, except in the context of in vitro experiments where they are sequence-independent.

      They also suggest “However, these proteins form dimers or even multimers as they cluster around the neck of a forming vesicle”. However, we do not agree with this in the context of our Discussion, because the evidence of multimers and clustering is convincing but comes entirely from in vitro data.

      Thirdly, they comment that “For example, the high affinity binding of Dyn1-PRR to amphiphysin cannot be explained only by the sequence 830-838.” We fully agree with the statement and felt we had addressed this in the manuscript. To explain, it is important to point out the relatively new concept, presented here and previously reported by us (Lin Luo et al 2016, PMID: 26893375), of the existence and importance of SDE and LDE for SH3 domains (Endophilin here, syndapin in our previous report). These elements act at a distance from the so-called core PxxP motifs, and they provide much higher affinity and specificity than the core region alone. We had further mentioned this in the p11 discussion: “Although this is a previously characterized binding site for Amphiphysin and is also present in Dyn1xB-PRR, the extended C-terminal tail of Dyn1xA contains short and long distance elements (SDE and LDE) essential for Endophilin binding, making it higher affinity for Endophilin.” Because the NMR identified F862 as a chemical shift for dynamin, we performed a pulldown with this mutant in the xA746-798 construct (which only contains the higher affinity site) and found that indeed “.F862A reduced Endophilin binding 29% (p ...” Overall, the reviewer correctly points out that “multiple binding sites on the PRR to these proteins may involve avidity effects” could play a role in vivo. We agree that avidity is an additional possibility, not examined in our study. Therefore, as suggested, we added the following sentence to the discussion on the SDE and LDE impacts.

      P. 11. “Our pull-down results showed that R846A abolished endophilin binding to xA806-864 (which contains only the second and higher affinity binding site and the associated SDE (A839) and LDE (F862)) and reduced about 40% of endophilin binding to the Dyn1xA-PRR (which contains both binding sites) without affecting its interaction with Amphiphysin, providing important partner specificity, although we cannot exclude the possibility that avidity effects may additionally come in play in vivo 42”

      Reviewer #1 (Significance (Required)):

      This study provides a significant advance on the mechanisms of dynamin recruitment to endocytic zones in presynaptic terminals. The work adds a significant step by experienced labs (Robinson, Watanabe) who have provided important insight in the mechanisms by many publications in the last years.

      We thank the reviewer for the careful read of our manuscript and positive outlook of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      1. This is a compelling study that reports a key discovery to understand the molecular mechanism of ultrafast endocytosis. The authors demonstrate that the Dynamin splice version 1xA (Dynamin 1xA) uniquely binds Endophilin A, in contrast to Dynamin splice version 1xB (Dynamin 1xB) that does not bind Endophilin A and it is not required for ultrafast endocytosis. In addition, the Endophilin A binding occurs in a dephosphorylation-regulated manner. The study is carefully carried out and it is based on high quality data obtained by means of advanced biochemical methodologies, state-of-art flash-freezing electron microscopy analysis, superresolution microscopy and dynamic imaging of exo-and endocytosis in neuronal cultures. The results convincingly support the conclusions.

      We thank the reviewer for supporting the conclusions of our study.

      1. Although additional experiments are not essential to support the claims of the paper there is room, however, for improvement within the pHluorin experiments. These experiments, that are clearly informative and consistent with the rest of experimental data, do not apply the useful approach to separate endo- from exocytosis. The use of bafilomycin or folimycin to block the vesicular proton pump allows the unmasking the endocytosis that is occurring during the stimulus, that should correspond to ultrafast endocytosis. It would be very elegant to demonstrate that such a component, as expected according to the electron microscopy data, requires the binding of Endophilin A to Dynamin 1xA. If the authors have the pHluorin experiments running, the suggested experiments are very much doable because the reagents and the methodology is already in place and the new data could be generated in around six weeks.

      We thank the reviewer for the suggestion. The reviewer is concerned that the vGlut1-pHluorin experiment in Figure 6 may not correspond to ultrafast endocytosis. We agree that bafilomycin/folimycin treatment will reveal the amount of endocytosis that takes place while neurons are stimulated. However, we are not certain that endocytosis during this phase would fully correspond to ultrafast endocytosis because reacidification of endocytosed vesicles typically takes 3-4 s (Atluri and Ryan, 2006, PMID: 16495458; although see https://elifesciences.org/articles/36097) and thus, the nature of endocytosis cannot be fully determined by this assay. To claim that the endocytosis measured by the pHluorin assay during stimulation corresponds entirely to ultrafast endocytosis, we would need to perform very careful work to track single pHluorin molecules at the ultrastructural level and correlate their internalization to pHluorin signals. Perhaps a rapid acid quench technique used by the Haucke group would also be appropriate to estimate the amount of ultrafast endocytosis (Soykan et al. 2017 PMID: 28231467), but we are not set up to perform such experiments here. Also, our lead author, Yuuta Imoto, is leaving the lab to start up his own group, and it will take us months rather than weeks to get the requested experiments done. Since the point of this experiment was to test whether the interaction of Dyn1xA and EndoA is essential for protein retrieval regardless of the actual mechanisms, and the reviewer acknowledges that this point is sufficiently supported by the experiments, we will set this experiment as the priority for the next paper.

      Instead of the bafilomycin or rapid acid quenching experiments, we have now added data from a vGlut1-pHluorin experiment with a single action potential. With a single action potential, all synaptic vesicle recycling is mediated by ultrafast endocytosis in these neurons (Watanabe et al, 2013 PMID: 24305055; Watanabe et al. 2014, PMID: 25296249). Our electron microscopy experiments in Figure 5 are also performed with a single action potential. As in the experiments with 10 action potentials at 20 Hz, re-acidification of vGlut1-pHluorin is blocked when the Dyn1 and Endophilin A1 interaction is disrupted (Figure 6 F-I). We added a description of this result as below.

      P 11. “Similar defects were observed when the experiments were repeated with a single action potential – synaptic vesicle recycling is mediated by ultrafast endocytosis with this stimulation paradigm25 (S851/857 recovery is 73.3% above the baseline; R846A, recovery is 30.0% above the baseline) (Figure S9 A-D). Together, these results suggest that the 20 amino acid extension of Dyn1xA is important for recycling of synaptic vesicle proteins mediated by specific phosphorylation and Endophilin binding sites within the extension.”

      The methods are carefully explained. Some of the experiments are only replicated in two cultures and the authors should justify the reasons to convince the audience that the approaches used have enough low variability for not increasing the n number. The pHluorin experiments, however, are performed only in a single culture; they should replicate these experiments in at least 3 different cultures (three different mice).

      The reviewer is correct. The variability is very low in our ultrastructural studies and STED imaging, and thus, in all our previous publications, two independent cultures are used. We do agree that in the ideal case we would like to have three independent cultures, but given the nature of ultrastructural studies (control, mutants, and multiple time points), triplicating the data would add another year to our work. We are currently developing AI-based segmentation analysis, and once this pipeline is established, we will be able to increase N. However, please note that for these experiments, we examine around 200 synapses from each condition in electron microscopy studies (Table S2); these numbers are far more than the gold standard in the field. Likewise, 50-100 synapses are examined for STED experiments (Table S2). To examine the variability of our analysis results, we compared the datasets using cumulative curves and the Kolmogorov–Smirnov test (Figure S11). As shown by the summarized data and the p values in each condition, there is no significant difference between the datasets.
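
      As an illustration of the replicate comparison described above, the following is a minimal sketch of a two-sample Kolmogorov–Smirnov comparison between two datasets. It is not the analysis code used for Figure S11: the function names, the distance values and the use of a large-sample critical value are all assumptions made for this example.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # The ECDF value at v is the fraction of observations <= v.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def ks_critical_value(n_a, n_b, alpha=0.05):
    """Large-sample approximation of the critical value for D at level alpha."""
    return sqrt(-log(alpha / 2) / 2) * sqrt((n_a + n_b) / (n_a * n_b))


# Hypothetical distances (nm) from two replicate cultures; not data from the study.
culture_1 = [12, 35, 48, 60, 75, 90, 110, 140]
culture_2 = [15, 30, 50, 65, 70, 95, 105, 150]

d = ks_statistic(culture_1, culture_2)
crit = ks_critical_value(len(culture_1), len(culture_2))
print(f"D = {d:.3f}, critical value = {crit:.3f}")
```

      A value of D below the critical value means the two replicates cannot be distinguished at the chosen significance level. The critical value is a large-sample approximation, so with only a handful of synapses per replicate an exact test would be preferable.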

      For pHluorin analysis, the reviewer is correct. We repeated the experiments twice to increase the N after the initial submission. The data are consistent, and the conclusions are not changed by the additional experiments (Figure 6 and Figure S9). We also changed the Statistical analysis section in Methods as below.

      P. 19. “All electron microscopy data are pooled from multiple experiments after examined on a per-experiment basis (with all freezing on the same day); none of the pooled data show significant deviation from each replicate (Table S2).”

      p 19, “All fluorescence microscopy data were first examined on a per-experiment basis. For Figure 4, the data were pooled; none of the pooled data show significant deviation from each replicate (Figure S11 and Table S2). Sample sizes were 2 independent cultures, at least 50-100 synapses from 4 different neurons in each condition..”

      Legends for Figure S11

      Figure S11. Data variability in Figure 4.

      Cumulative curves are made from each dataset of (A) distance of Endophilin A1 puncta from the edge of Dyn1xA puncta, (B) distance of Endophilin A2 puncta from the edge of Dyn1xA puncta, distance distribution of Dyn1xA from active zone edge in (C) neurons expressing wild-type Dyn1xA-GFP, (D) Dyn1xA-S851/857-GFP and (E) Dyn1xA-R846-GFP. n > 4 coverslips from 2 independent cultures. Kolmogorov–Smirnov (KS) test, p values are indicated in each plot.

      Minor comments: 4. Prior studies referenced appropriately and the text and figures are clear and accurate.

      We thank the reviewer for the careful read of our manuscript.

      The authors should discuss about the mediators (enzymes) responsible for dephosphorylation of phosphor-box 2 that is key for the Dynamin 1xa-Endophilin A interaction.

      We thank the reviewer for the suggestion. We added a discussion on a potential mediator, Dyrk1, as below.

      P. 12. ”What are the kinases that regulate Dyn1? The phosphorylation of phosphobox-1 is mediated by Glycogen synthase kinase-3 beta (GSK3ß) and Cyclin-dependent kinase 5 (CDK5)17, while phosphobox-2 is likely phosphorylated by Trisomy 21-linked dual-specificity tyrosine phosphorylation-regulated kinase 1A (Mnb/Dyrk1)44,45 since Ser851 in phosphobox-2 is shown to be phosphorylated by Mnb/Dyrk1 in vitro32. Furthermore, overexpression of Mnb/Dyrk1 in cultured hippocampal neurons causes slowing down the retrieval of a synaptic vesicle protein vGlut146. Consistently, our data showed that phosphomimetic mutations in phosphobox-2 results disruption of Dyn1xA localization, perturbation of ultrafast endocytosis, and slower kinetics of vGlut1 retrieval. However, how these kinases interplay to regulate the interaction of Dyn1xA, Syndapin1 and Endophilin A1 for ultrafast endocytosis is unknown.”

      It would be very helpful to include a final cartoon depicting the key protein-protein interactions regulated by dephosphorylation (activity) and the sequence of molecular events that leads to ultrafast endocytosis

      As suggested, we made a model figure (new Figure 7) showing how Dyn1xA and its interaction with EndoA and Syndapin1 increase the kinetics of endocytosis at synapses. Regarding the sequence of molecular events, we think that a dephosphorylated fraction of Dyn1xA molecules is already sitting at the endocytic zone in the resting state and mediates ultrafast endocytosis. It is equally possible that activity-dependent dephosphorylation of Dyn1xA also plays a role (Jing et al. 2011, PMID: 21730063). However, we do not yet have evidence about the sequence of activity-dependent modulation of Dyn1xA and its binding partners during ultrafast endocytosis. This is well beyond what we have reported in this work and is therefore excluded from the model figure. We added the following to the end of the discussion:

      p13, “Nonetheless, these results suggest that Dyn1xA long C-terminal extension allows multivalent interaction with endocytic proteins and that the high affinity interaction with Endophilin A1 permits phospho-regulation of their interaction and defines its function at synapses (Figure S7)”.

      Figure legend Figure 7,

      “Figure 7. Schematics depicting how specific isoforms Dyn1xA and Endophilin A mediate ultrafast endocytosis.

      A splice variant of dynamin 1, Dyn1xA, but not other isoforms/variants can mediate ultrafast endocytosis. (A) Dyn1xA has 20 amino acid extension which introduces a new high affinity Endophilin A1 binding site. Three amino acids, R846 at the splice site boundary, S851 and S857, act as long-distance element which can enhance affinity of proline rich motifs (PRM) to SH3 motif from outside of the PRM core sequence PxxP. (B) At a resting state, Dyn1xA accumulates at endocytic zone with SH3 containing BAR protein Syndapin 123 and Endophilin A1/2. When phosphobox-1 (Syndapin1 binding) and phosphobox-2 (Endophilin A1/2 binding, around S851/S857) within Dyn1xA PRD are phosphorylated, these proteins are diffuse within the cytoplasm. A dephosphorylated fraction of Dyn1xA molecules can interact with these BAR domain proteins. Loss of interactions including Dyn1xA-R846A or -S851/857D mutations, disrupts endocytic zone pre-accumulations. Consequently, ultrafast endocytosis fails.”

      Reviewer #2 (Significance (Required)):

      This is a remarkable and important advance in the field of endocytosis. The study reports a key discovery to understand the molecular mechanism of ultrafast endocytosis. Scientist interested in synaptic function and the general audience of cell biologist interested in membrane trafficking will very much value this study. The mechanism reported will potentially be included in textbooks in the near future.

      My field of expertise includes molecular mechanisms of presynaptic function and membrane trafficking.

      I have not enough experience to evaluate the quality of the NMR experiments, however, I do not have any problem at all with, in my opinion, elegant results reported.

      We thank the reviewer for the positive outlook of our manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Please find below a point-by-point reply to the reviewers, with our comments in plain text, and reviewer comments in italics. Direct quotations of MS revisions in the below point-by-point reply are in quotation marks.


      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **The manuscript "Circadian regulation of protein turnover and proteome renewal" investigates the role of protein degradation in the circadian control of proteostasis. The researchers suggest that the relatively static levels of protein levels in a cell are incongruent with the known oscillation in protein synthesis. They therefore hypothesize that there should be a compensatory mechanism to counteract rhythmic protein synthesis, rhythmic protein degradation. To investigate this, they employ bulk pulse chase labeling to study the process of degradation. They identify a synchronization between the creation and turnover of proteins in a cell, implying the clock helps to maintain homeostasis through a novel mechanism. They note that these phases align with energy availability, granting a plausible reasoning behind the biological implementation of this regulation. In summary, this is a sound manuscript that adds to the research field. The experiments in this manuscript are well thought out, organized, and explained. In general, the authors do not go further in their conclusions than I think is warranted given the data that they have, though I think that there are some key items that should be addressed before the publication of this manuscript. *

      Thank you for reading and appreciating our work.

      Major notes: 1) In figure 1, a clearer idea of what the ** means would be appreciated. What was the standard of significance for this measure?

      Thank you, this was already reported in the methods section but is now reported in the figure legend also.

      * 2) In Figure 1b, it is important to note clearly in the text that this is not a direct measure of protein degradation, but a subtractive proxy. Though I don't think that necessarily makes the authors' conclusions incorrect, the same result could also be obtained if an extra 15% of the proteins were moved into the insoluble fraction. This is the same for Figure 1E and F. *

      Considering only the pulse shown in the left-hand graph of 1B, the reviewer is correct that this could arise by rhythmic partitioning of nascently synthesised proteins between digitonin-soluble and insoluble fractions. This could not readily explain the variation in the % of nascently synthesised digitonin-soluble protein that is degraded however (right hand graph), hence the need for pulse-chase rather than pulse alone. As such, we do not exclude circadian-regulated solubility of nascently synthesised protein or that there is a rhythm of protein synthesis in the soluble fraction, both are likely true. Rather Figure 1B indicates the relative proportion of nascently-synthesised protein in the soluble fraction that is degraded within 1h of synthesis is not constant over time. This is consistent with current understanding of the regulated increase in activity of protein quality control mechanisms (including proteasome-mediated degradation) that are required to maintain protein homeostasis upon an increase in bulk translation (Gandin and Topisirovic, Translation, 2014).

      In contrast, the lysates probed in Fig 1F were extracted in denaturing urea/thiourea buffer and so cannot be explained by variation in protein solubility.

      Considering 1E, to explain this result entirely through solubility changes would require puromycinylated polypeptides to become more soluble at discrete phases of the circadian cycle, but only when the proteasome is inhibited. Whilst we cannot formally exclude this possibility, we are not aware of evidence to support it, whereas there is prior evidence supporting circadian regulation of protein synthesis and proteasome activity.

      To communicate all of this more clearly we have made the following revisions to the text:

      Page 6: ".The experiment was performed over a 24h time series followed by soluble protein extraction using digitonin, which preferentially permeabilises the plasma membrane over organelle membrane."

      Page 6: " Importantly, the proportion of degraded protein varied over time, being highest at around the same time as increased protein synthesis (Fig 1B), indicating time-of-day variation in digitonin-soluble protein turnover which cannot be solely attributed to previously reported circadian regulation of protein solubility (Stangherlin et al, 2021b). Rather, it suggests that global rates of protein degradation may be co-ordinated with protein synthesis rates, and may vary over the circadian cycle."

      Fig 1a legend: "...with digitonin buffer"

      Fig 1e legend: "...in digitonin buffer"

      Fig1f legend: "... and extracted with urea/thiourea buffer"

      * 3) In figure 1c, is the noted oscillation in protease activity due to the oscillation of these proteins? What are the predicted mechanisms behind this? I don't think that this is necessarily within the scope of this paper but should be addressed in the discussion. Also, the peak degradation rate from Figure 1B is 4 hours before the peak enzyme activities. How can this observation be reconciled? *

      Besides this study, our two previous proteomic investigations of the fibroblast circadian proteome detected no biologically significant or consistent rhythm in proteasome subunit abundance (Wong et al., EMBO J, 2021; Hoyle et al., Science Translational Medicine, 2017). Moreover, proteasomes are long-lived stable complexes whose activity is determined by a combination of substrate-level, allosteric and post-translational regulatory mechanisms that includes their reversible sequestration into storage granules (Albert et al., PNAS, 2020; Fu et al., PNAS, 2021; Yasuda et al., Nature, 2020). It is therefore very likely that the observed rhythm in trypsin- and chymotrypsin-like activity occurs post-translationally. Proteasome subunit composition is also known to change, which might be another reason for differences between the protease activities (Marshall and Vierstra, Front Mol Biosci, 2019; Zheng et al., J Neurochem, 2012).

      Due to the nature of the experiment, the degradation rate inferred from Figure 1B does not reflect proteasome activity, exclusively. Rather it reflects the combined sum of processes that remove nascently produced proteins from the cell's digitonin-soluble fraction, which includes proteasomal degradation, but also autophagy, protein secretion and sequestration into other compartments. Therefore, the peak degradation in Fig 1B would not necessarily be expected to coincide with the peak of proteasome activity in Fig 1C. Figure 1A/B is intended as an exemplar for the investigation's rationale and was the first to be performed chronologically.

      To communicate this succinctly, we have revised the relevant text as follows:

      Page 7: "Previous proteomics studies under similar conditions have revealed minimal circadian variation in proteasome subunit abundance (Wong et al, 2022), suggesting that proteasome activity rhythmicity, and therefore rhythms in UPS-mediated protein degradation, are regulated post-translationally (Marshall & Vierstra, 2019; Hansen et al, 2021)"

      * 4) For the pSILAC analysis, the incorporation scheme has a six-hour window between the comparison of the light and heavy peptides. This makes it somewhat difficult to assess whether you are looking a clock effect from T1 or T1+6. This does not negate the findings, but it does question when the synthesis is occurring and what is being compared, which I think should be more clearly discussed in the manuscript. This is discussed later in the manuscript but should be mentioned in this section. *

      Thank you for this suggestion. To communicate this more clearly, we have rearranged the labels at the top of schematic graphs in figures 2b and 3b in order to clearly distinguish the pulse-labelling window from the time of sample collection. The following text has been added to the methods section:

      Page 9: "To enable sufficient heavy labelling for detection, a 6h time window was employed, thus measuring synthesis and abundance within each quarter of the circadian cycle "

      * 5) There are no error bars on figure 2C. Was the pSILAC just done in singlet? If so, the rhythms estimation is likely a large overestimate and should be noted. *

      This first pSILAC experiment was performed in singlet with respect to external time for the RAIN analysis, but is in duplicate for the two-way ANOVA that is also reported, by treating each cycle as a separate replicate. In fact, the 6.2% of proteins that were significantly rhythmically abundant by RAIN agree well with two previous experiments we performed using mouse fibroblasts under identical conditions: the first with 3h resolution over 3 cycles in singlet (7% rhythmic), the second with 4 biologically independent replicates over one cycle (8% rhythmic) (Wong et al., EMBO J, 2021). The curve fits shown in 2C are the standard damped sine wave fits, with p-values from RAIN reported in the figure legend.

      Most importantly, however, and as noted in the text, the absolute % of rhythmically abundant proteins is rather irrelevant, and indeed the absolute numbers of 'rhythmic' proteins can vary wildly, dependent on the analysis method and stringency. The only important point to be gleaned from the estimates shown in Figure 2e is that by either statistical test, most rhythmically abundant proteins are not rhythmically synthesised, and vice versa; however, the % of proteins that are both rhythmically synthesised and rhythmically abundant is 6 to 11-fold higher than would be expected by chance (taking proteins rhythmic by RAIN and ANOVA, respectively; in both cases the overlap between the two sets is highly significant). This serves as a positive control, i.e., a minority of proteins show correlated rhythms of synthesis and abundance that are consistent with the canonical activity of 'clock-controlled genes', which cannot be explained by overestimation of rhythmicity.

      Odds Ratio comparison synthesis vs total

      Synthesis rhythmic by RAIN - listA size=148, e.g. A8Y5H7, B2RUR8, E9Q4N7

      Total rhythmic by RAIN - listB size=149, e.g. A1A5B6, A2A6T1, A2AI08

      Intersection size=34, e.g. A8Y5H7, O08795, O54910

      Union size=263, e.g. A8Y5H7, B2RUR8, E9Q4N7

      Genome size=2528

      Contingency Table:

            notA   inA
      notB  2265   114
      inB    115    34

      Overlapping p-value=5.4e-13

      Odds ratio=5.9

      Overlap tested using Fisher's exact test (alternative=greater)

      Jaccard Index=0.1

      Synthesis rhythmic by ANOVA - listA size=66, e.g. A8Y5H7, O35639, O55143

      Total rhythmic by ANOVA - listB size=83, e.g. A8Y5H7, B2RQC6, E9Q6J5

      Intersection size=16, e.g. A8Y5H7, P22561-2, Q3TB82

      Union size=133, e.g. A8Y5H7, O35639, O55143

      Genome size=2528

      Contingency Table:

            notA   inA
      notB  2395    50
      inB     67    16

      Overlapping p-value=9.7e-11

      Odds ratio=11.4

      Overlap tested using Fisher's exact test (alternative=greater)

      Jaccard Index=0.1
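
      For readers who wish to check these numbers, the following is a minimal sketch of how the overlap statistics can be recomputed from the list sizes reported above. It is not the code that generated the output: the function name is hypothetical, the odds ratio here is the simple cross-product ratio of the contingency table (the original tool may report a conditional estimate instead), and the p-value is the one-sided hypergeometric tail that underlies Fisher's exact test with alternative=greater.

```python
from math import comb


def overlap_stats(n_a, n_b, n_both, universe):
    """Overlap statistics for two protein lists drawn from a common universe.

    Returns (one-sided Fisher p-value, cross-product odds ratio, Jaccard index).
    """
    only_a = n_a - n_both                      # in A but not in B
    only_b = n_b - n_both                      # in B but not in A
    neither = universe - n_a - n_b + n_both    # in neither list
    # Hypergeometric upper tail: probability of an overlap of at least n_both
    # when n_b proteins are drawn at random from the universe.
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(n_both, min(n_a, n_b) + 1)
    )
    p_value = tail / comb(universe, n_b)
    odds_ratio = (n_both * neither) / (only_a * only_b)
    jaccard = n_both / (n_a + n_b - n_both)
    return p_value, odds_ratio, jaccard


# List sizes taken from the RAIN and ANOVA comparisons reported above.
rain = overlap_stats(n_a=148, n_b=149, n_both=34, universe=2528)
anova = overlap_stats(n_a=66, n_b=83, n_both=16, universe=2528)
for label, (p, odds, jac) in (("RAIN", rain), ("ANOVA", anova)):
    print(f"{label}: p={p:.1e}, odds ratio={odds:.1f}, Jaccard={jac:.1f}")
```

      The cross-product odds ratios round to the reported 5.9 and 11.4, which is where the "6 to 11-fold" enrichment quoted above comes from.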

      Nevertheless, we agree with the reviewer's general point and have revised the text as follows:

      Page 9: "... and may be susceptible to overestimation of rhythmicity."

      Page 9: "Consistent with similar previous studies, ..."

      Page 9: "The proportion of such proteins was more than expected by chance (p ..."

      Methods, Page 21: "...(n=1 per timepoint)"

      * 6) Why were the genes selected in 2C? these are not discussed anywhere else in the manuscript.*

      These are simply illustrative examples so that the reader can better understand what we mean, i.e., two proteins in different phases and one that did not change, all within a similar range of abundance. The selected proteins were not discussed because we do not expect the reader to attach any specific meaning to them. We have revised the figure to include in 2C examples of each rhythmicity category shown in 2E. To make this clear, we now state the following:

      Figure 2 legend: "No specific meaning is inferred from the protein identities."

      * 7) The authors note that for Figure 2 "These observations are consistent with widespread rhythmic regulation of protein degradation." However, only 5-10% of the proteome is oscillating at any level and less with a discrepancy between synthesis and abundance, so "widespread" is an exaggeration and this statement should be limited to the degradation in the rhythmic proteome. *

      We take the reviewer's point, but the term rhythmic proteome is also inaccurate since half the proteins with rhythmic degradation did not show an abundance rhythm in both mass spec experiments. We therefore revised this sentence as follows:

      Page 10: "These observations are consistent with widespread temporal organisation of protein degradation within the circadian-regulated proteome."

      * 8) The authors note that their more developed strategy in figure 3 would allow for the detection of less abundant proteins. However, they do not discuss that they in fact found less proteins overall, or if they were able to detect proteins of lower abundance. This is of some concern in determining if this is indeed the better method that they predict. How can the authors reconcile this issue? How can they rationalize this explains their increase in oscillating elements? *

      Thank you for raising this point; we did not explain ourselves sufficiently clearly. As stated in the revised text, once we had analysed the first iteration of pSILAC (Fig 2), we realised that detection of heavy-labelled proteins was "inevitably limited and biased the proteome coverage towards abundant proteins with higher synthesis rates". In other words, to be considered in our analysis, both unlabelled and heavy-labelled peptides needed to be detected in every sample at every time point. In fact, if we do not consider heavy labelling, the overall coverage in the Fig 3 experiment (6577 proteins) was better than in the Fig 2 experiment (6264 proteins), as expected given technical improvements in the methods used (by the time of the experiment in Fig 3, we were able to perform the analysis using mass spectrometry techniques with better fractionation and detection, namely FAIMS and MS3). When the analysis criteria are applied, however, coverage falls to 2302 and 2528 proteins, respectively. Because of the way that mass spectrometry works, many proteins had to be excluded from analysis because the heavy label was not detected in one or more samples. In these cases, we cannot infer that no heavy-labelled protein was present in that sample, or even that it was present at lower levels than in other samples; it simply was not detected, and therefore we cannot make any quantitative comparisons. Non-detection of any given heavy peptide may occur for several reasons, the most likely being that it co-elutes from the chromatography column with other, much more abundant (light) peptides and simply escapes detection. This is an unavoidable limitation of the technique, and we hope the reviewer can understand our need to restrict the analysis to those proteins whose nascent synthesis, and total abundance in the MMC fraction, can be confidently quantified.

      As the experiments in Fig 2 and Fig 3 were performed independently, with separate TMT sets and different instrumentation, we are also unable to compare absolute abundances of the proteins between the two.

      To communicate this more clearly we have amended Figures 2e and 3e to state the total coverage in the legends, as well as clearly stating the coverage of heavy-labelled proteins in the figure itself. We have also added the following explanation to the text:

      Page 11:

      "Despite enriching for only one cellular compartment, the overall coverage in this experiment was similar to the previous one (6577 and 6264 proteins, respectively), due to the altered and more targeted approach; with heavy peptides detected for 2302 proteins."
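      The inclusion criterion described in this response (both unlabelled and heavy-labelled peptides detected in every sample at every time point) can be illustrated with a minimal Python sketch. The protein names, intensity values and the function `quantifiable_proteins` are hypothetical, and `None` is assumed to mark a non-detection; this is an illustration of the criterion rather than the analysis pipeline used for the manuscript.

```python
def quantifiable_proteins(detections):
    """Return proteins whose light and heavy signals exist in every sample.

    `detections` maps a protein name to a list of (light, heavy) intensity
    pairs, one pair per sample; None marks a non-detected channel.
    """
    kept = []
    for protein, samples in detections.items():
        complete = all(
            light is not None and heavy is not None
            for light, heavy in samples
        )
        if complete:
            kept.append(protein)
    return kept


# Hypothetical intensities (arbitrary units) for three samples per protein.
example = {
    "protein_1": [(10.0, 2.0), (11.0, 2.5), (9.5, 1.8)],
    "protein_2": [(50.0, 5.0), (48.0, None), (52.0, 6.0)],  # heavy missed once
    "protein_3": [(3.0, 0.4), (None, 0.5), (2.8, 0.3)],     # light missed once
}
print(quantifiable_proteins(example))
```

      A single missed heavy peptide removes a protein from the quantified set, which is why the analysed coverage is much lower than the total coverage.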

      *9) In the comparison of complex turnover rates, the authors need to provide a metric that backs their statement that "the majority of component subunits not only showed similar average heavy to total protein ratios but also a similar change in synthesis over the daily cycle" for figure 3F. *

      Our apologies for this oversight; this is now presented in new Fig S3D.

      * 10) In reference to the AHA incorporation, why is the hypothesis not that, like the puromycin, you would not see oscillation unless you add BTZ? Shouldn't the active degradation regulate the incorporation of AHA such that there is no visible rhythm unless you suppress degradation? *

      AHA is a methionine analogue that is sparsely incorporated into polypeptide chains with minimal effect on protein function/structure (Dieterich et al., PNAS, 2006). Unlike puromycin, therefore, AHA does not lead to chain termination or protein misfolding/degradation (Dermit et al., Mol Biosyst, 2017) and so pulsed application at different phases of the circadian cycle is sufficient to reveal protein synthesis rhythms. The novelty in Fig 3H is the combination of AHA labelling with native PAGE, which allows us to validate rhythmic production of high molecular weight protein complexes. This would not be possible with puromycin because prematurely terminated polypeptide chains are not able to assemble into native complexes unless chain termination happens to occur at the extreme C-terminus and the C-terminus does not partake in any intermolecular interactions within the assembled complex.

      * 11) The authors claim that there is enrichment of the actin cytoskeleton, but where this data can be found should be explained. The only thing that is shown is a few selected graphs of proteins in this pathway. *

      We previously reported circadian regulation of the actin cytoskeleton in Hoyle et al. (Sci Transl Med, 2017). The extremely high relative amplitude of beta-actin (the structural component of microfilaments) in the MMC fraction is, in and of itself, entirely sufficient to demonstrate a circadian rhythm in the relative ratio of globular to filamentous actin that was originally identified by Ueli Schibler's lab (Gerber et al., Cell, 2013) and then shown to have a cell-autonomous basis in fibroblasts in Hoyle et al. (2017). We have included further examples of an actin-binding protein (Coronin 1b) and a motor protein (Myosin 6) to further illustrate this, but do not feel further discussion is warranted because it was comprehensively addressed in our previous work. The enrichment for actin was determined by GO analysis, which is now shown in Fig 4A and referred to in the text.

      The important point in Fig 4C is the difference in phase with the examples shown in Fig 4B and summarised in Figure 4A, i.e., there are a small number of proteins whose presence in the MMC fraction is highest in advance of the majority of rhythmically abundant proteins, but this earlier group doesn't show any significant synthesis rhythm. Actin is one of the most abundant cellular proteins, and by mass it accounts for 67% of the circadian variation of rhythmically abundant proteins that peak in this fraction at the same phase. All these data and analyses are available for scrutiny in Supplementary Table 2.

      To communicate this more clearly we have expanded on this point as follows:

      Page 13: "These proteins were enriched by 9-fold for actin and associated regulators of the actin cytoskeleton (q

      * 12) The authors note an oscillation in the total levels of p-eif2, commenting that these do not arise from the rhythms in total eif2a but temperature and feeding rhythms. However, unless I misunderstood, this work was done in fibroblast cell culture, so in this case, where would these temperature and feeding rhythms come from? *

      We were insufficiently clear. Daily rhythms of p-eIF2 have been observed under physiological conditions in mouse, in vivo. We do not observe similar rhythms in cultured fibroblasts under constant conditions unless the cells are challenged by stress. By inference, therefore, it seems likely that daily rhythms of p-eIF2 in vivo arise from the interaction between cell-autonomous mechanisms and daily systemic cues such as insulin/IGF-1 signalling and body temperature that are in turn driven by daily rhythms in CNS control, daily feed/fast rhythms and daily rest/activity rhythms, respectively. We have amended the text as follows:

      Page 15: "...and so suggest that daily p-eIF2α rhythms in mouse tissues likely arise through the interaction between cell-autonomous mechanisms and daily cycles of systemic cues, e.g., insulin/IGF-1 signalling and body temperature rhythms driven by daily feed/fast and rest/activity cycles, respectively."

      * 13) In Figure 5d, the treatment impeding degradation is causing cell death while the inhibition of translation does not. However, wouldn't too much, or not enough, translation, without compensatory regulation from degradation cause a problem in the same way that degradation does? *

      It is well established that acute treatment with high concentrations of proteasomal inhibitors rapidly leads to proteotoxic stress that will trigger apoptosis unless resolved (Dantuma and Lindsten, Cardiovasc Res, 2010). Treatment with CHX is certainly stressful to cells, but in a different way: cells die through mechanisms that are generally regarded as necrotic and that certainly do not involve the canonical proteotoxic stress responses activated by MG132 and similar drugs. Our findings show that, whatever the mechanism by which cells die under CHX treatment, CHX-induced death does not change over the circadian cycle, whereas death via proteotoxic stress does, consistent with our prediction. We hope the reviewer agrees it is beyond the scope of our study to explain why CHX-mediated cell death does not show a circadian rhythm in mouse fibroblasts.

      *Reviewer #1 (Significance (Required)):

      *The information that stems from this work is relevant and of interest to circadian clock field as how the regulation of the output of the circadian clock is implemented is still a major question in the field. This manuscript suggests a novel and plausible method for how, at least in part, this regulation occurs. However, the manuscript uses methods that do not measure degradation directly, which is a minor limitation. In addition, the mechanisms by which this regulation is imparted are not addressed in any meaningful way, even in the discussion.

      We are sorry that we did not adequately discuss the extensive previous work that has already addressed regulatory mechanisms. We would like to stress that this manuscript concerns protein turnover and proteome renewal, of which degradation is obviously an important part but not the sole focus.

      To communicate this more clearly, we have amended the title to:

      "Circadian regulation of macromolecular complex turnover and proteome renewal"

      ... which we previously explicitly predicted in the discussion of previous papers (Feeney et al., Nature, 2016; O'Neill et al., Nat Comms, 2020; Wong et al., EMBO J, 2022) and our recent review (Stangherlin et al., Curr Opin Syst Biol, 2021).

      With respect to measurement of degradation - Physiologically, cellular rates of proteasomal degradation are so intimately coupled with protein synthesis that, over circadian timescales, the former cannot meaningfully be studied in isolation. It is possible that the reviewer is alluding to historical methods that measure change over time in the presence of translational or proteasomal inhibitors, but these have long been known to introduce artifacts, because translational inhibition rapidly leads to reduced proteasome activity, whereas proteasomal inhibition rapidly reduces protein synthesis rates through the integrated stress response. We would be interested to hear of any more direct method for measuring protein degradation proteome-wide than the pulsed SILAC method we developed, as we are not aware of any. Even proteasomal proximity labelling coupled with MG132 treatment, recently developed by the Ori lab, does not directly measure degradation (bioRxiv, https://www.biorxiv.org/content/10.1101/2022.08.09.503299v1). By definition, degradation can only be measured through the disappearance of something that was previously present, usually by comparing its rate of production with the change in steady state concentration (if any), which we have done using multiple methods.

      With respect to regulation of degradation - We speculated on the mechanisms regulating rhythms in protein turnover in several previous papers (Feeney et al., Nature, 2016; O'Neill et al., Nat Comms, 2020; Wong et al., EMBO J, 2021; Stangherlin et al, Nat Comms, 2021), whereas outside the circadian field these mechanisms have been addressed extensively. This was also discussed in detail in our recent review on the topic (see Stangherlin et al., COISB, 2021). In this review, we lay out the evidence for a model whereby most aspects of circadian cellular physiology might be explained by daily rhythms in the activity of mammalian target-of-rapamycin complexes (mTORC). This model makes multiple predictions and informs the central hypothesis which is tested in the current manuscript: that circadian rhythms in complex turnover and proteome renewal should be prevalent over abundance rhythms. An enormous body of work over the last two decades has already clearly established mTORC1 as the master regulator of bulk protein synthesis and degradation, and a substantial number of independent observations have demonstrated circadian regulation of mTORC1 activity in vivo and in cultured cells. The mechanisms that drive cell-autonomous mTORC1 signalling are only partially understood (e.g. Feeney et al., Nature, 2016; Wu et al., Cell Metab, 2019), and we continue to explore them experimentally, but they certainly lie well beyond the scope of this investigation.

      Therefore, to address the reviewer's concern about inadequate discussion of mechanism, we have expanded on mTORC in the introduction and discussion, as follows:

      Page 3: "Daily rhythms of PERIOD and mTORC activity facilitate daily rhythms of gene expression and protein synthesis. In particular, mTORC1 is a master regulator of bulk 5'-cap-dependent protein synthesis, degradation and ribosome biogenesis (Valvezan & Manning, 2019) whose activity is circadian-regulated in tissues and in cultured cells (Ramanathan et al, 2018; Feeney et al, 2016a; Stangherlin et al, 2021b; Mauvoisin et al, 2014; Jouffe et al, 2013; Sinturel et al, 2017; Cao, 2018). It is plausible that daily rhythms of mTORC activity underlie many aspects of daily physiology (Crosby et al, 2019; Stangherlin et al, 2021a; Beale et al, 2023b)."

      Page 17: "The mechanistic underpinnings for cell-autonomous circadian regulation of the translation and degradation machineries remain to be fully explored, but are likely to be driven by daily rhythms in the activity of mTORC: a key regulator of protein synthesis and degradation as well as macromolecular crowding and sequestration (Stangherlin et al, 2021b, 2021a; Cao, 2018; Adegoke et al, 2019; Ben-Sahra & Manning, 2017; Delarue et al, 2018). In particular, global protein synthesis rates are greatest when mTORC1 activity is highest, in tissues and cultured cells, whereas pharmacological treatments that inhibit mTORC1 activity reduce daily variation in crowding and protein synthesis rates (Feeney et al, 2016a; Lipton et al, 2015; Stangherlin et al, 2021b). Given our focus on proteomic flux and translation-associated protein quality control, autophagy was not directly within the scope of this study but is also mTORC-regulated and subject to daily regulation (Ma et al, 2011; Ryzhikov et al, 2019). In vivo, daily regulation of mTORC activity arises primarily through growth factor signalling associated with daily feed/fast cycles (Crosby et al, 2019; Byles et al, 2021). The mechanisms facilitating cell-autonomous circadian mTORC activity rhythms are incompletely understood but may include Mg.ATP availability (Feeney et al, 2016a) and its direct regulation by PERIOD2 (Wu et al, 2019). This will be an important area for future work."

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This is a very interesting and well written paper that addresses key questions in the circadian organization of proteostasis. The paper investigates origins of cellular circadian rhythms, invoking a premise early that there is a poor correlation between rhythmic gene expression - regulated by the canonical TTFL - and rhythms of the proteome, which are rather meager. Specifically, they ask how a relatively stable proteome is possible if cells engage in rhythms of cellular protein synthesis. Their hypothesis is that protein degradation must rhythmically compensate for rhythms of synthesis and much of the manuscript is focused on defining the relationship between rhythmic global synthesis and rhythmic degradation. They employ a series of detailed proteomic investigations and biochemical assessments of protein synthesis coupled with various circadian reporters to assess proteasome function. The proteomic experiments reveal a limited number of proteins with oscillations in either synthesis or abundance or both and no discernible pathway organization; however, a follow-up and more refined study that utilized fractionated samples and boosted heavy SILAC identified, strikingly, that many proteins in relatively heavy fractions are rhythmic and that these fall into possible complexes including ribosome and chaperonins. Finally, they perform in vivo experiments testing whether the timing of proteotoxic stimuli regulates the degree of the integrated stress response measured as pEif2a. Overall, I think that this is a fascinating paper that addresses an important question but falls short on mechanistically unifying them and completely contextualizing the findings in light of the canonical modes of circadian timekeeping, leaving us with an important, but mostly descriptive set of findings. In addition, there are a number of important questions about data interpretation and some issues with data quality that should be addressed, outlined below. With revision and further explication, this study will be an excellent addition to the growing field of circadian organization of the cellular proteome. *

      Thank you for reading and appreciating our work.

      *Major and minor Comments. Figure 1. Fig 1a. The difference in Pulse and Chase at ZT24 does not appear to reflect the quantified data in 1b. This should be reconciled to make the figure convincing. *

      When working with radioactive cell lysates it is not possible to equalise the level of protein loaded on each gel beforehand, as would happen with a western blot, for example. For this reason, the radioactive signal was normalised to the protein level subsequently measured by Coomassie staining, as is standard practice for this type of assay, with all 4 replicates being shown in supplementary Fig. 1A. An overnight phosphor screen image is presented in the main Fig. 1A for illustrative purposes, but we take the point that this might not be immediately obvious. In revised Fig 1A we therefore now also show the relevant Coomassie stain, as well as labelling to make clear that the radioactive signal was normalised to protein levels.

      * How was the timing of the chase collection determined? *

      For these proof-of-principle experiments, we empirically determined the minimum duration of pulse and chase necessary to detect a quantifiable signal.

      *Fig 1d-e. What is the evidence that puro labeling results in 'rapid' turnover. *

      Apologies, this has been established for some time. Some additional papers are now cited in this section of the text (Liu et al, PNAS, 2012; Lacsina et al., PLoS One, 2011; Szeto et al., Autophagy, 2006)

      *Fig 1e seems to be missing the data from the treated and untreated conditions? How are the lines produced (e.g. linear versus rhythmic? Are these drawn lines or actual regressions?). *

      Fig 1e depicts the result of the experiment schematically explained in 1d. The only conditions were +Puro or +Puro+BTZ. There was no completely untreated condition, as puromycin incorporation is the basis of the assay (Lacsina et al., PLoS One, 2012; Szeto et al., Autophagy, 2006) and puromycin does not occur naturally in cells. We realise the figure could potentially be confusing without the associated raw data (anti-puromycin blots) - these are shown in supplementary Fig. 2A.

      To explain the method more clearly, the following has been added to the results section where this experiment is described:

      "As determined by anti-puromycin western blots, over two days under constant conditions, puromycin incorporation in the presence of BTZ showed significant circadian variation. In contrast, cells that were treated with puromycin alone showed no such variation, nor did total cellular protein levels (Fig 1E, Fig S2A)."

      The fit lines are produced by statistical comparison of fits, i.e., our hypothesis (damped cosine fit) vs null hypothesis (no or constant change over time, linear fit, y = mx+c), using sum-of-squares F test. The statistically preferred fit is plotted and p-value displayed on the graph, i.e., the regression line of the preferred fit and parameters are plotted. These details are reported in the figure legends.
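      To make the comparison of fits concrete, the F statistic can be written out in a few lines. This is a minimal Python sketch of a generic extra sum-of-squares F test, assuming the residual sums of squares of the two fits are already known; the function name, the example residuals and the parameter counts (2 for the straight line, 5 for a damped cosine) are illustrative assumptions and are not taken from the manuscript.

```python
def extra_sum_of_squares_f(ss_null, params_null, ss_alt, params_alt, n_points):
    """F statistic comparing a simpler (null) fit with a more complex fit.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    df_null = n_points - params_null
    df_alt = n_points - params_alt
    if df_alt <= 0 or df_null <= df_alt:
        raise ValueError("the alternative model must have more parameters "
                         "than the null and fewer than the number of points")

    # Relative improvement in fit per extra parameter, scaled by the
    # residual variance of the more complex model.
    numerator = (ss_null - ss_alt) / (df_null - df_alt)
    denominator = ss_alt / df_alt
    return numerator / denominator, df_null - df_alt, df_alt


# Hypothetical example: 12 time points, straight line vs damped cosine.
f_stat, df1, df2 = extra_sum_of_squares_f(
    ss_null=100.0, params_null=2,   # y = mx + c
    ss_alt=30.0, params_alt=5,      # assumed 5-parameter damped cosine
    n_points=12,
)
print(f_stat, df1, df2)
```

      The p-value is then the upper tail of the F distribution with these degrees of freedom, which in practice would be taken from a statistics package; the more complex fit is preferred only when that p-value falls below the chosen threshold.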

      * Why was 30 minutes chosen as labeling time? It seems hard to understand here how protein degradation kinetics can be measured by puromycin labeling if the authors' claim that puromycin labeling potentially changes degradation rates as a function - primary or secondary - of the labeling itself. It seems they are measuring the potential to degrade proteins. *

      Puromycin labelling is a widely used, 20-year-old technique that can be employed in a range of applications. It was first used in a circadian context by Lipton et al (Cell, 2015), whose work we quickly followed (Feeney et al, Nature, 2016). Briefly, puromycin mimics tyrosyl-tRNA to block translation by labelling and releasing elongating polypeptide chains from translating ribosomes. When used at low concentrations (1 µg/mL in this case), puromycin is sparsely and sporadically incorporated into a small minority of elongating polypeptide chains. Those prematurely terminated chains have puromycin at the C-terminus, which can be detected by western blotting. We chose 30 minutes after optimisation experiments, as it was the shortest incubation time at which a robust signal could be observed in these cells with this concentration of puromycin. The puromycinylated peptides are preferentially degraded by the ubiquitin-proteasome system because they are efficiently recognised as misfolded/aberrant proteins by chaperones within tens of minutes of being translated. Unless puromycin is used at much higher concentrations, or over much longer timescales, there is no reason to believe that it affects the degradation machinery itself, but the degradation of puromycinylated peptides depends on the proteasome. Therefore, puromycin plus a proteasome inhibitor provides a reliable proxy for translation rate in the preceding 30 minutes, whereas puromycin alone tells us the steady state concentration under normal conditions, i.e., where proteasomes remain active. By subtracting the latter from the former we can infer the level of degradation of puromycinylated peptides that must have occurred in the previous 30 minutes. It is not a perfect technique, but its results agree with other findings in this manuscript: that protein turnover varies more than steady state protein abundance. With respect to the potential to degrade proteins, this is measured in Fig 1C.
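      The subtraction described in this response can be sketched as follows. This is an illustrative Python example only: the function name `inferred_degradation` and the signal values (arbitrary units, one value per time point) are hypothetical and do not reproduce any data from the manuscript.

```python
def inferred_degradation(puro_plus_btz, puro_only):
    """Infer degradation of puromycin-labelled peptides per time point.

    puro_plus_btz: labelling with the proteasome inhibited, a proxy for
                   total nascent polypeptide production in the window.
    puro_only:     labelling with active proteasomes, i.e. what remains.
    """
    if len(puro_plus_btz) != len(puro_only):
        raise ValueError("both series need one value per time point")
    return [
        produced - remaining
        for produced, remaining in zip(puro_plus_btz, puro_only)
    ]


# Hypothetical signals at three time points (arbitrary units).
with_btz = [10.0, 14.0, 10.0]   # varies over time
without_btz = [8.0, 8.0, 8.0]   # roughly constant
print(inferred_degradation(with_btz, without_btz))
```

      In this toy case the signal with active proteasomes is flat while production varies, so the inferred degradation varies in step with production, which is the pattern the assay was designed to detect.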

      * How do they determine that they are measuring degradation of functionally relevant proteins as opposed to a host of premature truncations? *

      We do not. This is measured by stable isotope labelling in Figures 2-4. Figure 1 provides the rationale for what follows in subsequent figures, i.e., proof-of-principle experiments suggesting that turnover is not constant over the circadian cycle. No single experiment in Figure 1 is expected to convince the reader of circadian turnover. Rather, several independent methods suggest that bulk protein synthesis and degradation (turnover) are not constant over time, and deviate from the null hypothesis with variation that appears to change over the 24h circadian cycle.

      * Fig 1e bottom - again is this a true regression line? *

      It is not a regression line, otherwise a p-value of fit would be shown. Fig1e bottom shows the bioluminescence measured at each timepoint from parallel control cultures (average of triplicates, error bars shown as dotted lines). Due to very high temporal resolution (every 30 min) and robustness of the cell line, it appears as a virtually perfect damped (co)sine wave. We apologise that this was not explained more clearly in the figure legend, now amended as follows:

      "Parallel PER2::LUC bioluminescence recording from replicate cell cultures (mean +/- SEM, every 30 min) is shown below, acting as phase marker."

      *Perhaps two time points should be examined here - similar to the pulse chase performed with 35S labeling? *

      We are sorry we were not fully clear with our method here. The puromycin (+/- BTZ) labelling was performed over two days every 4h (so 12 timepoints in total), which can be inferred from the data points in the top two graphs in Fig. 1E, and x-axis - but is now also clearly stated in the figure legend. The bottom right graph was a continuous bioluminescence recording, integrated every 30 min from the set of parallel culture dishes. The bioluminescence data serve as a circadian phase marker, so that we can infer at which biological times synthesis and inferred turnover were higher vs lower.

      We’ve adjusted the text to explain our method more clearly:

      “Acute (30 min) puromycin treatment of cells in culture, with or without proteasomal inhibition (by bortezomib, BTZ), allowed us to measure both total nascent polypeptide production (+BTZ) and the amount of nascent polypeptides remaining when the UPS remained active (-BTZ). This allowed inference of the level of UPS-mediated degradation of puromycylated peptides within each time window, as a proxy for nascent protein turnover (Fig. 1D).”

      * Fig 1f. It appears that Puro labeling results in a rhythm between ZT1 and ZT13 but no statistic is provided and appears that the 'ns' is the results of variance in the data as opposed to difference in means? - would this not contradict the cellular result? What accounts for the rhythm reversal in the presence/absence of BTZ. *

      To be clear, we measured the level of puromycin incorporation in mouse liver in vivo using a method similar to that employed by Lipton et al, Cell, 2015 (Figure 2). The prediction was that, exactly as in cells (Fig 1E), treatment with a proteasome inhibitor would lead to a much greater increase in puromycinylated peptides at ZT13 than ZT1, because this is when protein synthesis is known to be higher and thus (we predict) protein degradation should also be higher. The experiment was not designed or powered to detect a time effect; it was designed to detect an interaction between time-of-puromycin treatment and BTZ, with the specific prediction being that BTZ would have a greater effect during the active phase. This is what we observed.

      * While the authors have previously demonstrated an increase in rhythmicity of the proteome in Cry1/Cry2 double knockout cells, it would have been welcome here to test a global loss of circadian transcription in the degradation assay. One might expect that these rhythms would also be even higher. What I am really asking is: what is the mechanism for rhythmic degradation and is it dependent on the canonical clock? *

      To address the reviewer's curiosity, we used the proteasome-Glo assay (also used in Fig 1C) to assess whether there was an interaction between genotype (WT vs CKO) and time at opposite phases of the circadian cycle over 2 days. We found a significant interaction by two-way ANOVA, indicating that components of the 'canonical clock' regulate the temporal organisation of proteasomal activity (see revised Figure S1). Circadian regulation of mammalian cellular functions, such as protein turnover, is a complex and dynamic process, whereas gene deletion affects the steady state and may be epistatic to phenotype rather than revealing gene function. We are therefore reluctant to speculate what this result means in the present manuscript, which is focused entirely on testing the hypothesis that global protein turnover and complex biogenesis have cell-intrinsic circadian rhythms in non-stressed, wild type cells.

      To communicate this, the text has been revised as follows:

      "Moreover, we detected a significant interaction between genotype and biological time when comparing trypsin-like proteasome activity between wild type and Cryptochrome1/2-deficient cells, that lack canonical circadian transcriptional feedback repression (Fig S1B-E)."

      * Fig 2. How was the 'fixed window' timeframe determined? *

      A trial experiment was performed with labelling windows of various lengths, and 6 h was determined to be the shortest window in which enough heavy label incorporation was detected to assess circadian changes. This was the case with our first methodology, which was subsequently improved (Figure 3), and the labelling window was therefore reduced to 1.5 h.

      * Fig 3h. While admittedly difficult, the native PAGE is not of great quality and kind of unconvincing. Also not really sure why the AHA labeling is used here and nowhere else in the paper. *

      AHA is a methionine analogue that is sparsely incorporated into polypeptide chains with minimal effect on protein function/structure (Dieterich et al., PNAS, 2006). Unlike puromycin, therefore, AHA does not lead to chain termination or protein misfolding/degradation (Dermit et al., Mol Biosyst, 2017). In Figure 1, the aim was to validate previous reports of rhythmic protein synthesis and assess whether there was any evidence for rhythmic turnover. To this end, we employed two independent methods (35S-labelling and puromycin incorporation). We did not want to rely on AHA for measuring turnover: although it has been validated and used for this purpose in some studies (McShane et al., Cell, 2016), AHA is not fully equivalent to methionine, and cellular aminoacyl-tRNA synthetases have much higher affinity for methionine than they do for AHA (Ma and Yates, Expert Rev Proteomics, 2018). It is thus impossible to perform AHA labelling without methionine-free medium, and in turn methionine starvation and media changes are known to have an effect on cell signalling and cell metabolism, which would be particularly pronounced in a circadian context (over days rather than over hours).

      By contrast, in Fig 3H, we use AHA with native PAGE to specifically validate one inference from the mass spectrometry analyses: circadian production of high molecular weight protein complexes. This would not be possible with puromycin because prematurely terminated polypeptide chains are not able to assemble into native complexes unless chain termination happens to occur at the extreme C-terminus and the C-terminus does not partake in any intermolecular interactions within the assembled complex.

      The raw data (full gels, all replicates) are presented in Figure S2e, and of course were used for quantification. We have now picked a different example for the main figure, which hopefully allows for clearer representation.

      The text in the results section describing the AHA experiment is now amended as follows:

      " To validate these observations by an orthogonal method, we pulse-labelled cells with methionine analogue L-azidohomoalanine (Dieterich et al, 2006). AHA is an exogenous substrate, that cells have lower affinity to than methionine, and it could potentially impact on stability of the labelled proteins (Ma & Yates, 2018) – therefore, we only used AHA to assess nascent complex synthesis, rather than turnover. We analysed the incorporation of the newly synthesised, AHA labelled proteins into highest molecular weight protein species detected under native-PAGE conditions (Fig 3H, S3F). We observed a high amplitude daily rhythm of AHA labelling, indicating the rhythmic translation and assembly of nascent protein complexes. Taken together, these results show that daily rhythms in synthesis and degradation may be particularly pertinent for subunits of macromolecular protein complexes"

      Fig 4. I was a little disappointed here that the authors did not directly assess macromolecular assembly of at least one of their "hits" and demonstrate functional relevance and most of the analysis is maintained at a very superficial, systemic level. STRING assemblies are not terribly helpful without clear k-means clustering or some other clearly visualizable metric for stratifying and organizing the putative PPI data - this figure (S3) could be markedly improved.

      We agree that validation is important. The ribosome is by far the most abundant macromolecular complex in the cell, and was one of the major complexes to show clear evidence for circadian regulation of turnover, but not abundance, by our pSILAC proteomics. To validate this result, we took advantage of two important observations: (1) that all fully assembled ribosomes incorporate ribosomal RNA (rRNA) which can readily be separated from other cellular RNA by density gradient centrifugation; (2) pulse-labelling with heavy uridine-15N2 allows nascent RNA to be distinguished from pre-existing RNA. Thus, combining stable isotope labelling with ribosome purification, we can distinguish nascently assembled ribosomes from total ribosomes when the RNA is extracted, digested with RNase, and the % heavy/total UMP quantified by mass spectrometry. These data are presented in new figure 5, and are consistent with findings in Figures 3/4 that circadian regulation of ribosome turnover is prevalent over abundance, and that the phase of highest ribosome turnover coincides with the phases of high translation and turnover overall. We hope that, by addressing the reviewer's question with an entirely orthogonal method, the reviewer can share more confidence in our conclusions.

      The statistical metric for STRING, specifically the p-value for enrichment in physical protein-protein interactions, is presented in the main Fig. 3G. It is now also reported in the legend for new Figure S4 itself.

      * Is it possible that some macromolecular complexes have rhythms because their constituent proteins have differential half-lives when in one complex compared with another in circadian time? This possibility was not discussed. *

      To our knowledge, there is no evidence that any major macromolecular complex in the cell has a functionally significant rhythm in abundance on a cell-autonomous basis. The reviewer’s suggestion is an intriguing possibility, but we can think of no way that it could be measured, even in principle. The simplest interpretation of our data from the independent techniques we employ (pSILAC with fractionation, native PAGE + AHA incorporation) is a rhythm in synthesis.

      *Fig. 5. Why is the first histogram in 3c not at unity? *

      This measures the average fold-induction in aggregation when cells are treated with MG132 for 4h at the indicated timepoints. Unity would indicate no induction at all, so the presented quantifications show that MG132 always elicited an increase in aggregation, with an effect size that varied with circadian phase.

      * Do ZT24 and ZT48 differ, similarly do ZT36 and ZT60?*

      No, neither difference is statistically significant (adjusted p-values of p=0.9 and p=0.07, respectively). This is now specified in the figure legend. Tendency to aggregate is also likely to change as a function of time in culture, which is why we think there is a slight increase overall in the second day of the experiment.

      * Fig S4f is not of good quality with missing eIF2a total and therefore no loading controls. *

      Thank you for prompting us to double-check this. We found that the levels of eIF2a were quite variable between the animals, and therefore we performed this experiment with 6 biological replicates. We have double-checked the quantification, and have now excluded 3 unreliable samples (the ones with undetectable levels of total eIF2a – ZT18 +BTZ replicate 1 & ZT18 -BTZ replicate 2, as well as ZT6 +BTZ replicate 4, where a smear does not allow for a reliable quantification of phospho-eIF2a) instead of 2 that were excluded originally. This still leaves at least 5 biological replicates in each group. In fact, the difference between BTZ and control in ZT6 is now deemed to be even more significant, going down to adjusted p=0.0007.

      *S4e? true regression lines? *

      The same method was used as in Figure 1. The fit lines are produced by statistical comparison of fits, i.e. our hypothesis (damped cosine fit) vs null hypothesis (no change over time, linear fit), using sum-of-squares F test. The statistically preferred fit is plotted and p-value displayed on the graph. These details are reported in the figure legends and methods section.
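      The fit comparison described above can be sketched in Python. This is a minimal illustration and not the analysis code used for the figures: the exact form of the damped cosine (a linear baseline plus an exponentially decaying cosine), the fixed 24h period, the starting guesses and all function names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist

PERIOD_H = 24.0  # assumption: period fixed, e.g. from a parallel reporter recording


def linear(t, slope, intercept):
    """Null model: straight line, no oscillation."""
    return slope * t + intercept


def damped_cosine(t, slope, intercept, amp, decay, phase):
    """Alternative model (assumed form): linear baseline plus a damped cosine."""
    wave = amp * np.exp(-decay * t) * np.cos(2 * np.pi * (t - phase) / PERIOD_H)
    return slope * t + intercept + wave


def extra_ss_f_test(t, y):
    """Compare the nested fits with an extra sum-of-squares F test.

    Returns (F, p). A small p favours the damped cosine over the line.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    p_lin, _ = curve_fit(linear, t, y)
    ss_null = float(np.sum((y - linear(t, *p_lin)) ** 2))

    # Try several starting phases and keep the best alternative fit.
    # The line is nested in the damped cosine (amp = 0), so ss_null
    # is an upper bound on the alternative's residual sum of squares.
    ss_alt = ss_null
    for phase0 in (0.0, 6.0, 12.0, 18.0):
        p0 = [p_lin[0], p_lin[1], np.std(y), 0.01, phase0]
        try:
            p_cos, _ = curve_fit(damped_cosine, t, y, p0=p0, maxfev=20000)
        except RuntimeError:
            continue  # this starting point did not converge
        ss = float(np.sum((y - damped_cosine(t, *p_cos)) ** 2))
        ss_alt = min(ss_alt, ss)

    df_null = len(t) - 2  # two parameters in the line
    df_alt = len(t) - 5   # five parameters in the damped cosine
    f_stat = ((ss_null - ss_alt) / (df_null - df_alt)) / (ss_alt / df_alt)
    p_value = float(f_dist.sf(f_stat, df_null - df_alt, df_alt))
    return f_stat, p_value
```

      In this sketch the statistically preferred model would be the damped cosine when the returned p-value falls below the chosen threshold, and the straight line otherwise.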

      While I thought these experiments were effective, they did not tie back well to the rest of the paper. What are the consequences of a temporally sensitive ISR? Which pathways does it affect in circadian time? Here, the main holes in this study are somewhat exposed; namely, a lack of mechanistic depth in explaining the very fascinating, albeit mostly descriptive, findings. The implicit assumption made here is that aggregation is 'bad' but could the opposite be just as true? Taking these considerations into account would further strengthen the discussion.

      The purpose of (former) Fig 5 was entirely to test the functional consequences and potential translational relevance of a daily rhythm in protein turnover. The mechanisms upstream and downstream of the ISR, and link with many diseases, are already quite well understood but we apologise that we did not draw more heavily on the prior literature to provide sufficient context for this experiment. Protein aggregation has long been associated with proteotoxic stress, and we do not assume it is good or bad, we simply use it as an additional validation of a temporally sensitive ISR. To correct this omission we have added the following to the results section before these experiments are introduced:

      "Disruption of proteostasis and sensitivity to proteotoxic stress are strongly linked with a wide range of diseases (Wolff et al, 2014; Harper & Bennett, 2016; Labbadia & Morimoto, 2015; Hipp et al, 2019). Evidently, global protein translation, degradation and complex assembly are crucial processes for cellular proteostasis in general, so cyclic variation in these processes would be expected to have (patho)physiological consequences....

      ...Informed by our observations, we predicted that circadian rhythms of global protein turnover would have functional consequences for maintenance of proteostasis. Specifically, we expected that cells would be differentially sensitive to perturbation of proteostasis induced by proteasomal inhibition using small molecules such as MG132 and BTZ, depending on time-of-day."

      Reviewer #2 (Significance (Required)):

      This is a fascinating paper that addresses key questions in the circadian organization of the proteome. The paper's main findings are that rhythms of protein synthesis and degradation are temporally coordinated to maintain overall stability of the proteome in mouse fibroblasts. Furthermore, the authors present evidence that this temporal organization may be important for assembly of macromolecular complexes. While very interesting, the main limitations are a lack of biochemical and mechanistic explanation and evidence that verifies these, mostly descriptive, findings.

      The fundamental biochemical mechanisms of protein synthesis, degradation, protein quality control and stress response have been studied for decades and are increasingly well understood, at least in cultured cancer cells. What is not understood is the extent to which all of these essential cellular systems are subject to physiological variation over the circadian cycle in quiescent cells. This is the fundamental knowledge gap our study attempts to fill by testing the discrete hypotheses that (1) circadian regulation of macromolecular complex turnover is more prevalent than abundance and that (2) proteome renewal is more prevalent than compositional variation. We suggest that establishing these essential principles of circadian cellular physiology is an essential prerequisite for performing the type of perturbational experiments we presume the reviewer would prefer. We would like to reassure the reviewer that such studies have been and are being performed, but we are concerned that the inclusion of a very extensive additional body of work within this manuscript would detract from the clear communication of our major finding that complex turnover and proteome renewal have a cell-autonomous basis.

      There are some relatively minor statistical and data quality issues that are probably addressable relatively quickly.

      Upon revision the study would be a welcome addition to investigators interested in proteostasis, circadian biology, cell biology and proteomics.

      I am a physician-scientist with expertise in circadian rhythms, cell biology, protein synthesis, and biochemistry.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Seinkmane et al investigate circadian regulation of protein synthesis and degradation in cultured cells and in mice. Their main new finding is that protein synthesis and degradation are in many cases rhythmic but coordinated such that the proteome is rhythmically renewed without an apparent rhythm in total protein abundance. Particularly the pool of large protein complexes is rhythmically renewed in this fashion.

      Using pulsed SILAC in combination with mass spectrometry, the authors are able to distinguish between total and newly synthesized protein levels in mouse lung fibroblasts. Analysis of these data shows that the synthesis of a large number of proteins is rhythmic although the total amount is constant, or that proteins are synthesized at a constant rate but the total amount is rhythmic, suggesting that degradation is rhythmic. By analyzing macromolecular complexes, defined as a high-speed pellet, they also present evidence that the rhythmic components of large complexes oscillate in the same phase and have a similar protein turnover rate. The authors conclude that complexes assemble rhythmically.

      The authors also present evidence that the activity of the proteasome oscillates in a circadian manner. Based on this observation, they show (in fibroblasts and in mice) that the response to proteotoxic stress (monitored by eIF2alpha phosphorylation levels, protein aggregation, and apoptosis) is higher at circadian times of high proteasome activity.

      I am an expert in the circadian field, and the hypothesis and concept behind the work presented here are potentially very interesting, and the experimental design is in principle suitable to answer these questions. However, after reading the paper several times, I cannot find the set of experiments that would convincingly support the authors' conclusions.

      Major questions/points:

      The major limitation of the manuscript is that the conclusions rely heavily on statistical analysis and massive processing of data from a bewilderingly large number of very different experiments. In looking at the figures, I have often wondered if the presence or absence of a rhythm is real or a product of the heavily processed data. The fact that a cosine wave fits through data points better than a straight line does not necessarily mean that a circadian rhythm is present.

      We agree that comparison of fits alone does not provide sufficiently reliable evidence. However, the fact that many independent methods (cosinor, RAIN, ANOVA) yield similar overall findings lends more confidence to our findings. We would also argue that the large number of different experiments is a positive aspect of the paper and lends weight to the general conclusions. We instead ask the reviewer to consider an alternative question - we and many other labs have found no evidence for any change in total cellular protein content, and yet there is extensive evidence from independent labs for a 'translational rush hour' whilst (excepting some low abundance transcription factors) very few cellular proteins change by more than 10% over the circadian cycle (see Stangherlin et al, COISB, 2022 for extended discussion of this). We hypothesised a parsimonious explanation for this clear contradiction, and designed experiments whose data were analysed by widely used methods that yielded results that were consistent with prediction. Perhaps the reviewer will at least concede that, if the presented findings do not refute the hypothesis, it should not be rejected until a superior one is proposed?

      * I think that in particular, the SILAC experiment(s) should be repeated and also performed with an arrhythmic control (such as CRY1/2 KO). *

      Whilst we agree that CRY1/2 KO cells show no circadian regulation of transcription and much more variable rhythms in PER2::LUC activity than wild type controls (Putker et al., EMBO J, 2021), in our hands circadian rhythms in proteome composition and protein phosphorylation in CRY1/2 KO are at least as prevalent as in wild type cells (see Wong et al., EMBO J, 2022). Indeed, when we performed a proteasome activity assay in CRY1/2 KO fibroblasts, we observed there was an apparent circadian variation, similar to WT but with a different phase. These data are now presented in revised Figure S1. Similarly, Lipton et al (Cell, 2015) showed circadian translational rhythms in cultured Bmal1 KO cells (see final figure), therefore it is not clear what would constitute an appropriate 'arrhythmic' control.

      In this study, for proteomics experiments, we used a combination of SILAC and TMT, as each technique alone would not be sufficient to answer our specific questions. These two techniques are very resource-intensive on their own, and even more so in combination. We therefore had to prioritise and for the second SILAC-TMT experiment decided to focus on cellular fractionation and questions pertaining to macromolecular complexes, which were directly relevant to our hypothesis. While it would undoubtedly also be interesting to study how canonical clock genes, such as Cry1/2, impact turnover on a proteome-wide scale, the focus of our study is physiological regulation of proteome composition, rather than the function of Cryptochrome genes which we already explored in previous work (Putker et al., EMBO J, 2021; Wong et al., EMBO J, 2022).

      Comparability between the whole cell and MMC SILAC experiments is also limited due to the different experimental conditions (6h vs. 1.5h pulse, +booster).

      We do not make any direct comparisons, other than to report that broadly comparable numbers of proteins were detected. Implicitly this means there must be greater coverage of protein complexes in the second pSILAC experiment, which our data bear out. If we were not to report the first experiment, the reader would not understand why we refined the method used in the second. In reporting the results of the 6h pulse, we make the limitations of this experiment very clear, i.e. it is biased towards highly abundant, high-turnover proteins, irrespective of cellular compartment. We should add that even in this experiment there was a clear trend towards rhythmic turnover of ribosomal proteins, but this did not quite achieve significance (p = 0.07) and so we did not want to make claims beyond the data.

      *The essential and new message of the paper is that (at least some) macromolecular complexes undergo circadian renewal (degradation and synthesis). Rather than just analysing an operationally defined pellet fraction by mass spectrometry, this could be shown in more detail and directly for one or two specific macromolecular complexes. Ribosomes, for example, seem particularly suitable, because there would also be the very simple approach of measuring the synthesis of ribosomal RNA by pulse labelling. To me, such an analysis would be perfectly sufficient as a proof of principle. I would then omit aspects such as rhythmic stress response, since many additional experiments are needed to demonstrate this convincingly. *

      Thank you for the excellent suggestion; we agree that validation is important. The ribosome is by far the most abundant macromolecular complex in the cell and was one of the major complexes to show clear evidence for circadian regulation of turnover, but not abundance, by our pSILAC proteomics. To validate this result, we took advantage of two important observations: (1) that all fully assembled ribosomes incorporate ribosomal RNA (rRNA) which can readily be separated from other cellular RNA by density gradient centrifugation; (2) pulse-labelling with heavy uridine-15N2 allows nascent RNA to be distinguished from pre-existing RNA. Thus, combining stable isotope labelling with ribosome purification, we can distinguish nascently assembled ribosomes from total ribosomes when the RNA is extracted, digested with RNase, and the ratio of light to heavy UMP quantified by mass spectrometry. These data are presented in new figure 5, and are consistent with findings in Figures 3/4 that circadian regulation of ribosome turnover is prevalent over abundance, and that the phase of highest ribosome turnover coincides with the phases of high translation and turnover overall. We hope that, by addressing the reviewer's question with an entirely orthogonal method, he/she can share more confidence in our conclusions.

      The final figure is included because it tests predictions that were informed by the preceding experiments. It is not intended to be comprehensive exploration of how the integrated stress response changes with the circadian cycle, nor have we claimed this.

      * Specific points: The reader is strongly influenced by the cosine wave or straight lines in the graphs (e.g. 1c, e, 3h, 5b, etc) produced by the analysis of rhythmicity, which basically only gives a yes or no answer. But it is not really that simple. If the algorithm detects a rhythm what is its period? Is it the same as the period of the luciferase reporter? If the period lengths correlate, do the phases as well (e.g. see differences in phases 1c and e)? These questions are not addressed. *

      The temporal resolution of the time course data is much lower than that of the luciferase reporter and so the error of the fit is greater (usually 1-2h). For the cosine wave curve fit and the associated extra sum-of-squares F test, the period of the oscillation was fixed at either 24h or 25h, as determined from a parallel PER2::LUC control recording. This is now explicitly stated in the methods section.

      In terms of phase, the general trend across all experiments is that bulk protein turnover, synthesis and degradation are higher during the 6-8h following the peak of PER2::LUC than at any other point in the circadian cycle. This is also consistent with our previous findings in mouse and human cells (Feeney et al, Nature, 2016; Stangherlin et al., Nat Comms, 2021) as well as findings from many different labs in vivo (e.g. Janich et al., Genome Res, 2016; Atger et al., 2015, PNAS; Sinturel et al., 2017, Cell). We are cautious about trying to be any more specific than this because each assay is measuring something different, and (as can be seen across the figures) there is also some modest variation in the phase of PER2::LUC between experiments, with respect to the prior entraining temperature cycle (this will be reported in our forthcoming publication, Rzechorzek et al, in prep). To address the reviewer's point therefore, we have added the following to the discussion:

      "Across all experiments in this study, we find that protein synthesis, degradation and turnover is highest during the 6-8h that follow maximal production of the clock protein PER2. This is coincident with increased glycolytic flux and respiration (Putker et al, 2018), increased macromolecular crowding in the cytoplasm, decreased intracellular K+ concentration and increased mTORC activity (Feeney et al, 2016a; Stangherlin et al, 2021b; Wong et al, 2022)."

      * The algorithm in Fig 1c predicts a rhythm for the chymotrypsin-like and the trypsin-like but not for the caspase-like activity. The peptide assay measures core proteasome activity independent of ubiquitylation and should therefore be dependent on proteasome concentration in the sample. How can then only two of the three proteasomal activities be rhythmic? Please elaborate and repeat with arrhythmic cells (e.g. CRY1/2 KO). The period length does not seem to correlate with the one of the reporter. Why is that? *

      The arrhythmic controls idea is partially addressed in the response above. We did perform a proteasome activity assay in CRY1/2 KO fibroblasts, and observed daily variation similar to WT, albeit with a different apparent phase. These data are now shown in Figure S1, and referred to in the main text as follows:

      "Moreover, we detected a significant interaction between genotype and biological time when comparing trypsin-like proteasome activity between wild type and Cryptochrome1/2-deficient cells, that lack canonical circadian transcriptional feedback repression (Fig S1B-E)".

      Besides this study, our two previous proteomic investigations of the fibroblast circadian proteome detected no biologically significant or consistent rhythm in proteasome subunit abundance (Wong et al., EMBO J, 2021; Hoyle et al., Science Translational Medicine, 2017). Moreover, proteasomes are long-lived stable complexes whose activity is determined by a combination of substrate-level, allosteric and post-translational regulatory mechanisms that includes their reversible sequestration into storage granules (Albert et al., PNAS, 2020; Fu et al., PNAS, 2021; Yasuda et al., Nature, 2020). It is therefore very likely that the observed rhythm in trypsin- and chymotrypsin-like activity occurs post-translationally. Proteasome subunit composition is also known to change, which might be another reason for differences between the protease activities (Marshall and Vierstra, Front Mol Biosci, 2019; Zheng et al., J Neurochem, 2012).

      To communicate this succinctly, we have revised the relevant text as follows:

      Page 7: "Moreover, we detected a significant interaction between genotype and biological time when comparing trypsin-like proteasome activity between wild type and Cryptochrome1/2-deficient cells, that lack canonical circadian transcriptional feedback repression (Fig S1B, (Wong et al, 2022)). Previous proteomics studies under similar conditions have revealed minimal circadian variation in proteasome subunit abundance (Wong et al, 2022), suggesting that proteasome activity rhythmicity, and therefore rhythms in UPS-mediated protein degradation, are regulated post-translationally (Marshall & Vierstra, 2019; Hansen et al, 2021)."

      Regarding period length, we apologise for an oversight in Fig 1c: unlike all other experiments presented here, these fits were originally done with a flexible period length (between 20h and 36h). This has now been re-fitted in a similar manner to the other experiments (fixed period of 24h, same as the parallel PER2::LUC controls), and the updated data are presented. This has not influenced the results of the statistical tests (only changed the p-values slightly, but the significance levels remain the same).

      Fig. 1a,b suggest that there is a rhythm in global protein synthesis with a significant peak at 40h. Yet, Fig. 1e suggests otherwise. How can that be? Also, the degradation graph (lower panel 1c) has to be plotted with the ratios calculated from the data points and not the heavily processed fitted graphs. This can be very misleading.

      Fig1a,b was performed under quite different conditions to 1e. As described in the methods section, 35S-labelling experiments require a medium change during both pulse and chase (to replace normal Met with radioactive Met, and vice versa). To avoid growth factor/mTORC1-mediated stimulation of protein synthesis & turnover, these acute media changes must occur in the absence of serum; otherwise media changes would introduce artifacts. In contrast, puromycin labelling (Fig 1e) is performed without any media changes (as puromycin can be added directly to cell culture media), and therefore was performed in normal culture conditions of 10% serum. Thus, due to the well-established effect of growth factor/mTORC1 signalling on bulk translation rate, it is very likely that differences in the phase of translational rhythms between Fig1a,b and 1e are attributable to differing serum concentrations – this phenomenon of serum-dependency of phase is also described in Beale et al, 2023, bioRxiv https://doi.org/10.1101/2023.06.22.546020. The important point is that neither of these proof-of-principle experiments supports the null hypothesis: that translation rate and turnover remain constant over the circadian cycle. Thus, the hypothesis being tested in Figure 1 is not rejected, and provides the rationale for the subsequent proteome-wide analyses.

      With respect to 1E, given the variance of measurement, the curve fits to Puro and Puro+BTZ already serve to test whether there is any significant ~24h component; a ratio of the respective data points would simply compound the error of measurement. The degradation plot is provided purely for illustrative purposes to help the reader, i.e. if these fits were true, what would be expected? We have revised the figure to more clearly communicate that the degradation plot is presented purely as a visual aid, labelled “inferred”, and now show ratio plots in revised Figure 1.

      * It also strikes me as odd that the amplitude of degradation increases (peak at 28h lower than at 30h) while the amplitude of the core clock oscillation dampens over time (peak at 54h higher than at 53h due to desynchronisation). Only two data values around 54h are responsible for the detected rhythm (2nd peak). Furthermore, phase and period do not agree with the rhythm of proteolytic activities shown in 1c. How can this be explained? *

      Due to the nature of the experiment, the degradation rate inferred from Figure 1B & 1E does not reflect proteasome activity exclusively. Rather it reflects the combined sum of processes that remove nascently produced proteins from the cell's digitonin-soluble fraction, which includes proteasomal degradation, but also autophagy, protein secretion and sequestration into other compartments. Therefore, the peak degradation in Fig 1B & E would not necessarily be expected to coincide with the peak of proteasome activity in Fig 1C. Again, these experiments in Figure 1 simply serve to test the hypothesis (change over circadian cycle) vs the null hypothesis (no change over the circadian cycle).

      To the question of amplitude increase, we speculate that this is due to metabolic changes in cultures over the course of three days – as serum and nutrients from the last medium change at T0 are depleted, cells need to increase degradation to promote turnover and recycling. As we suggest that the rhythms in turnover help cellular bioenergetic efficiency, it is quite plausible that amplitude increases as nutrient concentrations fall. We are in the process of investigating further how exactly these rhythms vary with nutrient and serum status.

      * Regarding the MS data shown in Figure 2, is it possible to show a positive / quality control? Best would be MS data of Luciferase (or PER2,3, RevErb/alpha, DBP) to show oscillation of protein levels with the same phase and period as the reporter. *

      Unfortunately, none of these low abundance transcription factors were detected in our MS runs. This is not surprising, given that their copy numbers are estimated at

      * In Fig. 2c examples of the 4 groups of proteins presented in 2e should be shown (both synthesis and total abundance arrhythmic, either one rhythmic or both rhythmic) and not just what appears to be random examples of rhythmic and arrhythmic proteins. *

      As also requested by another reviewer, we have revised the figure to include examples of each of the rhythmicity categories. No specific meaning is inferred from the chosen protein identities.

      Is it possible at all to distinguish between synthesis/turnover and assembly/disassembly of macromolecular complexes in the MMC SILAC experiment? If so, how?

      We followed the established protocol originally developed in our collaborator Kathryn Lilley's lab, where it has previously been shown that most proteins in the MMC fraction are in macromolecular assemblies (Geladaki et al, Nat Commun, 2019). Proteins that are rhythmically abundant in this fraction, but without an accompanying synthesis rhythm (e.g. Beta-actin, see Hoyle et al., Sci Trans Medicine, 2017) can be reliably assumed to arise solely from rhythmic assembly/disassembly i.e. they are captured in this fraction when assembled, but lost, and therefore not detected, in this fraction when disassembled. However, in the case of rhythmic synthesis and abundance, it is not possible with this technique to directly infer that rhythmic synthesis of a given protein is responsible for its rhythmic assembly in a complex, though they do correlate.

      Therefore, our new figure 5 (with thanks again for this suggestion) approaches this by an orthogonal method, relying on the important observations that a) ribosomes incorporate ribosomal RNA (rRNA) b) this can be readily separated from most other cellular RNA by density gradient centrifugation and c) pulse-labelling with heavy uridine-15N2 allows nascent RNA to be distinguished from pre-existing RNA. Using this technique, we validate a rhythm in production and assembly of mature ribosomes, with its peak consistent with the highest turnover time as measured in Figs 1 and 3, and MMC fraction proteomics (Supplemental table 3), at the descending phase of PER2::LUC.

      * Looking at Fig. 4b,c, what is the fraction of rhythmic proteins from the MMC experiment that also oscillate in either synthesis, total abundance or both in the whole cell? Is there a general correlation at all? Please show. *

      There were no correlations greater than would be expected by chance (the sets of proteins rhythmic in either synthesis or degradation did not overlap significantly between whole-cell and MMC fractions, as determined by an odds ratio test).
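      One way such an overlap comparison could be set up is sketched below in Python. The response names only an "odds ratio test"; this sketch substitutes a one-sided Fisher's exact test on a 2x2 contingency table, and the function name, the inputs and the choice of universe (for example, proteins detected in both experiments) are assumptions for illustration only.

```python
from scipy.stats import fisher_exact


def overlap_enrichment(rhythmic_a, rhythmic_b, universe):
    """Test whether two rhythmic protein sets overlap more than chance.

    rhythmic_a, rhythmic_b: identifiers called rhythmic in each experiment.
    universe: identifiers that could have been called in both experiments
              (hypothetical choice: proteins detected in both data sets).
    Returns (table, odds_ratio, p_value) for a one-sided Fisher's exact test.
    """
    universe = set(universe)
    a = set(rhythmic_a) & universe
    b = set(rhythmic_b) & universe

    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(universe - a - b)

    table = [[both, only_a], [only_b, neither]]
    # alternative="greater" asks whether the overlap exceeds chance expectation
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    return table, odds_ratio, p_value
```

      An odds ratio close to 1 with a large p-value would correspond to an overlap no greater than expected by chance.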

      To communicate this we have added the following text:

      "It is also worth noting that although there were small sets of proteins that were rhythmic in both whole-cell (Figure 2) and MMC fractions (Figure 3), in both synthesis and total abundance, none of these four overlaps were higher than would have been expected by chance."

      * **Why is the phase of the oscillating proteins different in the two experiments (compare Figs. 2f,g and 4a) and does either of them match with the phase of the PER2::LUC reporter, which should be the peak synthesis phase of the clock? *

      This was a labelling error on our part; our apologies, and thanks for drawing it to our attention. We had attempted to harmonise all these phase values so that they were mutually comparable between the two mass spec experiments, but omitted to update all the figures. They have now all been updated to be consistent with one another. From our experiments, the peak of PER2::LUC consistently precedes the timing of maximum bulk translation. This phase difference is, at least in part, attributable to the inactivation kinetics of firefly luciferase (see Feeney et al., J Biol Rhythms, 2016), i.e., under conditions of saturating luciferin substrate, PER2 protein abundance peaks several hours later than PER2::LUC activity when measured in longitudinal live cell assays.

      * Regarding the sensitivity to MG132 in Fig. 5b it doesn't make sense that, while eIF2alpha phosphorylation is arrhythmic in untreated cells and the levels of eIF2alpha phosphorylation are (apparently) not exhibiting a rhythmic change by administration of MG132 at different circadian timepoints, the ratio of P-eIF2alpha with and without MG132 suddenly is. Please show in Fig. S4b quantifications of the individual experiments with and without MG132. What is presented in 5b is after all the ratio of ratios of quantifications of Western blots, each of which individually does not display any appreciable rhythm. For me this is too much processing of the data. In my opinion, the MG132 4h acute treatment must show a detectable rhythm.*

      We apologise for being unclear in this panel and description. Our hypothesis concerned the fold-induction of the p-eIF2alpha:eIF2alpha ratio changing as a function of MG132 and time. Our reasoning was that the ratio may be more biologically relevant, as it is the relative change that cells sense and respond to, not the absolute abundance of p-eIF2alpha. We applied a quantitative, two-channel fluorescent antibody technique to enable detection and quantification of p-eIF2alpha and eIF2alpha from each replicate at each time point from the same band of the same blot. We agree that no p-eIF2alpha rhythm is evident from a cursory inspection of any of the blots. This is due to the innate variance between dishes in extracted protein concentration, as well as the levels of basal eIF2alpha and its phosphorylation, and is the reason that we took great pains to be as quantitative as possible using the two-channel immuno-detection (LICOR). Due to the natural and stochastic variation in eIF2alpha levels and extraction between replicates and over time, it is difficult to get identical eIF2alpha loading to reveal the overlying rhythm in p-eIF2alpha, and furthermore, identical loading would give a misleading impression of the level of temporal variation of eIF2alpha levels. Quantification reveals temporal variation in the MG132-treated samples but not in the untreated controls (Supp Fig 5A) – suggesting that there may be circadian regulation of the cellular response to MG132 challenge, rather than a cell-autonomous p-eIF2alpha rhythm under basal conditions. We quantified fold-induction from MG132 vs untreated to present in Figure 6A. We have presented all the raw data in supplementary figure 5 for readers to validate through their own analysis.

      *Minor:

      In Fig. 1f please show dot blot with error bars as well as the individual experiments in the supplementals. Please check the graph legend (N>=3?) *

      Thank you for pointing out these omissions. The dot blot with error bars is now shown in Fig. 1F, and the full gels are now included as Fig. S2B. The main figure legend for 1f has also had the following added (explaining the N numbers):

      "Four mice were used per condition, but in some cases one of the four injections were not successful i.e. no puromycin labelling was observed and so no quantification could be performed (full data in Fig. S2B)."

      * Please explain the mechanism of the "booster" used in the second SILAC experiment. *

      The following has been revised in the text:

      " Namely, we added a so-called booster channel: an additional fully heavy-labelled cell sample within a TMT mixture (Klann et al, 2020). When the mixture is analysed by MS, heavy peptides from the booster channel increase the overall signal of all identical heavy peptides at MS1 level; at MS2 and MS3 this results in improved detection of heavy proteins in the other TMT channels of interest, and is particularly advantageous for the proteins with lower turnover that would fall below the MS1 detection limit without the booster."

      *p10 3rd paragraph: S2e not S3e*

      Thank you, this has been fixed.

      p12 last paragraph please add reference to Figs. 5f,g

      Thank you, this has been added.

      *Reviewer #3 (Significance (Required)): *

      xxxxx

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      I would like to express my appreciation for the authors' dedication to revising the manuscript. It is evident that they have thoughtfully addressed numerous concerns I previously raised, significantly contributing to the overall improvement of the manuscript.

      Response: We appreciate the reviewers’ recognition of our efforts in revising the manuscript.

      My primary concern regarding the authors' framing of their findings within the realm of habitual and goal-directed action control persists. I will try to explain my point of view and perhaps clarify my concerns. While acknowledging the historical tendency to equate procedural learning with habits, I believe a consensus has gradually emerged among scientists, recognizing a meaningful distinction between habits and skills or procedural learning. I think this distinction is crucial for a comprehensive understanding of human action control. While these constructs share similarities, they should not be used interchangeably. Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses).

      Response: We would like to clarify that, contrary to the reviewer’s assertion of a scientific consensus on this matter, the discussion surrounding the similarities and differences between habits and skills remains an ongoing and unresolved topic of interest among scientists (Balleine and Dezfouli, 2019; Du and Haith, 2023; Graybiel and Grafton, 2015; Haith and Krakauer, 2018; Hardwick et al., 2019; Kruglanski and Szumowska, 2020; Robbins and Costa, 2017). We absolutely agree with the reviewer that “Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses)”. But so can habits. Some researchers also highlight the intentional/goal-directed nature of habits (e.g., Du and Haith, 2023, “Habits are not automatic” (preprint) or Kruglanski and Szumowska, 2020, “Habitual behavior is goal-driven”: “definitions of habits that include goal independence as a foundational attribute of habits are begging the question; they effectively define away, and hence dispose of, the issue of whether habits are goal-driven”, p. 1258). Therefore, there is no clear consensus concerning the concept of habit.

      While we acknowledge the meaningful distinctions between habits and skills, we also recognize a substantial body of literature supporting the overlap between these concepts (cited in our manuscript), particularly at the neural level. The literature clearly indicates that both habits and skills are mediated by subcortical circuits, with a progressive disengagement of cognitive control hubs in frontal and cingulate cortices as repetition evolves. We do not use these concepts interchangeably. Instead, we simply present evidence supporting the assertion that our trained app sequences meet several criteria for their habitual nature.

      Our choice of Balleine and Dezfouli (2018)'s criteria stemmed from the comprehensive nature of their definitions, which effectively synthesized insights from various researchers (Mazar and Wood, 2018; Verplanken et al., 1998; Wood, 2017, etc). Importantly, their list highlights the positive features of habits that were previously overlooked. However, these authors still included a controversial criterion ("habits as insensitive to changes in their relationship to their individual consequences and the value of those consequences"), even though they acknowledged the problems of using outcome devaluation methods and of relying on a null-effect. According to Kruglanski and Szumowska (2020), this criterion is highly problematic as “If, by definition, habits are goal-independent, then any behavior found to be goal-dependent could not be a habit on sheer logical grounds” (p. 1257). In their definition, “habitual behavior is sensitive to the value of the reward (i.e., the goal) it is expected to mediate and is sensitive to the expectancy of goal attainment (i.e., obtainment of the reward via the behavior)” (p. 1265). In fact, some recent analyses of habitual behavior do not use devaluation or revaluation as a criterion (Du and Haith, 2023). This article, for example, ascertains habits using different criteria and provides supporting evidence for trained action sequences being understood as skills, with both goal-directed and habitual components.

      In the discussion of our manuscript, we explicitly acknowledge that the app sequences can be considered habitual or goal-directed in nature and that this terminology does not alter the fact that our overtrained sequences exhibit clear habitual features.

      Watson et al. (2022) aptly detailed my concerns in the following statements: "Defining habits as fluid and quickly deployed movement sequences overlaps with definitions of skills and procedural learning, which are seen by associative learning theorists as different behaviors and fields of research, distinct from habits."

      "...the risk of calling any fluid behavioral repertoire 'habit' is that clarity on what exactly is under investigation and what associative structure underpins the behavior may be lost." I strongly encourage the authors, at the very least, to consider Watson et al.'s (2022) suggestion: "Clearer terminology as to the type of habit under investigation may be required by researchers to ensure that others can assess at a glance what exactly is under investigation (e.g., devaluation-insensitive habits vs. procedural habits)", and to refine their terminology accordingly (to make this distinction clear). I believe adopting clearer terminology in these respects would enhance the positioning of this work within the relevant knowledge landscape and facilitate future investigations in the field.

      Response: We would like to highlight that we have indeed followed Watson et al. (2022)’s recommendations on focusing on other features/criteria of habits at the expense of the outcome devaluation/contingency degradation paradigm, which has been more controversial in the human literature. Our manuscript clearly aligns with Watson et al. (2022)’s recommendations: “there are many other features of habits that are not captured by the key metrics from outcome devaluation/contingency degradation paradigms such as the speed at which actions are performed and the refined and invariant characteristics of movement sequences (Balleine and Dezfouli, 2019). Attempts are being made to develop novel behavioral tasks that tap into these positive features of habits, and this should be encouraged as should be tasks that are not designed to assess whether that behavior is sensitive to outcome devaluation, but capture the definition of habits through other measures”.

      Regarding the authors' use of Balleine and Dezfouli's (2018) criteria to frame recorded behavior as habitual, as well as to acknowledge the study's limitations, it's important to highlight that while the authors labelled the fourth criterion (which they were not fulfilling) as "resistance to devaluation," Balleine and Dezfouli (2018) define it as "insensitive to changes in their relationship to their individual consequences and the value of those consequences." In my understanding, this definition is potentially aligned with the authors' re-evaluation test, namely, it is conceptually adequate for evaluating the fourth criterion (which is the most accepted in the field and probably the one that differentiates habits from skills). Notably, during this test, participants exhibited goal-directed behavior.

      The authors characterized this test as possibly assessing arbitration between goal-directed and habitual behavior, stating that participants in both groups "demonstrated the ability to arbitrate between prior automatic actions and new goal-directed ones." In my perspective, there is no justification for calling it a test of arbitration. Notably, the authors inferred that participants were habitual before the test based on some criteria, but then transitioned to goal-directed behavior based on a different criterion. While I agree with the authors' comment that: "Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1)." they implicitly assert a shift from habit to goal-directed behavior without providing evidence that relies on the same probed mechanism. Therefore, I think it would be more cautious to refer to this test as solely an outcome revaluation test. Again, the results of this test, if anything, provide evidence that the fourth criterion was tested but not met, suggesting participants have not become habitual (or at least undermines this option).

      Response: In our previously revised manuscript, we duly acknowledged that the conventional (perhaps nowadays considered outdated) goal devaluation criterion was not met, primarily due to constraints in designing the second part of the study. We did cite evidence from another similar study that had used devaluation app-trained action sequences to demonstrate habitual qualities (but the reviewer ignored this).

      The reviewer points out that we did use a manipulation of goal revaluation in one of the follow-up tests conducted (although this was not a conventional goal revaluation test inasmuch as it was conducted in a novel context). In this test, please note that we used 2 manipulations: monetary and physical effort. Although we did show that subjects, including OCD patients, were apparently goal-directed in the monetary reward manipulation, this was not so clear when goal re-evaluation involved the physical effort expended. In this effort manipulation, participants were less goal-oriented and OCD patients preferred to perform the longer, familiar sequence rather than the shorter, novel one, thus exhibiting significantly greater habitual tendencies, as compared to controls. Hence, we cannot decisively conclude that the action sequence is goal-directed as the reviewer is arguing. In fact, the evidence is equivocal and may reflect both habitual and goal-directed qualities in the performance of this sequence, consistent with recent interpretations of skilled/habitual sequences (Du and Haith, 2023). Relying solely on this partially met criterion to conclude that the app-trained sequences are goal-directed, and therefore not habitual, would be an inaccurate assessment for several reasons: 1) the action sequences did satisfy all other criteria for being habitual; 2) this approach would rest on a problematic foundation for defining habits, as emphasized by Kruglanski & Szumowska (2020); and 3) it would succumb to the pitfall of subscribing to a zero-sum game perspective, as cautioned by various researchers, including the review by Watson et al. (2022) cited by the referee, thus oversimplifying the nuanced nature of human behavior.

      While we have previously complied with the reviewer’s suggestion on relabelling our follow-up test as a “revaluation test” instead of an “arbitration test”, we have now explicitly removed all mentions of the term “arbitration” (which seems to raise concerns) throughout the manuscript. As the reviewer suggested, we now use a more refined terminology by explicitly referring to the measured behavior as "procedural habits". We have also extensively revised the discussion section of our manuscript to incorporate the reviewer’s viewpoint. We hope that these adjustments enhance the clarity and accuracy of our manuscript, addressing the concerns raised during this review process.

      In essence, this is an ontological and semantic matter that does not alter our findings in any way. Whether the sequences are considered habitual or goal-directed does not change our findings that 1) both groups displayed equivalent procedural learning and automaticity attainment; 2) OCD patients exhibit greater subjective habitual tendencies via self-reported questionnaires; 3) patients who had elevated compulsivity and habitual self-reported tendencies engaged significantly more with the motor habit-training app, practiced more and reported symptom relief at the end of the study; 4) these particular patients also show an augmented inclination to attribute higher intrinsic value to familiar actions, a possible mechanism underlying compulsions.

      Reviewer #2 (Recommendations For The Authors):

      A few more small comments (with reference to the point numbers indicated in the rebuttal):

      (14) I am not entirely sure why the suggested analysis is deemed impractical (i.e., why it cannot be performed by "pretending" participants received the points they should have received according to their performance). This can further support (or undermine) the idea of effect of reward on performance rather than just performance on performance.

      Response: We have now conducted this analysis, generating scores for each trial of practices after day 20, when participants no longer gained points for their performance. This analysis assesses whether participants' trial-wise behavioral changes exhibit a similar pattern following simulated relative increases or decreases in scores, as if they had been receiving points at this stage. Note that this analysis has fewer trials available, around 50% fewer on average.

      Before presenting our results, we wish to emphasize the importance of distinguishing between the effects of performance on performance and the effects of reward on performance. In response to a reviewer's suggestion, we assessed the former in the first revision of our manuscript. We normalized the movement time variable and evaluated how normalized behavioral changes responded to score increments and decrements. The results from the original analyses were consistent with those from the normalized data.

      Regarding the phase where participants no longer received scores, we believe this phase primarily helps us understand the impact of 'predicted' or 'learned' rewards on performance. Once participants have learned the simple association between faster performance and larger scores, they can be expected to continue exhibiting the reward sensitivity effects described in our main analysis. We consider it not feasible to assess the effects of performance on performance during the reward removal phase, which occurs after 20 days. Therefore, the following results pertain to how the learned associations between faster movement times and scores persist in influencing behavior, even when explicit scores are no longer displayed on the screen.

      Results: The main results of the effect of reward on behavioral changes persist, supporting that relative increases or decreases in scores (real or imagined/inferred) modulate behavioral adaptations trial-by-trial in a consistent manner across both cohorts. The direction of the effects of reward is the same as in the main analyses presented in the manuscript: larger mean behavioral changes (smaller std) following ∆R-. First, concerning changes in “normalized” movement time (MT) trial-by-trial, we conducted a 2 x 2 factorial analysis of the centroid of the Gaussian distributions with the same factors Reward, Group and Bin. This analysis demonstrated a significant main effect of Reward (P = 2e-16), but not of Group (P = 0.974) or Bin (P = 0.281). There were no significant interactions between factors. The main Reward effect can be observed in the top panel of the figure below. The same analysis applied to the spread (std) of the Gaussian distributions revealed a significant main effect of Reward (P = 0.000213), with no additional main effects or interactions.

      Author response image 1.

      Next, conducting the same 2 x 2 factorial analyses on the centroid and spread of the Gaussian distributions fitted to the Consistency data, we also obtained a robust significant main effect of Reward. For the centroid variable, we obtained a significant main effect of Reward (P = 0.0109) and Group (P = 0.0294), while Bin and the factor interactions were non-significant. See the top panel of the figure below.

      Reward also significantly modulated the spread of the Gaussian distributions fitted to the Consistency data (P = 0.00498). There were no additional significant main effects or interactions. See the bottom panel in the figure below.

      Note that here the factorial analysis was performed on the logarithmic transformation of the std.

      Author response image 2.

      (16) I find this result interesting and I think it might be worthwhile to include it in the paper.

      Response: We have now included this result in our revised manuscript (page 28)

      (18) I referred to this sentence: "The app preferred sequence was their preferred putative habitual sequence while the 'any 6' or 'any 3'-move sequences were the goal-seeking sequences." In my understanding, this implies one choice is habitual and another indicates goal-directedness.

      One last small comment:
In the Discussion it is stated: "Moreover, when faced with a choice between the familiar and a new, less effort-demanding sequence, the OCD group leaned toward the former, likely due to its inherent value. These insights align with the theory of goal-direction/habit imbalance in OCD (Gillan et al., 2016), underscoring the dominance of habits in particular settings where they might hold intrinsic value."

      This could equally be interpreted as goal-directed behavior, so I do not think there is conclusive support for this claim.

      Response: The choice of the familiar/trained sequence, as opposed to the 'any 6' or 'any 3'-move sequences, cannot be explicitly considered goal-directed: firstly, because the app familiar sequences were associated with less monetary reward (in the any-6 condition), and secondly, because participants would clearly need more effort and time to perform them. Even though these were automatic, it would still be much easier and faster to simply tap one finger sequentially 6 times (any-6) or 3 times (any-3). Therefore, the choice for the app-sequence would not be optimal/goal-directed. In this sense, that choice aligns with the current theory of goal-direction/habit imbalance of OCD. We found that OCD patients prefer to perform the trained app sequences in the physical effort manipulation (any-3 condition). While this, on the one hand, cannot be explicitly considered a goal-directed choice, we agree that there is another possible goal involved here, which links to the intrinsic value associated with the familiar sequence. In this sense the action could potentially be considered goal-directed. This highlights the difficulty of this concept of value and agrees with: 1) Hommel and Wiers (2017): “Human behavior is commonly not driven by one but by many overlapping motives . . . and actions are commonly embedded into larger-scale activities with multiple goals defined at different levels. As a consequence, even successful satiation of one goal or motive is unlikely to also eliminate all the others” (p. 942) and 2) Kruglanski & Szumowska (2020)’s account that “habits that may be unwanted from the perspective of an outsider and hence “irrational” or purposeless, may be highly wanted from the perspective of the individual for whom a habit is functional in achieving some goal” (p. 1262) and therefore habits are goal-driven.

      References:

      Balleine BW, Dezfouli A. 2019. Hierarchical Action Control: Adaptive Collaboration Between Actions and Habits. Front Psychol 10:2735. doi:10.3389/fpsyg.2019.02735

      Du Y, Haith A. 2023. Habits are not automatic. doi:10.31234/osf.io/gncsf

      Graybiel AM, Grafton ST. 2015. The Striatum: Where Skills and Habits Meet. Cold Spring Harb Perspect Biol 7:a021691. doi:10.1101/cshperspect.a021691

      Haith AM, Krakauer JW. 2018. The multiple effects of practice: skill, habit and reduced cognitive load. Current Opinion in Behavioral Sciences 20:196–201. doi:10.1016/j.cobeha.2018.01.015

      Hardwick RM, Forrence AD, Krakauer JW, Haith AM. 2019. Time-dependent competition between goal-directed and habitual response preparation. Nat Hum Behav 1–11. doi:10.1038/s41562-019-0725-0

      Hommel B, Wiers RW. 2017. Towards a Unitary Approach to Human Action Control. Trends Cogn Sci 21:940–949. doi:10.1016/j.tics.2017.09.009

      Kruglanski AW, Szumowska E. 2020. Habitual Behavior Is Goal-Driven. Perspect Psychol Sci 15:1256–1271. doi:10.1177/1745691620917676

      Mazar A, Wood W. 2018. Defining Habit in Psychology In: Verplanken B, editor. The Psychology of Habit: Theory, Mechanisms, Change, and Contexts. Cham: Springer International Publishing. pp. 13–29. doi:10.1007/978-3-319-97529-0_2

      Robbins TW, Costa RM. 2017. Habits. Current Biology 27:R1200–R1206. doi:10.1016/j.cub.2017.09.060

      Verplanken B, Aarts H, van Knippenberg A, Moonen A. 1998. Habit versus planned behaviour: a field experiment. Br J Soc Psychol 37 (Pt 1):111–128. doi:10.1111/j.2044-8309.1998.tb01160.x

      Watson P, O’Callaghan C, Perkes I, Bradfield L, Turner K. 2022. Making habits measurable beyond what they are not: A focus on associative dual-process models. Neurosci Biobehav Rev 142:104869. doi:10.1016/j.neubiorev.2022.104869

      Wood W. 2017. Habit in Personality and Social Psychology. Pers Soc Psychol Rev 21:389–403. doi:10.1177/1088868317720362

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular for analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use eigen-decomposition of the successor representation [2], a learnable predictive representation of possible future states that has been shown by Stachenfeld et al. [3] to provide an abstract structured representation of a space that is analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such a purpose. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research.
We have added a discussion of these issues in Section 6 of the updated manuscript.

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high frequency grid cells should increase their firing rate more than low frequency cells, since the high frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition, which supports flexible and efficient adaptation to novel inputs that remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form that was inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of the computation of simple Euclidean distances (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
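
The parallelogram relation above (A - B = C - D) can be illustrated with a minimal sketch. This is not the model from the manuscript: the function name `complete_analogy`, the candidate set, and the 2D points are hypothetical and chosen only for illustration. The sketch rearranges the relation to D = C - A + B and picks the candidate with the smallest Euclidean distance to that point.

```python
import math

def complete_analogy(a, b, c, candidates):
    """Return the candidate closest to the point d satisfying a - b = c - d."""
    # Rearranging a - b = c - d gives d = c - a + b (computed per coordinate).
    target = tuple(ci - ai + bi for ai, bi, ci in zip(a, b, c))
    # Choose the candidate with the smallest Euclidean distance to the target.
    return min(candidates, key=lambda d: math.dist(d, target))

# Hypothetical 2D points, used only to illustrate the relation.
a, b, c = (1.0, 2.0), (3.0, 5.0), (4.0, 1.0)
candidates = [(6.0, 4.0), (5.0, 5.0), (2.0, 0.0)]
best = complete_analogy(a, b, c, candidates)
```

Here the target point is (6.0, 4.0), so the first candidate is selected because it lies exactly on the fourth corner of the parallelogram.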

      Second, we are not aware of any previous work, other than Frankland et al. [6] cited in the first paragraph of Section 2.2.1, that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, a useful line of future work would clearly be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, we would like to make it clear that ω is not meant to correspond to synaptic weights, and thank the reviewer for noting the possibility for confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and zeros elsewhere. In Equation 6, diag(ω) is matrix multiplied with the covariance matrix V, which results in elementwise multiplication of ω with the column vectors of V, and hence ω acts more like a set of gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and uses of it (as in Algorithm 1) to depict the application of a sigmoid nonlinearity (σ) to the gates, so that the resulting values are always between 0 and 1.
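      The gating described above can be illustrated with a minimal NumPy sketch. It is not code from the manuscript: the array sizes, the random data, and the names `omega`, `g`, and `V` are assumptions chosen only to show that multiplying by diag(g) is the same as multiplying each column of V elementwise by g, and that the sigmoid keeps every gate between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_cells = 4                                  # hypothetical number of grid cells

# Hypothetical embeddings: 50 locations (rows) x 4 grid cells (columns).
X = rng.normal(size=(50, n_cells))
V = np.cov(X, rowvar=False)                  # (4, 4) covariance matrix

omega = rng.normal(size=n_cells)             # unconstrained parameters (assumed)
g = sigmoid(omega)                           # gates, each strictly in (0, 1)

gated_matmul = np.diag(g) @ V                # matrix product with diag(g)
gated_elementwise = g[:, None] * V           # each column of V times g, elementwise

print(np.allclose(gated_matmul, gated_elementwise))
```

      Because the two forms agree, the diagonal matrix never needs to be built explicitly; the broadcasted product is the cheaper way to apply the gates.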

      Second, we would like to clarify that we don’t compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independent of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module, which always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.
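      The selection step summarized in this figure can be sketched as follows. This is a simplified illustration, not the procedure from the manuscript: it skips the gate optimization of Equation 6 and simply compares the log determinant of the covariance matrix for two synthetic groups of embeddings. The data, the group sizes, the ridge term `eps`, and the helper name `log_det_cov` are all assumptions.

```python
import numpy as np

def log_det_cov(embeddings, eps=1e-6):
    """Log determinant of the covariance of embeddings.

    embeddings: array of shape (n_locations, n_cells).
    A small ridge (eps) keeps the determinant finite when cells are
    nearly collinear.
    """
    cov = np.cov(embeddings, rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])
    _, logdet = np.linalg.slogdet(cov)
    return logdet

rng = np.random.default_rng(0)
n_locations, n_cells = 500, 5

# Hypothetical "frequency 0": all cells follow one shared signal, so they
# are strongly correlated with each other.
shared = rng.normal(size=(n_locations, 1))
freq0 = shared + 0.05 * rng.normal(size=(n_locations, n_cells))

# Hypothetical "frequency 1": cells vary independently of each other.
freq1 = rng.normal(size=(n_locations, n_cells))

scores = [log_det_cov(f) for f in (freq0, freq1)]
selected = int(np.argmax(scores))            # index of the frequency to keep
print(scores, selected)
```

      In this toy setting the independent group is selected, because strong correlations among cells drive the determinant toward zero.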

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). It is because those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among those grid cell embeddings were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the grid cell embeddings are high (they are “expressive”) and the correlation among the grid cell embeddings is low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings more efficiently covered the representational space of the training data, allowing them to efficiently capture the same relational structure across training and test distributions which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. Furthermore, to illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows the results after the summation of the multiplication of the grid cell embeddings over the 2d space of 1000x1000 locations, with their corresponding gates for 3 representative frequencies (left, middle and right panels showing results for the lowest, middle and highest grid cell frequencies, respectively, of the 9 used in the model), obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). 
      Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is greatest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency more efficiently cover the representational space, which allows them to capture the same relational structure across training and test distributions, as required for OOD generalization.
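      The claim that the determinant is largest when variances are high and correlations are low can be checked on the smallest possible case. For two cells with variances s1 and s2 and correlation rho, the determinant of the covariance matrix is s1 * s2 * (1 - rho^2). The sketch below uses made-up numbers and a hypothetical helper `cov_2x2`; it illustrates the general identity and does not reproduce values from the model.

```python
import numpy as np

def cov_2x2(var1, var2, rho):
    """Covariance matrix of two variables with given variances and correlation."""
    c = rho * np.sqrt(var1 * var2)
    return np.array([[var1, c],
                     [c, var2]])

# Low variance, no correlation: 0.5 * 0.5 * (1 - 0) = 0.25
det_low_var = np.linalg.det(cov_2x2(0.5, 0.5, 0.0))

# High variance, strong correlation: 2 * 2 * (1 - 0.81) = 0.76
det_high_corr = np.linalg.det(cov_2x2(2.0, 2.0, 0.9))

# High variance, no correlation: 2 * 2 * (1 - 0) = 4.0
det_best = np.linalg.det(cov_2x2(2.0, 2.0, 0.0))

print(det_low_var, det_high_corr, det_best)
```

      Raising the variances and lowering the correlation both increase the determinant, which matches the description of embeddings that are expressive and that cover the representational space.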

      Author response image 2.

      Each panel shows the results after summation of the multiplication of the grid cell embeddings over the 2d space of 1000x1000 locations, with their corresponding gates for a particular frequency, obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that particular location in the 2d space.

      Finally, we would like to clarify how the DPP-A attentional mechanism is different from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Use of the standard self-attention mechanism in transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would lead to weightings of grid cell embeddings over all frequencies and phases. The objective function for DPP-A represents an inductive bias that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency for which the determinant of the covariance matrix, computed over the training space, is greatest. The transformer inference module then attends over the inputs with the selected grid cell embeddings based on the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that have been made by Marc Howard and his colleagues to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his temporal context model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former (Temporal Context Model) is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter (temporal context normalization) is a normalization procedure proposed for use in training a neural network, similar to batch normalization [1], in which tensor normalization is applied over the temporal instead of the batch dimension, which is shown to help with OOD generalization. We apologize for any confusion engendered by the similarity of these terms and for our failure to clarify the difference between them, which we have now attempted to do in a footnote on Page 2.

      (1) Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR.
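      The distinction drawn above, normalizing over the temporal rather than the batch dimension, can be sketched in a few lines. This is a minimal reading of that description and not the implementation of Webb et al. (2020): the function name `temporal_context_norm`, the (batch, time, features) layout, and the omission of any learned scale and shift parameters are assumptions.

```python
import numpy as np

def temporal_context_norm(x, eps=1e-8):
    """Z-score each feature over the time axis of each sequence.

    x: array of shape (batch, time, features). Statistics are computed
    per sequence over axis 1 (time), not over axis 0 (batch) as in
    batch normalization.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
seq = rng.normal(size=(2, 4, 3))             # 2 sequences, 4 time steps, 3 features

# The same sequences after a change of scale and origin.
shifted_scaled = 3.0 * seq + 10.0

a = temporal_context_norm(seq)
b = temporal_context_norm(shifted_scaled)
print(np.allclose(a, b, atol=1e-6))
```

      Because each sequence is normalized by its own statistics, a sequence that has been shifted and rescaled maps to nearly the same normalized values, which gives some intuition for why such a procedure could help with OOD generalization.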

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references mentioned below, on page 3 after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      While we agree that the model is not intended to be specific about the implementation of the R module, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex and, in particular, regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. Specifically, the role of the prefrontal cortex in the encoding and active maintenance of abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. Neuroimage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in cognitive sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. Control of cognitive processes: Attention and performance XVIII, (2000).

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of parameter occurs only over the training space and is not further modified during testing (i.e. over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of space over which the test analogies are defined when those are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the appropriate set of grid cell embeddings must be identified that capture the same relational structure between training and test analogies to demonstrate strong OOD generalization, and that is achieved by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM is meant to refer to an integer value, obtained by multiplying K and M, which is added to both dimensions of A, B, C and D, which are points in ℤ², to translate them to a different region of the space. K is an integer value ranging from 1 to 9 and M is also an integer value denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
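      The translation by KM can be made concrete with a short sketch. Only the values M = 100 and K from 1 to 9 come from the text above; the sampling scheme, the function names `sample_analogy` and `translate`, and the choice of K = 3 are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

M = 100                                      # size of the training region (from the text)

def sample_analogy(rng, size):
    """Sample integer points A, B, C, D in [0, size)^2 with A - B = C - D.

    Hypothetical rejection sampler: draw A, B, C, derive D, and retry
    if D falls outside the region.
    """
    while True:
        A, B, C = rng.integers(0, size, size=(3, 2))
        D = C - (A - B)
        if np.all((D >= 0) & (D < size)):
            return A, B, C, D

def translate(points, K, M):
    """Add the integer K * M to both dimensions of every point."""
    offset = K * M
    return tuple(p + offset for p in points)

rng = np.random.default_rng(0)
A, B, C, D = sample_analogy(rng, M)          # analogy inside the training region
K = 3                                        # any integer from 1 to 9
tA, tB, tC, tD = translate((A, B, C, D), K, M)

# The relation A - B = C - D is unchanged by the translation.
print(np.array_equal(tA - tB, tC - tD))
```

      Since the same offset is added to every point, the differences between points are preserved, so a translated analogy has the same relational structure as the original while lying entirely outside the training region.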

      (11) Page 5 - "two continuous dimensions (Constantinescu et al._)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they right away stated that Np*Nf=900 in this first presentation.

      We have now added this sentence after Np=100. “Hence Np*Nf=900, which denotes the number of grid cells.”

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning theorem 2.1. The sentence just before that reads “We use the following theorem from Gillenwater et al. (2012) to construct :”

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900 element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xT w inner product to demonstrate why this works so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the changes to the bibliography of the updated manuscript suggested by the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ need further interrogation. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors for the positive assessment of our manuscript. We have carefully considered the reviewers’ constructive and helpful comments and revised our manuscript accordingly. To address the question about the dissociable relationships between global and local BM processing, we have provided more evidence and additional analyses in this revised version.

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating differences in biological motion perception in participants with ADHD in comparison with controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated local and global (holistic) biological motion perception, the group, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention/impulsivity). Local as well as global biological motion perception is reduced in ADHD participants. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not the controls. A path analysis in the ADHD data suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that there exists not much work on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature and adds potentially also new behavioral markers for this clinical condition. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thank you for your positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper.

      We agree that the relationship between genetic factors and BM processing in ADHD needs more investigation. We have modified our statement in the Discussion section as follows:

      “Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19.” (lines 421 - 425).

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a relatively clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate your positive assessment of our work.

      Weaknesses:

      Except for the main analysis, it is unclear what the authors' specific predictions are regarding the three different tasks they employ. The three BM tasks are used to probe different processes underlying BM perception, but it is difficult to gather from the introduction why these three specific tasks were chosen and what predictions the authors have about the performance of the ADHD group in these tasks. Relatedly, the authors do not report whether (and if so, how) they corrected for multiple comparisons in their analyses. As the number of tests one should control for depends on the theoretical predictions (http://daniellakens.blogspot.com/2016/02/why-you-dont-need-to-adjust-you-alpha.html), both are necessary for the reader to assess the statistical validity of the results and any inferences drawn from them. The same is the case for the secondary analyses exploring relationships between the 3 individual BM tasks and social function measured by the social responsivity scale (SRS).

      We appreciate these constructive suggestions. In response, we have included a detailed description in the Introduction section explaining why we employed three different tasks and our predictions about the performance in ADHD:

      “Despite initial indications, a comprehensive investigation into BM perception in ADHD is warranted. We proposed that it is essential to deconstruct BM processing into its multiple components and motion features, since treating them as a single entity may lead to misleading or inconsistent findings31. To address this issue, we employed a carefully designed behavioral paradigm used in our previous study19, making slight adjustments to adapt for children. This paradigm comprises three tasks. Task 1 (BM-local) aimed to assess the ability to process local BM cues. Scrambled BM sequences were displayed and participants could use local BM cues to judge the facing direction of the scrambled walker. Task 2 (BM-global) tested the ability to process the global configuration cues of the BM walker. Local cues were uninformative, and participants used global BM cues to determine the presence of an intact walker. Task 3 (BM-general) tested the ability to process general BM cues (local + global cues). The stimulus sequences consisted of an intact walker and a mask containing similar target local cues, so participants could use general BM cues (local + global cues) to judge the facing direction of the walker.” (lines 116 - 130)

      “In Experiment 1, we examined three specific BM perception abilities in children with ADHD. As mentioned earlier, children with ADHD also show impaired social interaction, which implies atypical social cognition. Therefore, we speculated that children with ADHD performed worse in the three tasks compared to TD children.” (lines 131 - 134)

      Additionally, we have reported the p values corrected for multiple comparisons (false discovery rate, FDR) in the revised manuscript wherever it was necessary to adjust the alpha (lines 310 - 316; Table 2). The pattern of the results remained unchanged.
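
      The response does not state which FDR procedure was applied. A common choice is the Benjamini-Hochberg step-up procedure, and the sketch below assumes it. The function name `fdr_bh` and the example p-values are hypothetical and are not taken from the study.

```python
def fdr_bh(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, for illustration only (not from the study).
example_p = [0.01, 0.04, 0.03, 0.20]
example_adjusted = fdr_bh(example_p)
```

      A test is then called significant when its adjusted p-value falls below the chosen alpha. Under this procedure the adjusted values can never be smaller than the raw ones.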

      In relation to my prior point, the authors could provide more clarity on how the conclusions drawn from the results relate to their predictions. For example, it is unclear what specific conclusions the authors draw based on their findings that ADHD show performance differences in all three BM perception tasks, but only local BM is related to social function within this group. Here, the claim is made that their results support a specific hypothesis, but it is unclear to me what hypothesis they are actually referring to (see line 343 & following). This lack of clarity is aggravated by the fact that throughout the rest of the discussion, in particular when discussing other findings to support their own conclusions, the authors often make no distinction between the two processes of interest. Lastly, some of the authors' conclusions related to their findings on local vs global BM processing are not logically following from the evidence: For instance, the authors conclude that their data supports the idea that social atypicalities are likely to reduce with age in ADHD individuals. However, according to their own account, local BM perception - the only measure that was related to social function in their study - is understood to be age invariant (and was indeed not predicted by age in the present study).

      Thank you for pointing out this issue. We have carefully revised the Discussion section about our findings to clarify these points:

      “Our study contributes several promising findings concerning atypical biological motion perception in ADHD. Specifically, we observe the atypical local and global BM perception in children with ADHD. Notably, a potential dissociation between the processing of local and global BM information is identified. The ability to process local BM cues appears to be linked to the traits of social interaction among children with ADHD. In contrast, global BM processing exhibits an age-related development. Additionally, general BM perception may be affected by factors including attention.” (lines 387 - 393)

      We have provided a detailed discussion on the two processes of interest to clarify their potential differences and the possible reasons behind the difference of the divergent developmental trajectories between local and global BM processing:

      “BM perception is considered a multi-level phenomenon56-58. At least in part, processing information of local BM and global BM appears to involve different genetic and neural mechanisms16,19. Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19. The sensitivity to local rather than global BM cues seems to emerge early in life. Visually inexperienced chicks exhibit a spontaneous preference for the BM stimuli of hen, even when the configuration was scrambled20. The same finding was reported in newborns. On the contrary, the ability to process global BM cues rather than local BM cues may be influenced by attention28,29 and shaped by experience24,56.” (lines 419 - 430)

      “We found that the ability to process global and general BM cues improved significantly with age in both TD and ADHD groups, which imply the processing module for global BM cues tends to be mature with development. In the ADHD group, the improvement in processing general and global BM cues is greater than that in processing local BM cues, while no difference was found in TD group. This may be due to the relatively higher baseline abilities of BM perception in TD children, resulting in a relatively milder improvement. These findings also suggest a dissociation between the development of local and global BM processing. There seems to be an acquisition of ability to process global BM cues, akin to the potential age-related improvements observed in certain aspects of social cognition deficits among individuals with ADHD5, whereas local BM may be considered an intrinsic trait19.” (lines 438 -449)

      In addition, we have rephrased some inaccurate statements in the revised manuscript. Although some studies have found that part of the social dysfunction in individuals with ADHD decreases with age, another part might remain stable because of atypical local BM perception. One reason for the age-related decrease is that some factors related to social dysfunction, such as hyperactivity symptoms, improve with age.

      Results reported are incomplete, making it hard for the reader to comprehensively interpret the findings and assess whether the conclusions drawn are valid. Whenever the authors report negative results (p-values > 0.05), the relevant statistics are not reported, and the data not plotted. In addition, summary statistics (group means) are missing for the main analysis.

      Thanks for your comments. We have provided the complete statistical results in the revised manuscript (lines 309 - 316) and supplementary material, which encompass relevant statistics and plots of negative results (Figure 4, Figure S2 and S3), in accordance with our research questions. And we have also included summary statistics in the Results section (lines 287 - 293).

      Some of the conclusions/statements in the article are too strong and should be rephrased to indicate hypotheses and speculations rather than facts. For example, in lines 97-99 the authors state that the finding of poor BM performance in TD children in a prior study 'indicated inferior applicability' or 'inapplicable experimental design'. While this is one possibility, a perhaps more plausible interpretation could be that TD children show 'poor' performance due to outstanding maturation of the underlying (global) BM processes (as the authors suggest themselves that BM perception can improve with age). There are several other examples where statements are too strong or misleading, which need attention.

      We thank you for pointing out the issue. We have toned down and rephrased the strong statements and made the necessary revisions.

      “Another study found that children with ADHD performed worse in BM detection with moderate ratios of noise34. This may be due to the fact that BM stimuli with noise dots will increase the difficulty of identification, which highlights the difference in processing BM between the two groups33,35.” (lines 111 - 115)

      Reviewer #3 (Public Review):

      Summary:

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The important and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, I am unsure that these differences between local and global skills are truly supported by the data and suggest some further analyses.

      Strengths:

      The authors present clear differences between the ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate your positive feedback. In the revised manuscript, we have added more analyses to support the differences between local and global motion processing. Please refer to our response to point #3 that you raised below.

      Weaknesses:

      I am unsure that the data are strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but do not seem so plausible to me. I am also concerned about gender, and possible autism, confounds when examining the effect of ADHD. Specifics:

      Gender confound. There are proportionally more boys in the ADHD than TD group. The authors appear to attempt to overcome this issue by including gender as a covariate. I am unsure if this addresses the problem. The vast majority of participants in the ADHD group are male, and gender is categorically, not continuously, defined. I'm pretty sure this violates the assumptions of ANCOVA.

      We appreciate your comments. We concur with you that although we observed a clear difference between local and global BM processing in ADHD, the evidence is to some extent preliminary. The mechanistic possibilities for why these abilities may dissociate have been discussed in the revised manuscript. Please refer to the response to reviewer 2’s point #2. To further examine whether gender played a role in the observed results, we used a statistical matching technique to obtain a sub-dataset. The pattern of results remained the same with the more balanced dataset (see Supplementary Information part 1). Following your suggestion, we have also presented the results without gender as a covariate in the main text and separated the data of boys and girls on the plots (see Figure 1 and Figure S1). There were indeed no signs of a gender effect.

      Autism. Autism and ADHD are highly comorbid. The authors state that the TD children did not have an autism or ADHD diagnosis, but they do not state that the ADHD children did not have an autism diagnosis. Given the nature of the claims, this seems crucial information for the reader.

      Thanks for your suggestion. We have confirmed that none of the children with ADHD in our study were diagnosed with autism. We used a semi-structured interview instrument (K-SADS-PL-C) to confirm that every recruited child had ADHD but not ASD. The exclusion criteria for both groups were mentioned in the Materials and methods section:

      “Exclusion criteria for both groups were: (a) neurological diseases; (b) other neurodevelopmental disorders (e.g., ASD, Mental retardation, and tic disorders), affective disorders and schizophrenia…” (lines 158 - 162)

      Conclusions. The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. I think that if the authors wish to make strong claims here they must show inferential stats supporting (1) a difference between ADHD and TD SRS-Task 1 correlations, and (2) a difference in those differences for Task 2 and 3 relative to Task 1. I think they should also show a scatterplot of this correlation, with separate lines of best fit for the two groups, for Tasks 2 and 3 as well. I.e. Figure 4 should have 3 panels. I would recommend the same type of approach for age. Currently, they have small samples for correlations, and are reading much of theoretical significance between some correlations passing significance threshold and others not. It would be incredibly interesting if the social skills (as measured by SRS) only relate to local BM abilities, and age only to global, but I think the data are not so clear with the current information. I would be surprised if all BM abilities did not improve with age. Even if there is some genetic starter kit (and that this differs according to particular BM component), most abilities improve with learning/experience/age.

      Thank you for this recommendation. We have added more statistics to test differences between the correlations (a difference between ADHD and TD in SRS-Task 1 correlations (see the first paragraph of Supplementary Information part 2), a difference in SRS-response accuracy correlations for Task 2 and 3 relative to Task 1 (see the second paragraph of Supplementary Information part 2), and a difference in age-response accuracy correlations for Task 2 and 3 relative to Task 1 in the ADHD group (see Supplementary Information part 3)). Additionally, we have included scatterplots for SRS-Task1, SRS-Task2, SRS-Task3 (with separate lines of best fit for the two groups in each, see Figure 4), SRS-ADHD, SRS-TD, age-ADHD and age-TD (with separate lines of best fit for the three tasks in each, see Figure S2 and S3) to make the results clear. Detailed results have been presented in the revised manuscript and Supplementary Information. We expect these further analyses to strengthen our conclusions.
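
      The response does not name the test used to compare correlations. For the between-group case (ADHD vs TD), a standard option is Fisher's r-to-z test for two independent correlations, sketched below under that assumption. The function name is hypothetical. The sketch does not cover the within-group comparisons (Task 2 and 3 relative to Task 1), which involve dependent correlations and need a different test.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided Fisher r-to-z test for correlations from two independent groups."""
    z1 = math.atanh(r1)  # Fisher transform of the first correlation
    z2 = math.atanh(r2)  # Fisher transform of the second correlation
    # Standard error of the difference between the transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

      Calling `compare_independent_correlations(r_adhd, n_adhd, r_td, n_td)` with the two group correlations and sample sizes returns the z statistic and its p-value. Each group needs more than three participants for the standard error to be defined.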

      Theoretical assumptions. The authors make some sweeping statements about local vs global biological motion processing that need to be toned down. They assume that local processing is specifically genetically whereas global processing is a product of experience. The fact their global, but not local, task performance improves with age would tend to suggest there could be some difference here, but the existing literature does not allow for this certainty. The chick studies showing a neonatal preference are controversial and confounded - I cannot remember the specifics but I think there an upper vs lower visual field complexity difference here.

      Thank you for pointing out this issue. We have toned down and rephrased our claims about the difference between local and global BM processing according to your suggestion:

      “These findings suggest that local and global mechanisms might play different roles in BM perception, though the exact mechanisms underlying the distinction remain unclear. Exploring the two components of BM perception will enhance our understanding of the difference between local and global BM processing, shedding light on the psychological processes involved in atypical BM perception.” (lines 87 - 92)

      Reviewer #1 (Recommendations For The Authors):

      I have only a number of minor points that should be addressed prior to publication:

      L. 95ff: What is meant by 'inapplicability of experimental designs' ? This paragraph is somewhat unclear.

      In the revised manuscript, we have clarified this point (lines 111 - 115).

      L. 146: The groups were not perfectly balanced for sex. Would results change fundamentally in a more balanced design, or can arguments be given that gender does not play a role, like it seems to be the case for some functions in biological motion perception (e.g. Pavlova et al. 2015; Tsang et al 2018). One could provide a justification that this disbalance does not matter or test for subsampled balanced data sets maybe.

      This point is similar to the point #1 from reviewer 3, and we have addressed this issue in our response above.

      L. 216 f.: In this paragraph it does not become very clear that the mask for the global task consisted of scrambles generated from walkers walking in the same direction. The mask for the local task then should consist of a balanced mask that contains the same amount of local motion cues indicating right and leftwards motion. Was this the case? (Not so clear from this paragraph.)

      Regarding the local task, introducing a mask would have made the task too difficult for children. Therefore, in the local task, we displayed only a scrambled walker without a mask, which made the task more suitable for children. We have made this point clear in the corresponding paragraph (lines 232 - 241).

      L. 224 ff.: Here it would be helpful to see the 5 different 'facing' directions of the walkers. What does this exactly mean? Do they move on oblique paths that are not exactly orthogonal to the viewing directions, and how much did these facing directions differ?

      Out of the five walkers we used, two faced straight left or right, orthogonal to the viewing directions. Two walked with their bodies oriented 45 degrees from the observer, to the left or right. The last one walked towards the observer. We have included a video (Video 4) to demonstrate the 5 facing directions.

      L. 232: How was the number of 5 practicing trials determined/justified?

      As mentioned in the main text, global BM processing is susceptible to learning. Therefore, too many practice trials would increase BM visual experience and influence the results. We set the number of practice trials to 5 based on the results of the pilot experiment, in which we observed that nearly all children understood the task requirements well after completing 5 practice trials.

      L 239: Apparently no non-parametric statistics was applied. Maybe it would be good to mention in the Statistics section briefly why this was justified.

      We appreciate your suggestion and have cited two references in the Statistics section (Fagerland et al. 2012, Rochon et al. 2012). Fagerland et al. noted that the t-test becomes more robust as the sample size increases. According to the central limit theorem, when the sample size is greater than 30, the sampling distribution of the mean can be safely assumed to be normal (http://www2.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%201/I.07%20normal.pdf). In fact, we also ran non-parametric statistics on our data and found the results to be robust.

      L 290: 'FIQ' this abbreviation should be defined.

      The abbreviation ’FIQ’ stands for the full-scale intellectual quotient, which is defined in the Materials and methods section:

      “Scores of the four broad areas constitute the full-scale intellectual quotient (FIQ).”

      L. 290 ff.: The model 'BM-local = age + gender etc.' is pretty sloppy notation. I think what is meant is that a GLM was used with the predictors gender etc. multiplied by appropriate beta_i values. This formula should be corrected, or one could just say that a GLM was run with the predictors gender ....

      The same criticism applies to these other models that follow.

      We thank you for pointing this out. We have modified all formulas accordingly in the revised manuscript (see part3 of the Results section).

      All these models assume linearity of the combination of the predictors. Was this assumption verified?

      We referred to previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relation with BM processing ability.

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the group of patients. Does the same observation also apply to the normals?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data of the study will be available at https://osf.io/37p5s/.

      (2) Although overall, the language was clear and understandable, there are a few parts where language might confuse a reader and lead to misconceptions. For instance, line 52: Did the authors mean to refer to 'emotions and intentions' instead of 'emotions and purposes'? See also examples where rephrasing may help to reflect a statement is speculation rather than fact.

      Thanks for the comments. We have carefully checked the full text and rephrased the confusing statements.

      (3) Line 83/84: Autism is not a 'mental disorder' - please change to something like 'developmental disability'. Authors are encouraged to adapt their language according to terms preferred by the community (e.g., see Fig. 5 in this article:

      https://onlinelibrary.wiley.com/doi/10.1002/aur.2864)

      Suggestion well taken. We have changed the wording accordingly:

      “In recent years, BM perception has received significant attention in studies of mental disorders (e.g., schizophrenia30) and developmental disabilities, particularly in ASD, characterized by deficits in social communication and social interaction31,32.” (lines 93 - 95)

      (4) Please report how the sample size for the study was determined.

      In the Materials and methods section (lines 168 - 173), we explained how the sample size was determined.

      (5) Line 94: It would be helpful to have a brief description of what neurophysiological differences have been observed upon BM perception in children with ADHD.

      Thanks for the comment. We have added a brief description of neurophysiological findings in children with ADHD (lines 108 - 111).

      (6) Line 106/107 and 108/109: please add references.

      We have revised this part, and the relevant findings and references are in line with the revised manuscript (lines 77, 132 - 133).

      (7) Line 292: Please add what order the factors were entered into each regression model.

      Regarding this issue, we used SPSS 26 for the main analysis. SPSS utilizes the Type III sum of squares (default) to evaluate models. Regardless of the order in the GLM, we will obtain the same result. For more information, please refer to the documentation of SPSS 26 (https://www.ibm.com/docs/en/spss-statistics/26.0.0?topic=features-glm-univariate-analysis).
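
      The order-independence claim can be illustrated with a small sketch. For a main-effects model, the Type III sum of squares of a predictor is the increase in residual sum of squares when that predictor alone is dropped from the full model, so the entry order cannot matter. The code below is a pure-Python illustration, not SPSS's implementation. The helper names and the data are hypothetical.

```python
def residual_ss(columns, y):
    """Residual sum of squares of an OLS fit of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * p for x, p in zip(A[r], A[c])]
    beta = [A[a][k] / A[a][a] for a in range(k)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def type3_ss(predictors, y):
    """Per-predictor increase in residual SS when only that predictor is dropped."""
    names = list(predictors)
    full = residual_ss([predictors[m] for m in names], y)
    return {n: residual_ss([predictors[m] for m in names if m != n], y) - full
            for n in names}


# Hypothetical data, for illustration only (not from the study).
age = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 8.0]
iq = [100.0, 95.0, 110.0, 105.0, 98.0, 120.0, 102.0, 99.0]
accuracy = [0.50, 0.55, 0.70, 0.66, 0.70, 0.90, 0.80, 0.60]

ss_age_first = type3_ss({"age": age, "iq": iq}, accuracy)
ss_iq_first = type3_ss({"iq": iq, "age": age}, accuracy)
```

      Both dictionaries hold the same values up to floating-point rounding, whichever predictor is listed first. Sequential (Type I) sums of squares would not share this property.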

      Reviewer #3 (Recommendations For The Authors)

      (1) Task specifics. It is key to understanding the findings, as well as the dissociation between tasks, that the precise nature of the stimuli is clear. I think there is room for improvement in description here. Task 1 is described as involving relocating dots within the range of the intact walker. Of course, PLWs are created by presenting dots at the joints, so relocation can involve either moving to another place on the body, or random movement within the 2D spatial array (which likely involves moving it off the body). Which was done? It is said that Ps must indicate the motion direction, but what was the display of the walker? Sagittal? Task 2 requires detecting whether there is an intact walker amongst scrambled walkers. Were all walkers completely overlaid? Task 3 requires detecting the left v right facing of an intact walker at different orientations, presented amongst noise. So Task 3 requires determining facing direction and Task 1 walking direction. Are these tasks the same but described differently? Or can walkers ever walk backwards? Wrt this point, I also think it would help the reader if example videos were uploaded.

      We appreciate you bringing this to our attention. With regard to Task 1, your second speculation is correct. We scrambled the original dots and randomly presented them within the 2D spatial array (which likely involved moving them off the body). As a result, the global configuration of the 13 dots was completely disrupted while the motion trajectory of each individual dot was preserved. This led to the display of scrambled dots on the monitor (which does not resemble a human). In practice, these local BM cues contain information about motion direction. In Task 2, the target walkers were completely overlaid by a mask approximately 1.44 times the size of the intact walker. The requirements of Task 1 and Task 3 are the same: judging the motion (walking) direction. The difference is that Task 1 displayed a scrambled walker while Task 3 displayed an intact walker within a mask. We have clarified these points and improved our descriptions in the Procedure section, and we have created example videos for each task, which we believe will help readers understand each task.

      (2) Gender confound (see above). I think that the authors should present the results without gender as a covariate. Can they separate boys and girls on the plots with different coloured individual datapoints, such that readers can see whether it's actually a gender effect driving the supposed ADHD effect? And show that there are no signs of a gender effect in their TD group?

      This point is similar to the point #1 you mentioned. Please refer to our response to that point above.

      (3) Autism possible confound (see above). I think the authors must report whether any of the ADHD group had an autism diagnosis.

      Please refer to the response to point #2 that you mentioned.

      (4) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors should add stats demonstrating differences between the correlations to support such claims, as well as demonstrating appropriate scatterplots for SRS-Task 1, SRS-Task 2, SRS-Task 3 and age-Task 1, age-Task2 and age-Task 3 (with separate lines of best fit for the two groups in each).

      Please refer to the response to point #3 that you mentioned.

      (5) Theoretical assumptions (see above). I would suggest rephrasing all claims here to outline that these discussed mechanistic differences between local and global BM processing are only possibilities and not known on the basis of existing data.

      Please refer to the response to point #4 that you mentioned.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor suggestions:

      Abstract: I really liked the conclusion (that IM and VWM are two temporal extremes of the same process) as articulated in lines 557--563. (It is always satisfying when the distinction between two things that seem fundamentally different vanishes). If something like this but shorter could be included in the Abstract, it would highlight the novel aspects of the results a little more, I think.

      Thank you for this comment. We have added the following to the abstract:

      “A key conclusion is that differences in capacity classically thought to distinguish IM and VWM are in fact contingent upon a single resource-limited WM store.”

      L 216: There's an orphan parenthesis in "(justifying the use".

      Fixed.

      L 273: "One surprising result was the observed set size effect in the 0 ms delay condition". In this paragraph, it might be a good idea to remind the reader of the difference between the simultaneous and zero-delay conditions. If I got it right, the results differ between these conditions because it takes some amount of processing time to interpret the cue and free the resources associated with the irrelevant stimuli. Recalling that fact would make this paragraph easier to digest.

      That is correct. However, at this point in the text, we have not yet fitted the DyNR model to the data. Therefore, we believe that introducing cue processing and resource reallocation as concepts that differentiate between those two conditions would disrupt the flow of this paragraph. We address these points soon after, in a paragraph starting on line 341.

      Figures 3, 5: The labels at the bottom of each column in A would be more clear if placed at the top of each column instead. That way, the x-axis for the plots in A could be labeled appropriately, as "Error in orientation estimate" or something to that effect.

      We edited both figures, now Figure 4 and Figure 6, as suggested.

      L 379: It should be "(see Eq 6)", I believe.

      That is correct, line 379 (currently line 391) should read ‘Eq 6’. Fixed.

      L 379--385: I was a bit mystified as to why the scaled diffusion rate produced a worse fit than a constant rate. I imagine the scaled version was set to something like

      sigma^2_diff_scaled = sigma^2_base + K*(N-1)

      where N is the set size and sigma^2_base and K are parameters. If this model produced a similar fit as with a constant diffusion rate, the AIC would penalize it because of the extra parameter. But why would the fit be worse (i.e., not match the pattern of variability)? Shouldn't the fitter just find that the K=0 solution is the best? Not a big deal; the Nelder-Mead solutions can wobble when that many parameters are involved, but if there's a simple explanation it might be worth commenting on.

      The scaled diffusion was implemented by extending Eq 6 in the following way:

      $\sigma^2(t) = (t - t_{\text{offset}}) \cdot \dot{\sigma}^2_{\text{diff}} \cdot N$

      where N is set size. The scaling was therefore not associated with a free parameter that could shrink to 0 if set size did not affect the diffusion rate; rather, variability necessarily increased with set size. We now clarify this in the text:

      “The second variant was identical to the proposed model, except that we replaced the constant diffusion rate with a set size scaled diffusion rate by multiplying the right side of Eq 6 by N.”
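      The difference between the two variants can be sketched in a few lines. This is a minimal illustration, not the authors' fitting code: it assumes that Eq 6 has the form "elapsed time since the offset parameter multiplied by the diffusion rate", as implied by the extension above, and the function name, argument names, and the clamp at zero elapsed time are hypothetical.

```python
def diffusion_variance(t, t_offset, rate, set_size=1, scaled=False):
    """Variance accumulated by diffusion at time t.

    Assumed form of Eq 6: (t - t_offset) * rate.
    With scaled=True the result is multiplied by set size N, with no
    extra free parameter, so the variance must grow with N.
    """
    elapsed = max(t - t_offset, 0.0)  # assumption: no diffusion before t_offset
    variance = elapsed * rate
    if scaled:
        variance *= set_size
    return variance


set_sizes = (1, 4, 10)
constant = [diffusion_variance(1.0, 0.2, 0.5, n) for n in set_sizes]
scaled = [diffusion_variance(1.0, 0.2, 0.5, n, scaled=True) for n in set_sizes]
```

      Because the scaled variant has no coefficient that the optimizer could drive to zero, it cannot collapse onto the constant-rate model, which is why it can fit worse rather than merely being penalized for an extra parameter.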

      Figure 4 is not mentioned in the main text. Maybe the end of L 398 would be a good place to point to it. The paragraph at L 443-455 would also benefit from a couple of references to it.

      Thank you for this suggestion. Figure 4 (now Figure 5) was previously mentioned on line 449 (previously line 437), but now we have included it on line 410 (previously line 398), within the paragraph spanning lines 455-467 (previously 443-455), and also on line 136 where we first discuss masking effects.

      L 500: Figure S7 is mentioned before Figures S5 and S6. Quite trivial, I know....

      Thank you for this comment. There was no specific reason for Figure S7 to appear after S5 & S6, so we simply swapped their order to be consistent with how they are referred to in the manuscript (i.e., S7 became S5, S5 became S6, and S6 became S7).

      Reviewer #2 (Recommendations For The Authors):

      (1) One potential weakness is that the model assumes sensory information is veridical. However, this isn't likely the case. Acknowledging noise in sensory representations could affect the model interpretation in a couple of different ways. First, neurophysiological recordings have shown normalization affects sensory representations, even when a stimulus is still present on the screen. The DyNR model partially addresses this concern because reports are drawn from working memory, which is normalized. However, if sensory representations were also normalized, then it may improve the model variant where subjects draw directly from sensory representations (an alternative model that is currently described but discarded).

      Thank you for this suggestion. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, occurring before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which showed no significant difference between set sizes and were more consistent with the null hypothesis of no difference ($F(2,18) = 1.26$, $p = .3$, $\eta^2 = .04$, $\mathrm{BF}_{10} = 0.47$). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
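      The contrast between the two scenarios can be made concrete with a toy calculation. The sketch below is illustrative only and is not part of the DyNR implementation: it assumes that divisive normalization divides a fixed maximum gain equally among the items entering the normalization pool, and the function and scheme names are hypothetical.

```python
def sensory_gain(set_size, scheme, max_gain=1.0):
    """Hypothetical per-item sensory gain in the simultaneous-cue condition.

    Assumption: normalization shares max_gain equally among the items
    that enter the normalization pool.
    """
    if scheme == "none":
        pool = 1                 # no sensory normalization
    elif scheme == "pre_attentive":
        pool = set_size          # every displayed item is normalized
    elif scheme == "post_attentive":
        pool = 1                 # only the single cued item is attended
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return max_gain / pool


gains = {
    scheme: [sensory_gain(n, scheme) for n in (1, 4, 10)]
    for scheme in ("none", "pre_attentive", "post_attentive")
}
```

      Under these assumptions only the pre-attentive scheme produces a gain that falls with set size, whereas the post-attentive scheme is indistinguishable from no sensory normalization in this condition.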

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al., 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni & Maunsell, 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      Second, visual adaptation predicts sensory information should decrease over time. This would predict that for long stimulus presentation times, the error would increase. Indeed, this seems to be reflected in Figure 5B. This effect is not captured by the DyNR model.

      Indeed, neural responses in the visual cortex have been observed to quickly adapt during stimulus presentation, showing reduced responses to prolonged stimuli after an initial transient (Groen et al., 2022; Sawamura et al., 2006; Zhou et al., 2019). This adaptation typically manifests as 1) reduced activity towards the end of stimulus presentation and 2) a faster decay towards baseline activity after stimulus offset.

      In the DyNR model, we use an idealized solution in which we convolve the presented visual signal with a response function (i.e., temporal filter). At the longest presentation durations, the sensory signal in DyNR plateaus and remains stable until stimulus offset. Because our psychophysical data do not allow us to identify the exact neural coding scheme that underlies the sensory signal, we favour this simple implementation, which is broadly consistent with some previous attempts to model temporal dynamics in sensory responses (e.g., Carandini and Heeger, 1994). However, we agree with the reviewer that some adaptation of the sensory signal with prolonged presentation would also be consistent with our data.
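      As a rough illustration of what convolving the stimulus with a temporal filter produces, the following sketch convolves a boxcar stimulus with a normalized exponential kernel. The exponential form, the 30 ms time constant, and all names are assumptions chosen for illustration; the response function actually used in DyNR is not specified here.

```python
import numpy as np


def sensory_response(duration_ms, tau_ms=30.0, total_ms=600, dt_ms=1.0):
    """Boxcar stimulus convolved with a causal exponential filter.

    Hypothetical stand-in for the model's temporal filter: the kernel
    shape and tau_ms are illustrative assumptions.
    """
    t = np.arange(0.0, total_ms, dt_ms)
    stimulus = (t < duration_ms).astype(float)   # 1 while the stimulus is on
    kernel = np.exp(-t / tau_ms)
    kernel /= kernel.sum()                       # unit gain at steady state
    # Keep the first len(t) samples of the full convolution (causal output).
    response = np.convolve(stimulus, kernel)[: t.size]
    return t, response


t, long_resp = sensory_response(duration_ms=400)
_, short_resp = sensory_response(duration_ms=50)
```

      With a long presentation the filtered signal rises and then plateaus until offset, after which it decays; with a short presentation the stimulus ends before the plateau is reached, so the peak is lower. An adapting response, as the reviewer suggests, would instead decline during the plateau.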

      We have added the following to the manuscript:

      “In Experiment 2, the longest presentation duration shows an upward trend in error at set sizes 4 and 10. While this falls within the range of measurement error, it is also possible that this is a meaningful pattern arising from visual adaptation of the sensory signal, whereby neural populations reduce their activity after prolonged stimulation. This would mean less residual sensory signal would be available after the cue to supplement VWM activity, predicting a decline in fidelity at higher set sizes. Visual adaptation has previously been successfully accounted for by a type of delayed normalization model in which the sensory signal undergoes a series of linear and nonlinear transformations (Zhou et al., 2019). Such a model could in future be incorporated into DyNR and validated against psychophysical and neural data.”

      Carandini, M., & Heeger, D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264(5163), 1333–1336. https://doi.org/10.1126/science.8191289

      Groen, I. I. A., Piantoni, G., Montenegro, S., Flinker, A., Devore, S., Devinsky, O., Doyle, W., Dugan, P., Friedman, D., Ramsey, N. F., Petridou, N., & Winawer, J. (2022). Temporal Dynamics of Neural Responses in Human Visual Cortex. The Journal of Neuroscience, 42(40), 7562–7580. https://doi.org/10.1523/JNEUROSCI.1812-21.2022

      Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of Neuronal Adaptation Does Not Match Response Selectivity: A Single-Cell Study of the fMRI Adaptation Paradigm. Neuron, 49(2), 307–318. https://doi.org/10.1016/j.neuron.2005.11.028

      Zhou, J., Benson, N. C., Kay, K., & Winawer, J. (2019). Predicting neuronal dynamics with a delayed gain control model. PLOS Computational Biology, 15(11), e1007484. https://doi.org/10.1371/journal.pcbi.1007484

      (2) A second potential weakness is that, in Experiment 1, the authors briefly change the sensory stimulus at the end of the delay (a 'phase shift', Fig. 6A). I believe this is intended to act as a mask. However, I would expect that, in the DyNR model, this should be modeled as a new sensory input (in Experiment 2, 50 ms is plenty of time for the subjects to process the stimuli). One might expect this change to disrupt sensory and memory representations in a very characteristic manner. This seems to make a strong testable hypothesis. Did the authors find evidence for interference from the phase shift?

      The phase shift was implemented with the intention of reducing retinal after-effects, essentially acting as a mask for retinal information only; crucially the orientation of the stimulus is unchanged by the phase shift, so from the perspective of the DyNR model, it transmits the same orientation information to working memory as the original stimulus.

      If our objective were to model sensory input at the level of individual neurons and their receptive fields, we would indeed need to treat this phase shift as a novel input. Nevertheless, for DyNR, conceived as an idealization of a biological system for encoding orientation information, we can safely assume that visual areas in biological organisms have a sufficient number of phase-sensitive simple cells and phase-indifferent complex cells to maintain the continuity of input to VWM.

      When comparing conditions with and without the phase shift of stimuli (Fig S1B), we found performance to be comparable in the perceptual condition (simultaneous presentation) and with the longest delay (1 second), suggesting that the phase shift did not change the visibility or encoding of information into VWM. In contrast, we found strong evidence that observers had access to an additional source of information over intermediate delays when the phase shift was not used. This was evident through enhanced recall performance from 0 ms to 400 ms delay. Based on this, we concluded that the additional source of information available in the absence of a phase shift was accessible immediately following stimulus offset and had a brief duration, aligning with the theoretical concept of retinal afterimages.

      (3) It seems odd that the mask does not interrupt sensory processing in Experiment 2. Isn't this the intended purpose of the mask? Should readers interpret this as all masks not being effective in disrupting sensory processing/iconic memory? Or is this specific to the mask used in the experiment?

      Visual masks are often described as instantly and completely halting the visual processing of information that preceded the mask. We also anticipated the mask would entirely terminate sensory processing, but our data indicate the effect was not complete (as indicated by model variants in Experiment 2). Nevertheless, we believe we achieved our intended goal with this experiment – we observed a clear modulation of response errors with changing stimulus duration, indicating that the post-stimulus information that survived masking did not compromise the manipulation of stimulus duration. Moreover, the DyNR model successfully accounted for the portion of signal that survived the mask.

      We can identify two possible reasons why masking was incomplete. First, it is possible that the continuous report measure used in our experiments is more sensitive than the discrete measures (e.g., forced-choice methods) commonly employed in experiments that found masks to be 100% effective. Second, despite using a flickering white noise mask at full contrast, it is possible that it may not have been the most effective mask; for instance, a mask consisting of many randomly oriented Gabor patches matched in spatial frequency to the stimuli could prove more effective. We decided against such a mask because we were concerned that it could potentially act as a new input to orientation-sensitive neurons, rather than just wiping out any residual sensory activity.

      (4) I apologize if I missed it, but the authors did not compare the DyNR model to a model without decaying sensory information for Experiment 1.

      We tested two DyNR variants in which the diffusion process was solely responsible for memory fidelity dynamics. These models assumed that the sensory signal terminates abruptly at stimulus offset, and that the VWM signal encoding the stimuli was equal to the limit imposed by normalization, independent of the delay duration.

      As variants of this model failed to account for the observed response errors both quantitatively (see 'Fixed neural signal' under Model variants) and qualitatively (Figure S3), we decided not to test any more restrictive variants, such as the one without sensory decay and diffusion.

      (5) In the current model, selection is considered to be absolute (all or none). However, this need not be the case (previous work argues for graded selection). Could a model where memories are only partially selected, in a manner that is mediated by load, explain the load effects seen in behavior?

      Thank you for this point. If attentional selection were partial, it would affect the observers’ efficiency in discarding uncued objects to release allocated resources and encode additional information about the cued item. We and others have previously examined whether humans can efficiently update their VWM when previous items become obsolete. For example, Taylor et al. (2023) showed that observers could efficiently remove uncued items from VWM and reallocate the released resources to new visual information. These findings align with results from other studies (e.g., Ecker, Oberauer, & Lewandowsky, 2014; Kessler & Meiran, 2006; Williams & Woodman, 2012).

      Based on these findings, we feel justified in assuming that observers in our current task were capable of fully removing all uncued objects, allowing them to continue the encoding process for the cued orientation that was already partially stored in VWM, such that the attainable limit on representational precision for the cued item equals the maximum precision of VWM.

      Partial removal could in principle be modelled in the DyNR model by introducing an additional plateau parameter specifying a maximum attainable precision after the cue. Our concern would be that such a plateau parameter would trade off with the parameter associated with Hick’s law (i.e., cue interpretation time). The former would control the amount of information that can be encoded into VWM, while the latter regulates the amount of sensory information available for encoding. We are wary of adding additional parameters, and hence flexibility, to the model where we do not have the data to sufficiently constrain them.

      Ecker, U. K. H., Oberauer, K., & Lewandowsky, S. (2014). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15. https://doi.org/10.1016/j.jml.2014.03.006

      Kessler, Y., & Meiran, N. (2006). All updateable objects in working memory are updated whenever any of them are modified: Evidence from the memory updating paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 570–585. https://doi.org/10.1037/0278-7393.32.3.570

      Taylor, R., Tomić, I., Aagten-Murphy, D., & Bays, P. M. (2023). Working memory is updated by reallocation of resources from obsolete to new items. Attention, Perception, & Psychophysics, 85(5), 1437–1451. https://doi.org/10.3758/s13414-022-02584-2

      Williams, M., & Woodman, G. F. (2012). Directed forgetting and directed remembering in visual working memory. Journal of Experimental Psychology. Learning, Memory, and Cognition, 38(5), 1206–1220. https://doi.org/10.1037/a0027389

      (6) Previous work, both from the authors and others, has shown that memories are biased as if they are acted on by attractive/repulsive forces. For example, the memory of an oriented bar is biased away from horizontal and vertical and biased towards diagonals. This is not accounted for in the current model. In particular, this could be one mechanism to generate a non-uniform drift rate over time. As noted in the paper, a non-uniform drift rate could capture many of the behavioral effects reported.

      The reviewer is correct that the model does not currently include stimulus-specific effects, although our work on that topic provides a clear template for incorporating them in future (e.g. Taylor & Bays, 2018). Specifically on the question of generating a non-uniform drift, we have another project that currently looks at this exact question (cited in our manuscript as Tomic, Girones, Lengyel, and Bays; in prep.). By examining various datasets with varying memory delays, including the Additional Dataset 1 reported in the Supplementary Information, we found that stimulus-specific effects on orientation recall remain constant with retention time. Specifically, although there is a clear increase in overall error over time, estimation biases remain constant in direction and amplitude, indicating that the bias does not manifest in drift rates (see also Rademaker et al., 2018; Figure S1).

      Taylor, R., & Bays, P. M. (2018). Efficient coding in visual working memory accounts for stimulus-specific variations in recall. The Journal of Neuroscience, 1018–18. https://doi.org/10.1523/JNEUROSCI.1018-18.2018

      Rademaker, R. L., Park, Y. E., Sack, A. T., & Tong, F. (2018). Evidence of gradual loss of precision for simple features and complex objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000491

      (7) Finally, the authors use AIC to compare many different model variants to the DyNR model. The delta-AICs are high (>10), indicating a strong preference for the DyNR model over the variants. However, the overall quality of fit to the data is not clear. What proportion of the variance in data was the model able to explain? In particular, I think it would be helpful for the reader if the authors reported the variance explained on withheld data (trials, conditions, or subjects).

      Thank you for this comment.

      Below we report the estimates of $r^2$, representing the goodness of fit between observed data (i.e., RMSE) and the DyNR model predictions.

      In Experiment 1, the $r^2$ values between observations and predictions were computed across delays for each set size, yielding the following estimates: $r^2_{\text{ss1}} = 0.60$; $r^2_{\text{ss4}} = 0.87$; $r^2_{\text{ss10}} = 0.95$. Note that lower explained variance for set size 1 arises from both data and model predictions having near-constant precision.

      In Experiment 2, we calculated $r^2$ between observations and predictions across presentation durations, separately for each set size, resulting in the following estimates: $r^2_{\text{ss1}} = 0.88$; $r^2_{\text{ss4}} = 0.71$; $r^2_{\text{ss10}} = 0.70$. Note that in this case the decreasing percentage of explained variance with set size is a consequence of having less variability in both data and model predictions with larger set sizes.
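      The dependence of explained variance on the range of the data can be illustrated with a small sketch. It assumes, as one common choice, that the statistic is the squared Pearson correlation between observed and predicted RMSE across conditions; the exact definition used for the estimates above is not stated here, and the numbers below are invented for illustration and are not taken from the experiments.

```python
import numpy as np


def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions.

    Assumption: this is one common definition of explained variance,
    not necessarily the one used for the reported estimates.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2


# Invented example: identical measurement noise, different dynamic range.
noise = np.array([0.5, -0.5, 0.5, -0.5, 0.5])
steep_pred = np.array([10.0, 15.0, 20.0, 25.0, 30.0])   # large condition effect
flat_pred = np.array([10.0, 10.1, 10.2, 10.3, 10.4])    # near-constant error

r2_steep = r_squared(steep_pred + noise, steep_pred)
r2_flat = r_squared(flat_pred + noise, flat_pred)
```

      In this toy case the absolute prediction errors are the same in both sets, yet the near-constant set yields a much lower value, which is the pattern described above for conditions with little variability across delays or durations.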

      While these estimates suggest that the DyNR model effectively fits the psychophysical data, a more rigorous validation approach would involve cross-validation checks across all conditions with a withheld portion of trials. Regrettably, due to the large number of conditions in each experiment, we could only collect 50 trials per condition. We are sceptical that fitting the model to even fewer trials, as necessary for cross-validation, would provide a reliable assessment of model performance.

      Minor: It isn't clear to me why the behavioral tasks are shown in Figure 6. They are important for understanding the results and are discussed earlier in the manuscript (before Figure 3). This just required flipping back and forth to understand the task before I could interpret the results.

      Thank you for this comment. We have now moved the behavioural task figure to appear early in the manuscript (as Figure 3).

      Reviewer #3 (Recommendations For The Authors):

      (1) Dynamics of sensory signals during perception

      I believe that the modeled sensory signal is a reasonable simplification and different ways to model the decay function are discussed. I would like to ask the authors to discuss the implications of slightly more complex initial sensory transients such as the ones shown in Teeuwen (2021). Specifically for short exposure times, this might be particularly relevant for the model fits as some of the alternative models diverge from the data for short exposures. In addition, the role of feedforward (initial transient?) and feedback signaling (subsequent "plateau" activity) could be discussed. The first one might relate more strongly to sensory signals whereas the latter relates more to top-down attention/recurrent processing/VWM.

      Particularly, this latter response might also be sensitive to the number of items present on the screen which leads to a related question pertaining to the limitations of attention during perception. Some work suggests that perception is similarly limited in the amount of information that can be represented concurrently (Tsubomi, 2013). Could the authors discuss the implications of this hypothesis? What happens if maximum sensory amplitude is set as a free parameter in the model?

      Tsubomi, H., Fukuda, K., Watanabe, K., & Vogel, E. K. (2013). Neural limits to representing objects still within view. Journal of Neuroscience, 33(19), 8257-8263.

      Thank you for this question. Below, we unpack it and answer it point by point.

      While we agree our model of the sensory response is justified as an idealization of the biological reality, we also recognise that recent electrophysiological recordings have revealed intricacies of neuronal responses within the striate cortex, a critical neural region associated with sensory memory (Teeuwen et al., 2021). Notably, these recordings reveal a more nuanced pattern in which neurons exhibit an initial burst of activity succeeded by a lower plateau in firing rate, and stimulus offset elicits a second small burst in the response of some neurons, followed by a gradual decrease in activity after the stimulus disappears (Teeuwen et al., 2021).

      In general, asynchronous bursts of activity in individual neurons will tend to average out in the population, making little difference to predictions of the DyNR model. Synchronized bursts at stimulus onset could affect predictions for the shortest presentations in Exp 2; however, the model appears to capture the data very well without including them. We would be wary of incorporating these phenomena into the model without more clarity on their universality (e.g., how stimulus-dependent they are), their significance at the population level (as opposed to individual neurons), and most importantly, their prominence in visual areas outside striate cortex. Specifically, while Teeuwen et al. (2021) described activity in V1, our model does not make strong assumptions about which visual areas are the source of the sensory input to WM. Based on these uncertainties we believe the idealized sensory response is justified for use in our model.

      Next, thank you for the comment on feedforward and feedback signals. We have added the following to our manuscript:

      “Following onset of a stimulus, the visual signal ascends through visual areas via a cascade of feedforward connections. This feedforward sweep conveys sensory information that persists during stimulus presentation and briefly after it disappears (Lamme et al., 1998). Simultaneously, reciprocal feedback connections carry higher-order information back towards antecedent cortical areas (Lamme and Roelfsema, 2000). In our psychophysical task, feedback connections likely play a critical role in orienting attention towards the cued item, facilitating the extraction of persisting sensory signals, and potentially signalling continuous information on the available resources for VWM encoding. While our computational study does not address the nature of these feedforward and feedback signals, a challenge for future research is to describe the relative contributions of these signals in mediating transmission of information between sensory and working memory (Semedo et al., 2022).”

      Lamme, V. A., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8(4), 529–535. https://doi.org/10.1016/S0959-4388(98)80042-1

      Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571–579. https://doi.org/10.1016/S0166-2236(00)01657-X

      Semedo, J. D., Jasper, A. I., Zandvakili, A., Krishna, A., Aschner, A., Machens, C. K., Kohn, A., & Yu, B. M. (2022). Feedforward and feedback interactions between visual cortical areas use different population activity patterns. Nature Communications, 13(1), 1099. https://doi.org/10.1038/s41467-022-28552-w

      Finally, both you and Reviewer 2 raised a similar interesting question regarding capacity limitations of attention during perception. Such a limitation could be modelled by freely estimating sensory amplitude and applying divisive normalization to that signal, similar to how VWM is constrained. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, occurring before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which showed no significant difference between set sizes and were more consistent with the null hypothesis of no difference ($F(2,18) = 1.26$, $p = .3$, $\eta^2 = .04$, $\mathrm{BF}_{10} = 0.47$). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al., 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni & Maunsell, 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      (2) Effectivity of retro-cues at long delays

      Can the authors discuss how cues presented at long delays (>1000 ms) can still lead to increased memory fidelity when sensory signals are likely to have decayed? A list of experimental work demonstrating this can be found in Souza & Oberauer (2016).

      Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78, 1839-1860.

      The increased memory fidelity observed with longer delays between memory array offset and cue does not result from integrating available sensory signals into VWM because the sensory signal would have completely decayed by that time. Instead, research so far has indicated several alternative mechanisms that could lead to higher recall precision for cued items, and we can briefly summarize some of them, which are also reviewed in more detail in Souza and Oberauer (2016).

      One possibility is that, after a highly predictive retro-cue indicates the to-be-tested item, uncued items can simply be removed from VWM. This could result in decreased interference for the cued item, and consequently higher recall precision. Secondly, the retro-cue could indicate which item should be selectively attended, thereby differentially strengthening it in memory. Furthermore, the retro-cue could allow evidence to accumulate for the target item ahead of decision-making, and this could increase the probability that the correct information will be selected for response. Finally, the retro-cued stimulus could be insulated from interference by subsequent visual input, while the uncued stimuli may remain prone to such interference.

      A neural account of this retro-cue effect based on the original neural resource model has been proposed in Bays & Taylor, Cog Psych, 2018. However, as we did not use a retro-cue design in the present experiments, we have decided not to elaborate on this in the manuscript.

      (3) Swap errors

      I am somewhat surprised by the empirically observed and predicted pattern of swap errors displayed in Figure S2. For set size 10, swap probability does not consistently increase with the duration of the retention interval, although this was predicted by the authors' model. At long intervals, swap probability is significantly higher for large compared to small set sizes, which also seems to contrast with the idea of shared, limited VWM resources. Can the authors provide some insight into why the model fails to reproduce part of the behavioral pattern for swap errors? The sentence in line 602 might also need some reconsideration in this regard.

      Determining the ground truth for swap errors poses a challenge. The prevailing approach has been to employ a simpler model that estimates swap errors, such as a three-component mixture model, and use those estimates as a proxy for ground truth. However, this method is not without its shortcomings. For example, the variability of swap frequency estimates tends to increase with variability in the report feature dimension (here, orientation). This is due to the increasing overlap of response probability distributions for swap and non-swap responses. Consequently, the discrepancy between any two methods of swap estimation is most noticeable when there is substantial variability in orientation reports (e.g., 10 items and long delay or short exposure).
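
      The overlap argument above can be illustrated with a minimal sketch of a three-component mixture model. The response does not give the model's equations, so the sketch assumes a common formulation (a von Mises component centred on the target, von Mises components centred on the non-targets, and a uniform guessing component) and assumes that orientations have been mapped onto the full circle in radians. The function names and parameter values are hypothetical and are not taken from the authors' code.

```python
import numpy as np
from scipy.special import i0


def von_mises_pdf(x, kappa):
    """Von Mises density on the circle, centred on zero (x in radians)."""
    return np.exp(kappa * np.cos(x)) / (2 * np.pi * i0(kappa))


def mixture_terms(response, target, nontargets, kappa, p_swap, p_guess):
    """Return the (target, swap, guess) likelihood terms for each trial.

    response, target: shape (n_trials,); nontargets: shape (n_trials, n_nontargets).
    """
    target_term = (1.0 - p_swap - p_guess) * von_mises_pdf(response - target, kappa)
    # Assumption: swap probability is shared equally among the non-target items.
    swap_term = p_swap * von_mises_pdf(response[:, None] - nontargets, kappa).mean(axis=1)
    guess_term = np.full_like(response, p_guess / (2 * np.pi), dtype=float)
    return target_term, swap_term, guess_term


def swap_posterior(response, target, nontargets, kappa, p_swap, p_guess):
    """Posterior probability that each response was a swap."""
    t, s, g = mixture_terms(response, target, nontargets, kappa, p_swap, p_guess)
    return s / (t + s + g)


# One trial: the response lands exactly on a non-target 0.5 rad from the target.
resp, targ, nontarg = np.array([0.5]), np.array([0.0]), np.array([[0.5]])
post_precise = swap_posterior(resp, targ, nontarg, kappa=20.0, p_swap=0.2, p_guess=0.1)[0]
post_noisy = swap_posterior(resp, targ, nontarg, kappa=2.0, p_swap=0.2, p_guess=0.1)[0]
print(post_precise, post_noisy)
```

      With the lower concentration (more variable reports), the same response is attributed to the swap component with less certainty, because the target and non-target distributions overlap more. This is the sense in which swap estimates become less reliable as report variability grows.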

      When modelling swap frequency in the DyNR model, our aim was to provide a parsimonious account of swap errors while implementing similar dynamics in the spatial (cue) feature as in the orientation (report) feature. This parametric description captured the overall pattern of swap frequency with set size and retention and encoding time, but it remains only an approximation of the predictions that would be obtained if we fully modelled memory for the conjunction of cue and report features (as in, e.g., Schneegans & Bays, 2017; McMaster et al., 2022).

      We expanded the existing text in the section ‘Representational dynamics of cue-dimension features’ of our manuscript:

      “… Although we did not explicitly model the neural signals representing location, the modelled dynamics in the probability of swap errors were consistent with those of the primary memory feature. We provided a more detailed neural account of swap errors in our earlier works that is theoretically compatible with the DyNR model (McMaster et al., 2022; Schneegans & Bays, 2017).

      The DyNR model successfully captured the observed pattern of swap frequencies (intrusion errors). The only notable discrepancy between DyNR and the three-component mixture model (Fig. S2) arises with the largest set size and longest delay, although with considerable interindividual variability. As the variability in report-dimension increases, the estimates of swap frequency become more variable due to the growing overlap between the probability distributions of swap and non-swap responses. This may explain apparent deviations from the modelled swap frequencies with the highest set size and longest delay where orientation response variability was greatest.”

      McMaster, J. M. V., Tomić, I., Schneegans, S., & Bays, P. M. (2022). Swap errors in visual working memory are fully explained by cue-feature variability. Cognitive Psychology, 137, 101493. https://doi.org/10.1016/j.cogpsych.2022.101493

      Schneegans, S., & Bays, P. M. (2017). Neural Architecture for Feature Binding in Visual Working Memory. The Journal of Neuroscience, 37(14), 3913–3925. https://doi.org/10.1523/JNEUROSCI.3493-16.2017

      (4) Direct sensory readout

      The model assumes that readout from sensory memory and from VWM happens with identical efficiency. Currently, we don't know if these two systems are highly overlapping or are fundamentally different in terms of architecture and computation. In the case of the latter, it might be less reasonable to assume that information readout would happen at similar efficiencies, as it is currently assumed in the manuscript. Perhaps the authors could briefly discuss this possibility.

      In the direct sensory read-out model, we did not explicitly model the efficiency of readout from either sensory or VWM store. However, the distinctive prediction of this model is that the precision of recall changes exponentially with delay at every set size, including one item. This prediction does not depend on the relative efficiency of readout from sensory and working memory, but only on the principle that direct readout from sensory memory bypasses the capacity limit on working memory. This prediction is inconsistent with the pattern of results observed in Experiment 1, where early cues did not show a beneficial effect on recall error for set size 1. While the proposal raised by the reviewer is intriguing, even if we were to model the process of readout from both the sensory and VWM stores with different efficiencies, the direct read-out model could not account for the near-constant recall error with delay for set size one.

      (5) Encoding of distractors

      One of the model assumptions is that, for simultaneous presentations of memory array and cue, only the cued feature will be encoded. Previous work has suggested that participants often accidentally encode distractors even when they are cued before memory array onset (Vogel et al., 2005). Given these findings, how reasonable is this assumption in the authors' model?

      Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438(7067), 500-503.

      Although previous research suggested that observers can misinterpret the pre-cue and encode one of the uncued items, our results argue against this being the case in the current experiment. Such encoding failures would manifest in overall recall error, resulting in a gradient of error with set size, owing to the presence of more adjacent distractors in larger set sizes. However, when we compared recall errors between set sizes in the simultaneous cue condition, we did not find a significant difference between set sizes, and moreover, our results were more likely under the hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). If observers occasionally encoded and reported one of the uncued items in the simultaneous cue condition, those errors were extremely infrequent and did not affect the overall error distributions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. investigated the relationship between monocular and binocular responses of V1 superficial-layer neurons using two-photon calcium imaging. They found a strong relationship in their data: neurons that exhibited a greater preference for one eye or the other (high ocular dominance) were more likely to be suppressed under binocular stimulation, whereas neurons that are more equivalently driven by each eye (low ocular dominance) were more likely to be enhanced by binocular stimulation. This result chiefly demonstrates the relationship between ocular dominance and binocular responses in V1, corroborating what has been shown previously using electrophysiological techniques but now with greater spatial resolution (albeit less temporal resolution). The binocular responses were well-fitted by a model that institutes divisive normalization between the eyes that accounts for both the suppression and enhancement phenomena observed in the subpopulation of binocular neurons. In so doing, the authors reinforce the importance of incorporating ocular dominance in computational models of binocular combination.

      The conclusions of this paper are mostly well supported by the data, but there are some limitations of the methodology that need to be clarified, and an expansion of how the results relate to previous work would better contextualize these important findings in the literature.

      Strengths:

      The two-photon imaging technique used to resolve the activity of individual neurons within intact brain tissue grants a host of advantages. Foremost, two-photon imaging confers considerably high spatial resolution. As a result, the authors were able to sample and analyze the activity from thousands of verified superficial-layer V1 neurons. The animal model used, awake macaques, is also highly relevant for the study of binocular combination. Macaques, like humans, are binocular animals, meaning they have forward-facing eyes that confer overlapping visual fields. Importantly, macaque V1 is organized into cortical columns that process specific visual features from the separate eyes just like in humans. In combination with a powerful imaging technique, this allowed the authors to evaluate the monocular and binocular response profiles of V1 neurons that are situated within neighboring ocular dominance columns, a novel feat. To this aim, the approach was well-executed and should instill further confidence in the notion that V1 neurons combine monocular information in a manner that is dependent on the strength of their ocular dominance.

      Weaknesses:

      While two-photon imaging provides excellent spatial resolution, its temporal resolution is often lower compared to some other techniques, such as electrophysiology. This limits the ability to study the fast dynamics of neuronal activity, a well-understood trade-off of the method. The issue is more so that the authors draw comparisons to electrophysiological studies without explicit appreciation of the temporal difference between these techniques. In a similar vein, two-photon imaging is limited spatially in terms of cortical depth, preferentially sampling from neurons in layers 2/3. This limitation does not invalidate any of the interpretations but should be considered by readers, especially when making comparisons to previous electrophysiological reports using microelectrode linear arrays that sample from all cortical layers. Indeed, it is likely that a complete picture of early cortical binocular processing will require high spatial resolution (i.e., sampling from neurons in neighboring ocular dominance columns, from pia mater to white matter) at the biophysically relevant timescales (1ms resolution, capturing response dynamics over the full duration of the stimulus presentation, including the transient onset and steady-state periods).

      To address the same concern from all three reviewers, we discussed the technical limitations of two-photon calcium imaging at the end of the Discussion, including limited imaging depth, low temporal resolution, and nonlinearity. The relevant text is copied here:

      (Ln 304) “Limitations of the current study

      Although capable of sampling a large number of neurons at cellular resolution and with low sampling bias, two-photon calcium imaging has known limitations that make it better suited as a complementary research tool to electrophysiological recordings.

      For example, two-photon imaging can only sample neurons from superficial layers, while binocular neurons also exist in deeper layers, and even neurons in the input layer are affected by feedback from downstream binocular neurons to exhibit binocular response properties (Dougherty, Cox, Westerberg, & Maier, 2019). Furthermore, calcium signals are relatively slow and cannot reveal the fast dynamics of neuronal responses. Due to these spatial and temporal limitations, a more complete picture of the neuronal mechanisms underlying binocular combination of monocular responses may come from studies using both technologies.

      In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although calcium signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rates within a range of 10-150 Hz (Li, Liu, Jiang, Lee, & Tang, 2017), weak and strong signals outside this range are more nonlinear, and may appear poorer and stronger, respectively, than electrode-recorded effects. Consequently, the differences in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      (Recommendations For The Authors):

      Overall, my main suggestion for the authors to improve the paper is to revise some of the interpretations of their results in relation to previous research. The purpose of the present study was to illustrate a more complete picture of the binocular combination of monocular responses by taking into consideration the ocular dominance of V1 cells (lines 34-36). A study published earlier this year had an identical purpose (Mitchell et al., Current Biology, 2023) and arrived at a highly similar conclusion (and also applied divisive normalization to fit their data). I would ask that this paper be mentioned in the introduction and discussed.

      The Mitchell et al. (2023) paper has been added to the Introduction and Discussion:

      (Ln 50) “In addition (to the Dougherty et al 2019 paper from the same group), Mitchell, Carlson, Westerberg, Cox, and Maier (2023) reported that binocular combination of monocular stimuli with different contrasts is also affected by neurons’ eye preference.”

      (Ln 286) “The critical roles of ocular dominance have been largely overlooked by extant binocular vision models to our knowledge, except that Anderson and Movshon (1989) demonstrated that a model consisting of multiple ocular dominance channels can better explain their psychophysical adaptation data, and that Mitchell et al. (2023) revealed that binocular combination of different contrasts presented to different eyes is affected by neurons’ ocularity preference.”

      Nevertheless, the results of the present study are very valuable. They add substantial spatial resolution and sophisticated relational analysis of monocular and binocular responses that Mitchell et al., 2023 did not include. Therefore, my suggestion is to emphasize the advantages of two-photon imaging in the introduction, focusing on the ability to image neurons in neighboring ocular dominance columns. The rigorous modeling of the relationship between nearby neurons with a range of eye preferences, in tandem with the incredible yield of two-photon imaging, is what sets this paper apart from previous electrophysiological work.

      The finding that binocular responses were dependent on ocular dominance is largely consistent with previous electrophysiological results. However, there should be a paragraph in the discussion section that speaks to the limitations of comparing two-photon imaging data to electrophysiological data. Namely, there are two limitations:

      (1) These two techniques confer different temporal resolutions. It is conceivable that some of the electrophysiology relationships (for example, described by Dougherty et al., 2019) may be dependent on the temporal window over which the data was averaged, typically over 50-100ms around stimulus onset, or 100-250ms comprising the neurons' sustained response to the stimulus. This possible explanation of the difference in obtained results would be especially useful for the discussion paragraph starting at line 232. It would also be helpful to readers for there to be some mention of the advantage of having high temporal resolution (i.e., the benefits of electrophysiology) since (a) recent work has distinguished between sequential stages of binocular combination (Cox et al., 2019) and (b) modern models of V1 neurons emphasize recurrent feedback to explain V1 temporal dynamics (see Heeger et al., 2019; Rubin et al., 2015), which could prove to be relevant for combination of stimuli in the two eyes (Fleet et al., 1997).

      Our discussion regarding the technical limitations of 2-p calcium imaging is given above. Specific to the Dougherty et al. (2019) paper, we added the following discussion to address the issue of the difference in temporal resolution between the two technologies.

      (Ln 266) “In addition, it is unclear whether the discrepancies are caused by different temporal resolutions of electrode recording and calcium imaging. The results of Dougherty et al. (2019) represent changes of neuronal spike activities over a period of approximately 50-200 ms after the stimulus onset, which may reflect the sustained neuronal responses to the stimulus and possible feedback signals. Calcium signals are much slower and indicative of the aggregated neuronal responses over a longer period (up to 1000 ms in the current study). They should have smeared, rather than exaggerated, the differences between monocular and binocular responses, although we cannot exclude the possibility that some neuronal response changes beyond 200 ms are responsible for the discrepancies.”

      (2) The sample of V1 neurons in this study is limited to cells in the most superficial layers of the cortex (layers 2/3). This limitation is, of course, well understood, but it should be mentioned at least in the context of studying the formative mechanisms of binocular combination in V1 (since we know that binocular neurons also exist in layers 5/6, and there is now substantial evidence that even layer 4 neurons are not as "monocular" as we previously thought (Dougherty et al., 2019)).

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      In short, I believe the paper would be improved by (1) adding the above citations in the appropriate places, (2) acknowledging in the introduction that this question has been investigated electrophysiologically but emphasizing the advantages of two-photon imaging, and (3) adding a paragraph to the discussion section that discusses the temporal and spatial limitations when using two-photon imaging to study binocular combination, particularly when comparing the results to electrophysiology.

      Reviewer #2 (Public Review):

      Summary:

      This study examines the pattern of responses produced by the combination of left-eye and right-eye signals in V1. For this, they used calcium imaging of neurons in V1 of awake, fixating monkeys. They take advantage of calcium imaging, which yields large populations of neurons in each field of view. With their data set, they observe how response magnitude relates to ocular dominance across the entire population. They analyze carefully how the relationship changed as the visual stimulus switched from contra-eye only, ipsi-eye only, and binocular. As expected, the contra-eye-dominated neurons responded strongly with a contra-eye-only stimulus. The ipsi-eye-dominated neurons responded strongly with an ipsi-eye-only stimulus. The surprise was responses to a binocular stimulus. The responses were similarly weak across the entire population, regardless of each neuron's ocular dominance. They conclude that this pattern of responses could be explained by interocular divisive normalization, followed by binocular summation.

      Strengths:

      A major strength of this work is that the model-fitting was done on a large population of simultaneously recorded neurons. This approach is an advancement over previous work, which did model-fitting on individual neurons. The fitted model in the manuscript represents the pattern observed across the large population in V1, and washes out any particular property of individual neurons. Given the large neuronal population from which the conclusion was drawn, the authors provide solid evidence supporting their conclusion. They also observed consistency across 5 fields of view.

      The experiments were designed and executed appropriately to test their hypothesis. Their data support their conclusion.

      Weaknesses:

      One weakness of their study is that calcium signals can exaggerate the nonlinear properties of neurons. Calcium imaging renders poor responses poorer and strong responses stronger, compared to single-unit recording. In particular, the dramatic change in the population response between monocular stimulation and binocular stimulation could actually be less pronounced when measured with single-unit recording methods. This means their choice of recording method could have accidentally exaggerated the evidence of their finding.

      We discussed the nonlinearity of calcium signals as part of the technical limitations of 2-p calcium imaging. The calcium indicator we use, GCaMP5, has a reasonable range of linear relationship with spike rates. Outside this range, however, the nonlinearity is indeed a concern.

      (Ln 314) “In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rate within a range of 10-150 Hz (Li et al., 2017), weak and strong signals outside this range are more nonlinear, and may appear poorer and stronger, respectively, than electrode-recorded effects. Consequently, the changes in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      The implication of their finding is that strong ocular dominance is the result of release from interocular suppression by a monocular stimulus, rather than the lack of binocular combination as many traditional studies have assumed. This could significantly advance our understanding of the binocular combination circuitry of V1. The entire population of neurons could be part of a binocular combination circuitry present in V1.

      This is a very good insight. We added the following sentences to the end of the first paragraph of Discussion:

      (Ln 242) “These findings imply that at least for neurons in superficial layers of V1, significant ocular dominance may result from a release of interocular suppression during monocular stimulation, an unusual viewing condition as our vision is typically binocular, rather than a lack of binocular combination of inputs from upstream monocular neurons.”

      (Recommendations For The Authors):

      Line 150: "To model interocular response suppression, responses from each eye in Eq. 2 were further normalized by an interocular suppression factor wib or wcb," I recommend the authors improve their explanation of how they arrived at Eq. 3 from Eq. 2. As it stands, my impression is that they have one model for the responses to monocular stimulation, and another model for the responses to binocular stimulation. What I think is missing is that both equations are derived from the same model. Monocular stimulation is a situation in which the stimulus in one eye's contrast is zero. Could the authors clarify whether this situation produces an interocular suppression of zero, and how that leads to Eq. 2?

      We rewrote the modeling part to show that Equations 1-3 are sequential steps of development for the same model. We also added a brief paragraph to discuss how Eq. 3 could lead to Eq. 2 under monocular viewing:

      (Ln 166) “Although not shown in Eq. 3, we also assumed that the nonlinear exponent b depends on the contrast of the stimulus presented to the other eye (i.e., Sc or Si). Consequently, when Sc or Si = 0 under monocular stimulation, Rc or Ri = 0 (Eq. 1), and interocular suppression wib or wcb = 1, so Eq. 3 changes back to Eq. 2. It is only when Sc and Si are equal and close to 1, as in the current study, that interocular suppression and binocular combination would be in the current Eq. 3 format.”

      Line 225: "However, individually, compared to monocular responses, responses of monocular neurons more preferring the stimulated eye are actually suppressed, and only responses of binocular neurons are increased by binocular stimulation." This sentence is difficult to follow. I recommend the authors improve clarity by breaking up the sentence into several sentences. If I understand correctly, they summarize the pattern in the data that is indicative of interocular divisive normalization, i.e., their final conclusion.

      This sentence no longer exists in the Discussion.

      Line 426: "Third, for those showing significant orientation difference, the trial-based orientation responses of each neuron were fitted with a Gaussian model with a MATLAB nonlinear least squares function:" The choice of using a Gaussian function to fit orientation tuning was probably suboptimal. A Gaussian function provides an adequate fit only for neurons whose tuning is very sharp. The responses outside of the peak fall down to the baseline and the two ends meet. Otherwise, the two ends do not meet. An adequate fit would be achieved with a function of a circular variable, which wraps around 180 deg. I recommend using a Von Mises function for fitting orientation tuning.

      We agree with the reviewer that the Von Mises function is more accurate than Gaussian for fitting orientation tuning functions. Indeed, we are using it to fit orientation tuning of V4 neurons, many of which have two peaks. For the current V1 data, the differences between Von Mises and Gaussian fittings are very small, as shown in the orientation functional maps from three macaques below. Because we also use the same Gaussian fitting of orientation tuning in several published and currently under-review papers, we prefer to keep the Gaussian fitting results in the manuscript.

      Author response image 1.
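
      The comparison between the two tuning functions can be sketched on synthetic data. This is an illustration under stated assumptions, not the authors' MATLAB procedure: the response does not give the exact form of their Gaussian model, so the sketch assumes a Gaussian of the wrapped orientation difference and a Von Mises curve with a 180-degree period, both fitted with `scipy.optimize.curve_fit`. The sampled orientations, parameter values, and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_tuning(theta, amp, pref, sigma, base):
    """Gaussian of the orientation difference, wrapped into [-90, 90) degrees."""
    d = (theta - pref + 90.0) % 180.0 - 90.0
    return base + amp * np.exp(-d**2 / (2.0 * sigma**2))


def von_mises_tuning(theta, amp, pref, kappa, base):
    """Von Mises tuning curve with a 180-degree period, peak height base + amp."""
    return base + amp * np.exp(kappa * (np.cos(2.0 * np.deg2rad(theta - pref)) - 1.0))


def fit_preferred(model, theta, resp, width0):
    """Least-squares fit; returns the preferred orientation in [0, 180)."""
    p0 = [resp.max() - resp.min(), theta[np.argmax(resp)], width0, resp.min()]
    params, _ = curve_fit(model, theta, resp, p0=p0)
    return params[1] % 180.0


# Noise-free, sharply tuned synthetic neuron sampled every 15 degrees (assumed design).
theta = np.arange(0.0, 180.0, 15.0)
resp = von_mises_tuning(theta, amp=1.0, pref=67.0, kappa=3.0, base=0.1)

pref_gauss = fit_preferred(gaussian_tuning, theta, resp, width0=20.0)
pref_vm = fit_preferred(von_mises_tuning, theta, resp, width0=1.0)
# Circular difference between the two preferred-orientation estimates, in degrees.
diff = abs((pref_gauss - pref_vm + 90.0) % 180.0 - 90.0)
print(pref_gauss, pref_vm, diff)
```

      The sketch covers only the sharply tuned case, where both functions would be expected to give similar preferred orientations. It does not address broadly tuned or double-peaked neurons, where the reviewer's concern about the Gaussian applies most strongly.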

      Reviewer #3 (Public Review):

      The authors have made simultaneous recordings of the responses of large numbers of neurons from the primary visual cortex using optical two-photon imaging of calcium signals from the superficial layers of the cortex. Recordings were made to compare the responses of the cortical neurons under normal binocular viewing of a flat screen with both eyes open and monocular viewing of the same screen with one eye's view blocked by a translucent filter. The screen displayed visual stimuli comprising small contrast patches of Gabor function distributions of luminance, a stimulus that is known to excite cortical neurons.

      This is an important data set, given the large numbers of neurons recorded. The authors present a simple model to explain the binocular combination of neuronal signals from the right and left eyes.

      The limitations of the paper as written are as follows. These points can be addressed with some additional analysis and rewriting of sections of the paper. No new experimental data need to be collected.

      (1) The authors should acknowledge the fact that these recordings arise from neurons in the superficial layers of the cortex. This limitation arises from the usual constraints on optical imaging in the macaque cortex. This means that the sample of neurons forming this data set is not fully representative of the population of binocular neurons within the visual cortex. This limitation is important in comparing the outcome of these experiments with the results from other studies of binocular combination, which have used single-electrode recording. Electrode recording will result in a sample of neurons that is drawn from many layers of the cortex, rather than just the superficial layers.

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      (2) Single-neuron recording of binocular neurons in the primary visual cortex has shown that these neurons often have some spontaneous activity. Assessment of this spontaneous level of firing is important for accurate model fitting [1]. The paper here should discuss the level of spontaneous neuronal firing and its potential significance.

      We have noticed previously that at non-optimal spatial frequencies, calcium responses to a moving Gabor grating are close to zero (Guan et al., Prog Neurobiology, 2021, Fig. 1B), but we cannot tell whether this is due to calcium response nonlinearity, or a close-to-zero level of spontaneous neuronal activity. Prince et al. (2002) reported low spontaneous responses of V1 neurons with moving grating stimuli (e.g., about 3 spikes/sec in one exemplar neuron, their Fig. 1B), so this does not appear to be a large effect. In our data fitting, we do have an orientation-unspecific component in the Gaussian model, which represents the neuronal response at a non-preferred orientation, but not necessarily the spontaneous activity.

      (3) The arrangements for visual stimulation and comparison of binocular and monocular responses mean that the stereoscopic disparity of the binocular stimuli is always at zero or close to zero. The animal's fixation point is in the centre of a single display that is viewed binocularly. The fixation point is, by definition, at zero disparity. The other points on the flat display are also at zero disparity or very close to zero because they lie in the same depth plane. There will be some small deviations from exactly zero because the geometry of the viewing arrangements results in the extremities of the display being at a slightly different distance than the centre. Therefore, the visual stimulation used to test the binocular condition is always at zero disparity, with a slight deviation from zero at the edges of the display, and never changes. [There is a detail that can be ignored. The experimenters tested neurons with visual stimulation at different real distances from the eyes, but this is not relevant here. Provided the animals accurately converged their eyes on the provided binocular fixation point, then the disparity of the visual stimuli will always be at or close to zero, regardless of viewing distance in these circumstances.] However, we already know from earlier work that neurons in the visual cortex exhibit a range of selectivity for binocular disparity. Some neurons have their peak response at non-zero disparities, representing binocular depths nearer than the fixation depth or beyond it. The response of other neurons is maximally suppressed by disparities at the depth of the fixation point (so-called Tuned Inhibitory [TI] neurons). 
      The simple model and analysis presented in the paper for the summation of monocular responses to predict binocular responses will perform adequately for neurons that are tuned to zero disparity, so-called tuned excitatory neurons [TE], but is necessarily compromised when applied to neurons that have other, different tuning profiles. Specifically, when neurons are stimulated binocularly with a non-preferred disparity, the binocular response may be lower than the monocular response [2, 3]. This more realistic view of binocular responses needs to be considered by the authors and integrated into their modelling.

      We agree and have included the following text in our discussion of future work:

      (Ln 298) “In addition, in our experiments, binocular stimuli were presented with zero disparity, which best triggered the responses of neurons with zero-disparity tuning. A more realistic model of binocular combination also requires the consideration of neurons with other disparity-tuning profiles.”

      (4) The data in the paper show some features that have been reported before but are not captured by the model. Notably for neurons with extreme values of ocular dominance, the binocular response is typically less than the larger of the two monocular responses. This is apparent in the row of plots in Figure 2D from individual animals and in the pooled data in Figure 2E. Responses of this type are characteristic of tuned inhibitory [TI] neurons[2]. It is not immediately clear why this feature of the data does not appear in the summary and analysis in Figure 3.

      This difference is indeed captured by the model, which can be more easily appreciated in Fig. 4A where monocular and binocular model simulations are plotted in the same panel. In the text, we also wrote: (Ln 195) “It is apparent that binocular responses cannot be explained by the sum of monocular responses, as binocular responses are substantially lower than the summed monocular responses for both monocular and binocular neurons. Nor can binocular responses be explained by the responses to the preferred eye, as binocular responses are also lower than those to the preferred eye (the larger of the two monocular responses) for monocular neurons.”

      The paper text states that the responses were "first normalized by the median of the binocular responses". This will certainly get rid of this characteristic of the data, but this step needs better justification, or an amendment to the main analysis is needed.

      The relevant sentence has been rewritten as “Monocular and binocular data of each FOV/depth, as well as the pooled data, were first normalized by the respective median of the binocular responses of all neurons in the same FOV/depth.” This normalization would render the overall binocular responses to be around unity, for the purpose of facilitating comparisons among all FOV/depth, but it would not affect the overall characteristic of the data.
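      The normalization described above can be illustrated with a minimal sketch. It assumes that the responses of one FOV/depth are held in plain Python lists; the function name and all numbers are hypothetical and are not taken from the authors' analysis code. Dividing every response in an FOV/depth by the same scalar brings the median binocular response to one while leaving the ratio between any two responses in that FOV/depth unchanged.

```python
from statistics import median

def normalize_by_binocular_median(monocular, binocular):
    """Divide all responses from one FOV/depth by the median binocular response."""
    scale = median(binocular)
    if scale == 0:
        raise ValueError("median binocular response is zero; cannot normalize")
    # Each monocular entry is a (contralateral, ipsilateral) pair for one neuron.
    mono_norm = [[r / scale for r in pair] for pair in monocular]
    bino_norm = [r / scale for r in binocular]
    return mono_norm, bino_norm

# Hypothetical responses for three neurons in one FOV/depth (made-up numbers).
monocular = [(2.0, 1.0), (4.0, 0.5), (1.0, 3.0)]
binocular = [2.0, 3.0, 4.0]
mono_norm, bino_norm = normalize_by_binocular_median(monocular, binocular)
```

      In this toy example the binocular-to-preferred-eye ratio of each neuron is the same before and after normalization, which is the property the response relies on when stating that the step does not alter the overall characteristics of the data.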

      In the present form, the model and analysis do not appear to fit the data in Figure 2 as accurately as needed.

      Thank you for pointing out this problem; the fits for FOV C_270 and the pooled data were especially inaccurate. The issue was largely resolved by weighting each datum by its standard deviation (please see the updated Fig. 3).

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zeng and Staley provide a valuable analysis of the molecular requirements for the export of a reporter mRNA that contains a lariat structure at its 5' end in the budding yeast S. cerevisiae. The authors provide evidence that this is regulated by the main mRNA export machinery (Yra1, Mex67, Nab2, Npl3, Tom1, and Mlp1). Of note, Mlp1 has been mainly implicated in the nuclear retention of unspliced pre-mRNA (i.e. quality control), and relatively little has been done to investigate its role in mRNA export in budding yeast.

      Strengths:

      There is relatively little information in the current literature about the nuclear export of splicing intermediates. This paper provides one of the first analyses of this process and dissects the molecular components that promote this form of RNA export. Overall, the strength of the data presented in the manuscript is solid. The paper is well written and the message is clear and of general interest to the mRNA community.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      There are three problems with the paper, although these are not major and likely would not affect the final model as most aspects of the molecular details are confirmed by multiple complementary assays.

      (1) The brG reporter produces both unspliced pre-mRNA and a lariat-containing intermediate RNA. Based on the primer extension assay the authors claim that only 33% of the final product is in pre-mRNA form and that this "is insufficient to account for the magnitude of the cytoplasmic signal from the brG reporter (83%)". Nevertheless, it is possible that primer extension is incomplete or that the lariat-containing RNA is inaccessible for smFISH. The authors could easily perform a dual smFISH experiment (similar to Adivarahan et al., Molecular Cell 2018) where exon 1 is labelled with probes of one color, and the region that overlaps the lariat-containing intermediate is labelled with probes of a second color. If the authors are correct, then one-third of the smFISH foci should have both labels and the rest would have only the second label. This would also confirm that the latter (i.e. the lariat-containing RNAs) are exported to the cytoplasm. Using this approach, the authors could then show that MLP1-depletion (or depletion of any of the other factors) affect(s) one pool of RNAs (i.e. those that are lariat-containing) but not the other (i.e. pre-mRNA). Including these experiments would make the evidence for their model more convincing.

      We appreciate the reviewer’s comments and suggestions. Concerning the primer extension analysis, we are considering alternative assays to quantitate the pre-mRNA and lariat intermediate levels. Concerning the accessibility of the lariat intermediate in smRNA-FISH, in a dbr1∆ strain the only major species from the UAc reporter that is detected by primer extension is the lariat intermediate (Fig. S3), and this reporter is readily detected by smRNA-FISH, indicating that the lariat intermediate is accessible to smRNA-FISH. Concerning discriminating between pre-mRNA and lariat intermediate by smRNA-FISH, we agree with the reviewer that a dual smFISH experiment would directly distinguish between the signals of these species. The brG reporter we used in most smRNA-FISH experiments has a 5’ exon that is too short for smRNA-FISH probes, as is typical of most budding yeast 5’ exons. We have tried to replace the 5’ exon with a longer sequence (GFP) to allow for smRNA-FISH; however, this substitution inhibited splicing. Therefore, to distinguish signals from pre-mRNA versus lariat intermediate, we used additional reporters: G1c and brC reporters, which accumulate pre-mRNA essentially exclusively (Fig. S2A-C), and the UAc reporter, which accumulates lariat intermediate exclusively, in a dbr1∆ strain (Fig. S3). Whereas the mlp1 deletion did not change beta-galactosidase activities of the G1c and brC pre-mRNA-accumulating reporters (Fig. S2E), the mlp1 deletion in a dbr1∆ background did reduce the beta-galactosidase activities of the UAc lariat intermediate-accumulating reporter (Fig. 3D) and did increase smRNA-FISH signal of this reporter in the nucleus (Fig. 3E). These observations corroborate our interpretation based on the brG reporter that Mlp1p is required for efficient export of lariat intermediates but not pre-mRNAs.

      (2) In some cases, the number of smFISH foci appears to change drastically depending on the genetic background. This could either be due to the stochastic nature of mRNA expression between cells or reflect real differences between the genetic backgrounds that could alter the interpretation of the other observations.

      We thank the reviewer for raising this point. We will review our data to distinguish between these possibilities.

      (3) The authors state in the discussion that "the general mRNA export pathway transports discarded lariat intermediates into the cytoplasm". Although this appears to be the case for the reporters that are investigated in this paper, I don't think that the authors should make such a broad sweeping claim. It may be that some discarded lariat intermediates are exported to the cytoplasm while others are targeted for nuclear retention and/or decay.

      The reviewer’s point is well-taken. We will revise the wording accordingly.

      Reviewer #2 (Public Review):

      In this report, Zeng and Staley have used an elegant combination of RNA imaging approaches (single molecule FISH), RNA co-immunoprecipitations, and translation reporters to characterize the factors and pathways involved in the nuclear export of splicing intermediates in budding yeast. Their study notably involves the use of specific reporter genes, which lead to the accumulation of pre-mRNA and lariat species, in a battery of mutants impacting mRNA export and quality control.

      The authors convincingly demonstrate that mRNA species expressed from such reporters are exported to the cytoplasm in a manner depending on the canonical mRNA export machinery (Mex67 and its adaptors) and the nuclear pore complex (NPC) basket (Mlp1). Interestingly, they provide evidence that the export of splicing intermediates requires docking and subsequent undocking at the nuclear basket, a step possibly more critical than for regular mRNAs.

      We thank the reviewer for this overall positive assessment.

      However, their assays do not always allow us to define whether the impacted mRNA species correspond to lariats and/or pre-mRNAs. This is all the more critical since their findings apparently contradict previous reports that supported a role for the nuclear basket in pre-mRNA quality control. These earlier studies, which were similarly based on the use of dedicated yet distinct reporters, had found that the nuclear basket subunit Mlp1, together with different cofactors, prevents the export of unspliced mRNA species. It would be important to clarify experimentally and discuss the possible reasons for these discrepancies.

      It is true that we did not assess export of all reporters in all mutant strains by smFISH; however, we did validate the key conclusion that the export of lariat intermediates requires the nuclear basket gene MLP1: the export of both the brG reporter (mostly lariat intermediate) and the UAc reporter (exclusively lariat intermediate) showed a dependence on MLP1 (Fig. 3). Further, by beta-galactosidase activity, we tested in total five separate reporters – three that accumulated lariat intermediate and two that accumulated exclusively pre-mRNA; only the three reporters accumulating lariat intermediate showed a dependence of export on MLP1 (Fig. 4B,D; Fig S2D); the reporters accumulating pre-mRNA did not show a dependence on MLP1 (Fig. S2E), further validating our main conclusion. We are considering additional experiments to validate this key conclusion even further. Also, see response to comment 1 from reviewer 1.

      We agree that the main conclusion from this manuscript differs from earlier studies. A key difference is that prior studies monitored exclusively pre-mRNA. In our study, we monitored pre-mRNA and lariat intermediate species and in doing so revealed a role for MLP1 in the export of lariat intermediates. This study, our previous study, as well as the previous studies of others have all provided evidence for efficient export of pre-mRNA; all of these studies are in conflict with the studies purporting a general role for the nuclear basket in retaining immature mRNA. Still, these past apparently conflicting studies can be re-interpreted in the context of our model that the export of such species requires docking at the nuclear basket, followed by undocking. In a revised manuscript, we will discuss the possibility that pre-mRNAs apparently “retained” by the nuclear basket are stalled in export at the undocking stage.

      Reviewer #3 (Public Review):

      Summary:

      Zeng and Staley show that in yeast, intron-lariat intermediates that accumulate due to defects in pre-mRNA splicing are transported to the cytoplasm using the canonical mRNA export pathway. Moreover, they demonstrate that export requires the nuclear basket, a sub-structure of the nuclear pore complex previously implicated in the retention of immature mRNAs. These observations are important as they put into question a longstanding model that the main role of the nuclear basket is to ensure nuclear retention of immature or faulty mRNAs.

      Strengths:

      The authors elegantly combine genetic, biochemical, and single-molecule resolution microscopy approaches to identify the cellular pathway that mediates the cytoplasmic accumulation of lariat intermediates. Cytoplasmic accumulation of such splicing intermediates had been observed in various previous studies but how these RNAs reach the cytoplasm had not yet been investigated. By using smFISH, the authors present compelling, and, for the first time, direct evidence that these intermediates accumulate in the cytoplasm and that this requires the canonical mRNA export pathway, including the RNA export receptor Mex67 as well as various RNA-binding proteins including Yra1, Npl3 and Nab2. Moreover, they show that the export of lariat intermediates, but not mRNAs, requires the nuclear basket (Mlp1) and basket-associated proteins previously linked to the mRNP rearrangements at the nuclear pore. This is a surprising and important observation with respect to a possible function of the nuclear basket in mRNA export and quality control, as it challenges a longstanding model that the role of the basket in mRNA export is primarily to act as a gatekeeper to ensure that immature mRNAs are not exported. As discussed by the authors, their finding suggests a role for the basket in promoting the export of certain types of RNAs rather than retention, a model also supported by more recent studies in mammalian cells. Moreover, their findings are also consistent with a recent paper showing that in yeast, not all nuclear pores contain a basket (PMID: 36220102), an observation that also questioned the gatekeeper model of the basket, as it is difficult to imagine how the basket can serve as a gatekeeper if not all nuclear pores contain such a structure.

      We thank the reviewer for highlighting the importance and surprising nature of our findings.

      Weaknesses:

      One weakness of this study is that all the experiments rely on a synthetic splicing reporter containing a lacZ gene, which produces a relatively long transcript compared to the average yeast mRNA.

      We are considering repeating some of our experiments to monitor export of RNAs with more average lengths.

      The rationale for using a reporter containing the brG (G branch point), which yields more stable lariat intermediates because they are inefficient substrates for the debranching enzyme Dbr1, could be described earlier in the manuscript; otherwise it only becomes clear towards the end, which is confusing.

      We thank the reviewer for this comment. We will revise the text to explain sooner the rationale for using the brG reporter to assess the export of lariat intermediates.

      Discussion of their observation in the context that, in yeast, not all pores contain a basket would be useful.

      Thanks for this suggestion. We will raise this point that a nuclear basket is not present on all nuclear pores and discuss the implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes the mechanism of protein disaggregation by the ClpL AAA+ protein of Listeria monocytogenes. Using several model substrate proteins the authors first show that ClpL possesses a robust disaggregase activity that does not further require the endogenous DnaK chaperone in vitro. In addition, they found that ClpL is more thermostable than the endogenous L. monocytogenes DnaK and has the capacity to unfold tightly folded protein domains. The mechanistic basis for the robust disaggregase activity of ClpL was also dissected in vitro and, in some cases, supported by in vivo data obtained in chaperone-deficient E. coli strains. The data presented show that the two AAA domains, the pore-2 site and the N-terminal domain (NTD) of ClpL are critical for its disaggregase activity. Remarkably, grafting the NTD of ClpL to ClpB converted ClpB into an autonomous disaggregase, highlighting the importance of such a domain in the DnaK-independent disaggregation of proteins. The role of the ClpL NTD domain was further dissected, identifying key residues and positions necessary for aggregate recognition and disaggregation. Finally, using sets of SEC and negative staining EM experiments combined with conditional covalent linkages and disaggregation assays the authors found that ClpL shows significant structural plasticity, forming dynamic hexameric and heptameric active single rings that can further form higher assembly states via their middle domains.

      Strengths:

      The manuscript is well-written and the experimental work is well executed. It contains a robust and complete set of in vitro data that push further our knowledge of such important disaggregases. It shows the importance of the atypical ClpL N-terminal domain in the disaggregation process as well as the structural malleability of such AAA+ proteins. More generally, this work expands our knowledge of heat resistance in bacterial pathogens.

      Weaknesses:

      There is no specific weakness in this work, although it would have helped to have a drawing model showing how ClpL performs protein disaggregation based on their new findings. The function of the higher assembly states of ClpL remains unresolved and will need further extensive research. Similarly, it will be interesting in the future to see whether the sole function of the plasmid-encoded ClpL is to cope with general protein aggregates under heat stress.

      We thank the reviewer for the positive evaluation. We agree with the reviewer that it will be important to test whether ClpL can bind to and process non-aggregated protein substrates. Our preliminary analysis suggests that the disaggregation activity of ClpL is most relevant in vivo, pointing to protein aggregates as main target.

      We also agree that the role of dimers or tetramers of ClpL rings needs to be further explored. Our initial analysis suggests a function of ring dimers as a resting state. It will now be important to study the dynamics of ClpL assembly formation and test whether substrate presence shifts ClpL assemblies towards an active, single ring state.

      Reviewer #2 (Public Review):

      The manuscript by Bohl et al. is an interesting and carefully done study on the biochemical properties and mode of action of the potent autonomous AAA+ disaggregase ClpL from Listeria monocytogenes. ClpL is encoded on plasmids. It shows high thermal stability and provides the food pathogen Listeria monocytogenes with a substantial increase in heat resistance. The authors show that ClpL interacts with aggregated proteins through the aromatic residues present in its N-terminal domain and subsequently unfolds proteins from aggregates by translocating polypeptide chains through the central pore of its oligomeric ring structure. The structure of ClpL oligomers was also investigated in the manuscript. The results suggest that the mono-ring structure, and not the dimers or trimers of rings also observed by EM, is the active species of the disaggregase.

      Presented experiments are conclusive and well-controlled. Several mutants were created to analyze the importance of a particular ClpL domain.

      The study's strength lies in the direct comparison of ClpL biochemical properties with autonomous ClpG disaggregase present in selected Gram-negative bacteria and well-studied E. coli system consisting of ClpB disaggregase and DnaK and its cochaperones. This puts the obtained results in a broader context.

      We thank the reviewer for the detailed comments. There are no specific weaknesses indicated in the public review.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript details the characterization of ClpL from L. monocytogenes as a potent and autonomous AAA+ disaggregase. The authors demonstrate that ClpL has potent and DnaK-independent disaggregase activity towards a variety of aggregated model substrates and that this disaggregase activity appears to be greater than that observed with the canonical DnaK/ClpB co-chaperone. Furthermore, Lm ClpL appears to have greater thermostability as compared to Lm DnaK, suggesting that ClpL-expressing cells may be able to withstand more severe heat stress conditions. Interestingly, Lm ClpP can provide thermotolerance to E. coli that have been genetically depleted of either ClpB or in cells expressing a mutant DnaK103. The authors further characterized the mechanisms by which ClpL interacts with protein aggregates, identifying that the N-terminal domain of ClpL is essential for disaggregase function. Lastly, by EM and mutagenesis analysis, the authors report that ClpL can exist in a variety of larger macromolecular complexes, including dimers or trimers of hexamers/heptamers, and they provide evidence that the N-terminal domains of ClpL prevent dimer ring formation, thus promoting an active and substrate-binding ClpL complex. Throughout this manuscript the authors compare Lm ClpL to ClpG, another potent and autonomous disaggregase found in Gram-negative bacteria that has been reported on previously, demonstrating that these two enzymes share homologous activity and qualities. Taken together this report clearly establishes ClpL as a novel and autonomous disaggregase.

      Strengths:

      The work presented in this report amounts to a substantial body of novel work that will be of interest to the protein chaperone community. Furthermore, by providing examples of how ClpL can provide in vivo thermotolerance to both E. coli and L. gasseri the authors have expanded the significance of this work and provided novel insight into potential mechanisms responsible for thermotolerance in food-borne pathogens.

      Weaknesses:

      The figures are clearly depicted and easy to understand, though some of the axis labeling is a bit misleading or confusing and may warrant revision. While I do feel that the results and discussion as presented support the authors' hypothesis and overall goal of demonstrating ClpL as a novel disaggregase, interpretation of the data is hindered as no statistical tests are provided throughout the manuscript. Because of this only qualitative analysis can be made, and as such many of the concluding statements involving pairwise comparisons need to be revisited or quantitative data with stats needs to be provided. The addition of statistical analysis is critical and should not be difficult, nor do I anticipate that it will change the conclusions of this report.

      We thank the reviewer for the valid criticism. We addressed the major concern of the reviewer and added the requested statistical analysis to all relevant figures. The analysis confirms our conclusions. We also followed the advice of the reviewer and revised axis labeling to increase clarity.

      Reviewer #1 (Recommendations For The Authors):

      • It would really help to have a model showing how ClpL performs protein disaggregation based on their findings.

      We show that ClpL exerts a threading activity that is fueled by ATP hydrolysis in both AAA domains and executed by pore-located aromatic residues. The basic disaggregation mechanism of ClpL therefore does not differ from that of the ClpB and ClpG disaggregases. Similarly, the specificity of ClpL towards protein aggregates is based on simultaneous interactions of multiple N-terminal domains with the aggregate surface. We recently described a similar mode of aggregate recognition for ClpG [1]. We therefore prefer not to add a model to the manuscript. We are currently preparing a review that includes the characterization of the novel bacterial disaggregases and will present models there, as we consider a review article more appropriate for such illustrations.

      • AAA2 domain of ClpL in Fig 3E should be the same color as in Fig 1A.

      We used light grey instead of dark grey for the ClpL AAA2 domain in Fig 3E, to distinguish between ClpL and ClpB AAA domains. This kind of illustration allows for clearer separation of both AAA+ proteins and the fusion construct LN-ClpB*. We therefore prefer keeping the color code.

      • Partial suppression of the dnaK mutant could be added in the main manuscript Figure.

      The main figure 3 is already very dense and we therefore prefer showing respective data as part of a supplementary figure.

      • It would have been interesting to know if the robust autonomous disaggregation activity of ClpL would be sufficient to rescue the growth of more severe E. coli chaperone mutants, like dnaK tig for example. Did the authors test this?

      We tested whether expression of clpL can rescue growth of E. coli dnaK103 mutant cells at 40°C on LB plates. This experiment is different from the restoration of heat resistance in dnaK103 cells (Figure 3, figure supplement 2A), as continuous growth at elevated temperatures (40°C) is monitored instead of cell survival upon abrupt severe heat shock (49°C). We did not observe rescue of the temperature-sensitive growth phenotype (40°C) of dnaK103 cells upon clpL expression, though expression of clpG complemented the temperature-sensitive growth phenotype (see Author response image 1 below). This finding points to differences in chaperone activities of ClpL and ClpG. It also suggests that ClpL activity is largely restricted to heat-shock generated protein aggregates, enabling ClpL to complement the missing disaggregation function of DnaK but not other Hsp70 activities including folding and targeting of newly synthesized proteins. We believe that dissecting the molecular reasons for differences in ClpG and ClpL complementation activities should be part of an independent study and prefer showing the growth-complementation data only in the response letter.

      Author response image 1.

      Serial dilutions (10⁻¹ – 10⁻⁶) of E. coli dnaK103 mutant cells expressing E. coli dnaK, L. monocytogenes clpL or P. aeruginosa clpG were spotted on LB plates including the indicated IPTG concentrations. Plates were incubated at 30°C or 40°C for 24 h. p: empty vector control.

      Reviewer #2 (Recommendations For The Authors):

      Based on results presented in Fig. 2B the authors conclude "that stand-alone disaggregases ClpL and ClpG but not the canonical KJE/ClpB disaggregase exhibit robust threading activities that allow for unfolding of tightly folded domains" (page 5 line 209). In this experiment, the threading power of disaggregases was assessed by monitoring YFP fluorescence during the disaggregation of aggregates formed by fusion luciferase-YFP protein. In my opinion, the results of the experiment depend not only on the threading power of disaggregases but also on the substrate recognition by analyzed disaggregating systems and/or processivity of disaggregases. N-terminal domain in the case of ClpL and KJE chaperones in the case of the KJE/ClpB system are involved in recognition. This is not discussed in the manuscript and the obtained result might be misinterpreted. The authors have created the LN-ClpB* construct (N-terminal domain of ClpL fused to derepressed ClpB) (Fig. 3 E and F). In my opinion, this construct should be used as an additional control in the experiment in Fig. 2 B. It possesses the same substrate recognition domain and therefore the direct comparison of disaggregases threading power might be possible.

      We performed the requested experiment (new Figure 3 - figure supplement 2D). We did not observe unfolding of YFP by LN-ClpB. Since ClpL and LN-ClpB do not differ in their aggregate targeting mechanisms, this finding underlines the differences in threading power between ClpL and activated (derepressed) ClpB. It also suggests that the AAA threading motors and the aggregate-targeting NTD largely function independently.

      Presented results suggest that tetramer and dimer of rings might be a "storage form" of disaggregase. It would be interesting to analyze the thermotolerance and/or phenotype of ClpL mutants that do not form tetramer and dimer (E352A). This variant possesses similar to WT disaggregation activity but does not form dimers and tetramers. If in vivo the differences are observed (for example toxicity of the mutant), the "storage form" hypothesis will be probable.

      When testing expression of clpL-MD mutants (E352A, F354A), which cannot form dimers and tetramers of ClpL rings, in E. coli ∆clpB cells, we observed reduced production levels as compared to ClpL wildtype and speculated that reduced expression might be linked to cellular toxicity. We therefore compared spotting efficiencies of E. coli ∆clpB cells expressing clpL, ∆N-clpL or the clpL-MD mutants at different temperatures. Expression of clpL at high levels abrogated colony formation at 42°C (new Figure 6 - figure supplement 3). ClpL toxicity was dependent on its NTD as no effect was observed upon expression of ∆N-clpL. ClpL-MD mutants (E352A, F354A) were expressed at much lower levels and exhibited strongly increased toxicity as compared to ClpL-WT when produced at comparable levels (new Figure 6 – figure supplement 3). This implies a protective role of ClpL ring dimers and tetramers in the cellular environment by downregulating ClpL activity. We envision that the formation of ClpL assemblies restricts accessibility of the ClpL NTDs and reduces substrate interaction. Increased toxicity of ClpL-E352A and ClpL-F354A points to a physiological relevance of the dimers and tetramers of ClpL rings and is in agreement with the proposed function as storage forms. We added this potential role of ClpL ring assemblies to the discussion section. Due to the strongly reduced production levels of ClpL MD mutants and their enhanced toxicity at elevated temperatures we did not test for their ability to restore thermotolerance in E. coli ∆clpB cells.

      Figure 6G and Figure 6 -figure supplement 2 - it is not clear what is the difference in the preparation of WT and WTox forms of ClpL.

      ClpL WT was purified under reducing conditions (+ 2 mM DTT), whereas WTox was purified in the absence of DTT, thus serving as a control for ClpL-T355C, which forms disulfide bonds upon purification without DTT. We have added the respective information to the figure legend and the materials and methods section.

      Page 5 line 250 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2A, it should be Figure 3 - Figure Supplement 2A.

      Page 5 line 251 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2B/C, it should be Figure 3 - Figure Supplement 2B/C.

      Page 7 line 315 - wrong figure citation. Instead of Figure 4F, it should be Figure 4G.

      Figure 1 - Figure Supplement 2E - At first glance, this Figure does not correspond to the text and is confusing. It would be nice to have bars for Lm ClpL activity in the figure. Alternatively, the description of the y-axis might be changed to "relative to Lm ClpL disaggregation activity" instead of "relative disaggregation activity". One has to carefully read the figure legend to find out that 1 corresponds to Lm ClpL activity.

      We have corrected all mistakes and changed the description of y-axis (Figure 1 - figure Supplement 2E) as suggested.

      Reviewer #3 (Recommendations For The Authors):

      (1) While the authors make many experimental comparisons throughout their study, no statistical tests are described or presented with their results or figures, nor are these statistical tests described in the methods. While the data as presented does appear to support the author's conclusions, without these statistical tests no meaningful conclusions from paired analysis can be drawn. Critically, please report these statistical tests. As a general suggestion please include the statistics (p-values) in the results section when presenting this data, as well as in the figure legends, as this will allow the reader to better understand the authors' presentation and interpretation of the data.

      We have added statistical tests to all relevant figures. The analysis confirms our earlier statements. We have further clarified our approach to the statistical analysis in the methods section. We report p-values in the results section; however, due to the volume of comparisons, we did not add individual p-values to the figure legends but used standard labeling with stars.

      (2) Some of the axis labels for the presented graphs are a bit misleading or confusing. Many describe a relative (%) disaggregation rate, but it is not clear from the methods or figure legends what this rate is relative to. Is it relative to non-denatured substrates, to no chaperone conditions, etc.? Is it possible to present the figures with the raw data rates/activity (ex. luciferase activity / time) vs. relative rates? I think that labeling these figure axes with "disaggregation rate" is a bit misleading as none of these experiments measure the actual rate of disaggregation of these model substrates per se (say by SEC-MALS or other biophysical measurements), but instead infer the extent of disaggregation by measuring a property of these substrates, i.e. luciferase activity or fluorescence intensity over time. Thus, labeling these figures with the appropriate axis for what is being measured, and then clarifying in the methods and results what is being inferred by these measurements, will help solidify the author's conclusions.

      Relative (%) disaggregation rate usually refers to the disaggregation activity of ClpL wildtype serving as reference. We clarified this point in the revised text and respective figure legends. We now also refer to the process measured (e.g. relative refolding activity of aggregated Luciferase instead of relative disaggregation activity) as suggested by the reviewer and added clarifications to text and materials and methods.

      Since we have many measurements for our most frequently used assays and a reasonable estimate of the general variance within these assays, we found it reasonable to show activity data in relation to fixed controls. This reduces the impact of unspecific variance and thereby allows more accurate comparisons between different repetitions. The reference is now indicated in the axis title.
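      The normalization described above can be illustrated with a minimal sketch. It assumes that a relative activity is simply a raw rate divided by the mean rate of the reference (for example ClpL wildtype) and expressed in percent; the function name and all numbers are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev


def relative_activity(raw_rates, reference_rates):
    """Express raw rates as a percentage of the mean reference rate.

    Assumption: the reference is a fixed control value estimated from
    repeated measurements of the reference protein.
    """
    reference_mean = mean(reference_rates)
    if reference_mean <= 0:
        raise ValueError("reference activity must be positive")
    return [100.0 * rate / reference_mean for rate in raw_rates]


# Hypothetical rates in arbitrary units per minute, for illustration only.
wild_type = [2.0, 2.2, 1.8]
mutant = [1.0, 1.2, 0.8]

relative = relative_activity(mutant, wild_type)
# Report the mean and the standard deviation (SD), not the SEM.
print(round(mean(relative), 1), round(stdev(relative), 1))
```

      Dividing by a fixed reference keeps repetitions comparable, but the raw rates are needed to recover absolute activities.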

      (3) The figures are well presented, clutter-free, and graphically easy to understand. Figure legends have sufficient information aside from the aforementioned statistical information and should include the exact number of independent replicates for each panel/experiment (ex. n=4), not just a greater than 3. While the figures do show each data point along with the mean and error, in some figures it is difficult to determine the number of replicate data points. Example figures 2c, 2d, and 3a. Also, please state whether the error is std. error or SEM.

      While we agree that this is valuable information, we fear that overloading the figure legends may impair readability. We therefore decided to list the number of replicates for each experiment in a separate supplementary table (Table S2). The depicted error is the SD and not the SEM, which we have now also specified in the figure legends.

      (4) There are various examples throughout the results where qualitative descriptors are used to describe comparisons. Examples of this are "hardly enhanced" (Figure 1) and "partially reduced" (Figure 6). While this is not necessarily wrong, qualitative descriptions of comparisons in this manner would require further explanation. What is the definition of "hardly" or "partially"? My recommendation is to just state the data quantitatively, such as "% enhanced" or "reduced by x", this way there is no misinterpretation. Examples of this can be found in Figures 6C-G. This would require a full statistical overview and presentation of these stats in the results.

      We followed the reviewer's advice and no longer use the terms criticized (e.g. “hardly enhanced”). We instead provide the requested quantifications in the text.

      Questions for Figures:

      Figures 1B and 1C:

      (1) Is the disaggregase activity of ClpL towards heat-denatured luciferase and GFP ATP-dependent? While the authors later in the manuscript show that mutations within the Walker B domains dramatically impair reactivation (disaggregation) of denatured luciferase, this does not rule out an ATP-independent effect of these mutations. Thus, the authors should test whether disaggregase activity is observed when wild-type ClpL is incubated with denatured substrates without ATP present or in the presence of ADP only.

      We tested for ClpL disaggregation activity in absence of nucleotide and presence of ADP only (new Figure 1 – figure supplement 2A). We did not observe any activity, demonstrating that ClpL activity depends on ATP binding and hydrolysis (see also Figure 3 – figure supplement 1D: ATPase-deficient ClpL-E197A/E530A is lacking disaggregation activity).

      (2) The authors suggest that a reduction in disaggregase activity observed in samples combining Lm ClpL and KJE (Figure 1C, supp. 1C-E) could be due to competition for protein aggregate binding as observed previously with ClpG. Did the authors test this directly by pulldown assay or another interaction-based assay? While ClpL and ClpG appear to work in a similar manner, it would be good to confirm this. Also, clarification on how this competition operates would be useful. Is it that ClpL prevents aggregates from interacting with KJE, or vice versa?

      We probed for binding of ClpL to aggregated Malate Dehydrogenase in the presence of L. monocytogenes or E. coli Hsp70 (DnaK + respective J-domain protein DnaJ) by a centrifugation-based assay. Here, we used the ATPase-deficient ClpL-E197A/E530A (ClpL-DWB) mutant, ensuring stable substrate interaction in the presence of ATP. We observed reduced binding of ClpL-DWB to protein aggregates in the presence of DnaK/DnaJ (new Figure 1 – figure supplement 2G). This finding indicates that both chaperones compete for binding to aggregated proteins and explains the inhibition of ClpL disaggregation activity in the presence of Hsp70.

      (3) Related to the above, while incubation of aggregated substrates with ClpL and KJE does appear to reduce aggregase activity towards GFP (Figure 1c), α-glucosidase (Supp. 1C), and MDH (Supp. 1D), this doesn't appear to be the case towards luciferase (Figure 1b, Supp. 1b). Furthermore, ClpL aggregase activity is reduced towards luciferase when combined with E. coli KJE (Supp. 1e) but not with Lm KJE (Figure 1b). The authors provide no commentary or explanation for these observations. Furthermore, these results complicate the concluding statement that "combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity ... ".

      We suggest that the differing inhibitory degrees of the KJE system on ClpL disaggregation activities reflect diverse binding affinities of KJE and ClpL to the respective aggregates. While we usually observe strong inhibition of ClpL activity in presence of KJE, this is different for aggregated Luciferase. This points to specific structural features of Luciferase aggregates or the presence of distinct binding sites on the aggregate surface that favour ClpL binding. We have added a respective comment to the revised manuscript.

      The former statement that “combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity” referred to aggregated GFP, MDH and α-Glucosidase for which a strong inhibition of ClpL activity was observed. We have specified this point.

      Figures 1D and 1E:

      (1) The authors conclude that the heat sensitivity of ΔClpL L. gasseri cells is because they do not express the canonical ClpB disaggregase. A good test to validate this would be to express KJE/ClpB in these Lg ΔClpL cells to see if heat-sensitivity could be fully or partially rescued.

      We agree that such an experiment would further strengthen the in vivo function of ClpL as an alternative disaggregase. However, this approach would require co-expression of E. coli ClpB with the authentic E. coli DnaK chaperone system (KJE), as ClpB and DnaK cooperate in a species-specific manner [2-4]. This makes the experiment challenging, also because the individual components need to be expressed at the correct stoichiometry. Furthermore, the presence of the authentic L. gasseri KJE system, which likely competes with the E. coli KJE system for aggregate binding, will hamper E. coli KJE/ClpB disaggregation activity in L. gasseri. In view of these limitations, we would like to refrain from conducting such an experiment.

      (2) The rationale for investigating Lg ClpL, and the aggregase activity assays are compelling and support the hypothesis that ClpL contributes to thermotolerance in multiple Gram-positive species. Though, from Figure 1d, why was only Lg ClpL investigated? It appears that S. thermophilus also lacks the canonical ClpB disaggregase and demonstrates ΔClpL heat sensitivity. There are also other Lactobacillus spp. presented that lack ClpB but were not tested for heat sensitivity. Why only test and move forward with L. gasseri? Lastly, L. mesenteroides is ClpB-negative but doesn't demonstrate ΔClpL heat sensitivity. Why?

      We wanted to document high, partner-independent disaggregation activity for another ClpL homolog. We chose L. gasseri, as (i) this bacterial species lacks a ClpB homolog and (ii) a ∆clpL mutant exhibits reduced survival upon severe heat shock (thermotolerance phenotype), which is associated with defects in cellular protein disaggregation. The characterization of L. gasseri ClpL as a potent disaggregase in vitro represents a proof-of-concept and allows us to generalize our conclusion. We therefore did not further test S. thermophilus ClpL. L. mesenteroides encodes ClpL but not ClpB; however, to the best of our knowledge, a ∆clpL mutant has not yet been characterized in this species. As we wanted to link ClpL in vitro activity with an in vivo phenotype, we did not characterize L. mesenteroides ClpL.

      We agree with the reviewer that the characterization of additional ClpL homologs is meaningful and interesting. However, we strongly believe that such an analysis should be part of an exhaustive and independent study.

      Figures 2A and 2B:

      (1) Figure 2B demonstrates that both ClpL and ClpG, but not the canonical KJE/ClpB, are able to unfold YFP during the luciferase disaggregation process, suggesting that ClpL and ClpG exhibit stronger threading activity. A technical question, can luciferase activity be measured alongside in the same assay sample? If so, would you expect to observe a concomitant increase in luciferase activity as YFP fluorescence decreases?

      KJE/ClpB can partially disaggregate and refold aggregated Luciferase-YFP without unfolding YFP during the disaggregation reaction [5]. YFP unfolding is therefore not linked to refolding of aggregated Luciferase-YFP. On the other hand, unfolding of YFP during disaggregation can hamper the refolding of the fused Luciferase moiety, as observed for the AAA+ protein ClpC in the presence of its partner MecA [5]. These diverse effects make the interpretation of Luciferase-YFP refolding experiments difficult, as the degree of YFP unfolding activity does not necessarily correlate with the extent of Luciferase refolding. We therefore did not perform the suggested experiment.

      Figure 2C and 2D:

      (1) Thermal shift assays for ClpL, ClpG, and DnaK were completed with various nucleotides. Were these experiments also completed with samples in their nucleotide-free apo state? Also, while all these chaperones are ATPases, the nucleotides used differ, but no explanation is provided. Comparison should be made of these ATPases bound to the same molecules.

      We did not monitor thermal stabilities of chaperones without nucleotide, as such a state is likely not relevant in vivo. We used ATPγS in the case of ClpL to keep the AAA+ protein in the ATP conformation. ATP would be rapidly converted to ADP due to the high intrinsic ATPase activity of ClpL. In the case of DnaK, ATPγS cannot be used as it does not induce the ATP conformation [6]. The low intrinsic ATPase activity of DnaK allows the thermal stability of its ATP conformation to be determined in the presence of ATP. This is confirmed by the lower thermal stability determined for ADP-bound DnaK.

      (2) The authors suggest that incubation at 55⁰C will cause unfolding of Lm DnaK, but not ClpL, providing ClpL-positive Lm cells disaggregase activity at 55⁰C. While the thermal shift assays in Figures 2C and 2D support this, an experiment to test this would be to heat-treat Lm DnaK and ClpL at 55⁰C then test for disaggregase activity using either aggregated luciferase or GFP as in Figure 1.

      We followed the suggestion of the reviewer and incubated Lm ClpL and DnaK at 55-58°C in the presence of ATP for 15 min prior to their use in disaggregation assays. We compared the activities of pre-heated chaperones with controls that were incubated at 30°C for 15 min. Notably, we did not observe a loss of DnaK disaggregation activity, suggesting that thermal unfolding of DnaK at this temperature is reversible. We provide these data as Figure 2 – figure supplement 1 and added a respective statement to the revised manuscript.

      Figure 3B:

      (1) The authors state that ATPase activity of ΔN-ClpL was "hardly affected", but from the data provided it appeared to result in an approximate 35% reduction. As discussed above, no stats are provided for this figure, but given the error bars, it is highly likely that this reduction is significant. Please perform this statistical test, and if significant, please reflect this in the written results as well as the figure. Lastly, if this reduction in ATPase activity is significant, why would this be so, and could this contribute to the reduction in aggregase activity towards luciferase and MDH observed in Figure 3A?

      We applied statistical tests as suggested by the reviewer, showing that the reduction in ATPase activity of ∆N-ClpL is statistically significant. N-terminal domains of Hsp100 proteins can modulate ATPase activity as shown for the family member ClpB, functioning as auxiliary regulatory element for fine tuning of ClpB activity [7]. We speculate that the impact of the ClpL-NTD on the assembly state (stabilization of ClpL ring dimers) might affect ClpL ATPase activity. We would like to point out that other ClpL mutants (e.g. NTD mutant ClpL-Y51A; MD mutant ClpL-F354A) have a similarly reduced ATPase activity, yet exhibit substantial disaggregation activity (approx. 2-fold reduced compared to ClpL wildtype). In contrast, ∆N-ClpL does not exhibit any disaggregation activity. This suggests that the loss of disaggregation activity is caused by a substrate binding defect but not by a partial reduction in ATPase activity. We added a comment on the reduced ATPase activity and also discuss its potential reasons in the discussion section.

      (2) I think the authors' conclusion that deletion of the ClpL NTD does not contribute to structural defects of ClpL is premature given the apparent reduction in ATPase activity. Did the authors perform any biophysical analysis of ΔN-ClpL to confirm this conclusion? Thermal shift assays, Native-PAGE, or size-exclusion chromatography for aggregates would all be good assays to demonstrate that the wild-type and ΔN-ClpL have similar structural properties. Surprisingly, Figure 6 describes significant macromolecular changes associated with ΔN-ClpL such that it preferentially forms a dimer of rings. Furthermore, in Supp. Figure 6D the authors report that ΔN-ClpL appears to have an increased Tm as compared to WT- or ΔM-ClpL. The authors should reflect these observations as deletion of the ClpL NTD does appear to contribute to structural changes, though perhaps only at the macromolecular scale, i.e. dimerization of the rings.

      We have characterized the oligomeric state of ∆N-ClpL by size exclusion chromatography (Figure 6 – figure supplement 1A) and negative staining electron microscopy (Figure 6C), both showing that it forms assemblies similar to ClpL wildtype. We did not observe an increased tendency of ∆N-ClpL to form aggregates and the protein remained fully soluble after several cycles of thawing and freezing. EM data reveal that ∆N-ClpL exclusively forms ring dimers, suggesting that the NTDs destabilize MD-MD interactions. The stabilized interaction between two ∆N-ClpL rings can explain the increased thermal stability (Figure 6 – figure supplement 1D). We speculate that the ClpL NTDs affect MD-MD interactions either through steric hindrance or by directly contacting MDs. We have added a respective statement to the discussion section.

      Figure 3C and 3D:

      (1) Given the larger error in samples expressing ClpG (100) or ClpL (100) statistical analysis with p-values is required to make conclusions regarding the comparison of these samples vs. plasmid-only control. The effect of ΔN-ClpL vs. wild-type ClpL looks compelling and does appear to attenuate the ClpL-induced thermotolerance. This is nicely demonstrated in Figure 3D.

      We quantified the respective spot tests (new Figure 3E) and tested for statistical significance as suggested by the reviewer. We show that restoration of heat resistance is significant for the first 30 min. While we always observe rescue at later timepoints, significance is lost there due to larger deviations in the number of viable cells and thus in the degree of complementation.

      Figure 3F:

      (1) What is the role of the ClpB NTD? It appears to be dispensable for disaggregase activity, assuming that ClpB is co-incubated with KJE. A quick explanation of this domain in ClpB could be useful.

      The ClpB NTD is not required for disaggregation activity, as ClpB is recruited to protein aggregates by DnaK, which interacts with the ClpB MDs. Still, two functions have been described for the ClpB NTD. First, it can bind soluble unfolded substrates such as casein [8]. This substrate binding function can increase ClpB disaggregation activity towards some aggregated model substrates (e.g. Glucose-6-phosphate dehydrogenase) [9]. However, NTD deletion usually does not decrease ClpB disaggregation activity and can even lead to an increase [7, 10, 11]. An increased disaggregation activity of ∆N-ClpB correlates with an enhanced ATPase activity, which is explained by NTDs stabilizing a repressing conformation of the ClpB MDs, which function as main regulators of ClpB ATPase activity [7]. We added a short description on the role of the ClpB NTD to the respective results section.

      (2) The result of fusing the ClpL NTD to ClpB supports a role for this NTD in promoting autonomous disaggregase activity. What would you expect to observe if the fused Ln-ClpB protein was co-incubated with KJE? Would this further promote disaggregase activity, or potentially impair through competition? This experiment could potentially support the authors' hypothesis that ClpL and ClpB/KJE can compete with each other for aggregated substrates as suggested in Figure 1.

      We have performed the suggested experiment using aggregated MDH as model substrate. We did not observe an inhibition of LN-ClpB disaggregation activity in the presence of KJE. In contrast, ClpL disaggregation activity towards aggregated MDH is inhibited upon addition of KJE due to competition for aggregate binding (Figure 1 – figure supplement 2D/F). Disaggregation activity of LN-ClpB in the presence of KJE can be explained by functional cooperation between both chaperone systems, which involves interactions between aggregate-bound DnaK and the ClpB MDs of the LN-ClpB fusion construct. We prefer to show these data only in the response letter and not to include them in the manuscript, as the respective results distract from the main message of the LN-ClpB fusion construct: the ClpL NTD functions as an autonomous aggregate-targeting unit that can be transferred to other Hsp100 family members.

      Author response image 2.

      LN-ClpB cooperates with DnaK in protein disaggregation. Relative MDH disaggregation activities of indicated disaggregation systems were determined. KJE: DnaK/DnaJ/GrpE. The disaggregation activity of Lm ClpL was set to 1. Statistical analysis: One-way ANOVA, Welch’s test for post-hoc multiple comparisons. Significance levels: **p < 0.001. n.s.: not significant.

      Figures 4E and 4F:

      (1) While the effect of various NTD mutations follows a similar trend in regard to the impairment of ClpL-mediated disaggregation of luciferase and MDH, the degree of these effects does appear different. For example, patch A and C mutations reduce ClpL disaggregase activity towards luciferase (~60% / 50% reduction) vs. MDH (>90%) respectively. While these results do suggest a critical role for residues in patches A and C of ClpL, these substrate-specific differences are not discussed. Why would we expect a difference in the effect of these patch A/C ClpL mutations on different substrates?

      We speculate that the aggregate structure and the presence or distributions of ClpL NTD binding sites differ between aggregated Luciferase and MDH. A difference between both aggregated model substrates was also observed when testing for an inhibitory effect of Lm KJE (and Ec KJE) on ClpL disaggregation activity (see comment above). We speculate that the mutated NTD residues make specific contributions to aggregate recognition. The severity of binding defects (and reduction of disaggregation activities) of these mutants will depend on specific features of the aggregated model substrates. We now point out that ClpL NTD patch mutants can differ in disaggregation activities depending on the aggregated model substrate used and refer to potential differences in aggregate structures.

      (2) The authors suggest that the loss of disaggregation activity of selected NTD mutants could be linked to reduced binding to aggregated luciferase. While this is likely given that these mutations do not appear to affect ATPase activity (Supp. 4), it could be possible that these mutants can still bind to aggregated luciferase and some other mechanism may impair disaggregation. A pull-down assay would help to prove whether reduced binding is observed in these NTD ClpL mutants. This also needs to be confirmed for Supp. Figure 4.2H.

      We have shown a strong correlation between loss of aggregate binding and loss of disaggregation activity for several NTD mutants (Fig. 4G, Figure 4 – figure supplement 2H). We decided to perform the aggregate binding assay only with mutants that show a full but not a partial disaggregation defect, as in our experience the centrifugation-based assay provides clear and reproducible results for loss-of-activity mutants but has limitations in revealing differences for partially affected mutants. This might be explained by the use of non-hydrolyzable ATPγS in these experiments, which strongly stabilizes substrate interactions, potentially masking partial binding defects. We agree with the reviewer that some ClpL NTD mutants might have additional effects on disaggregation activity, e.g. by controlling substrate transfer to the processing pore site. We have added a respective comment to the revised manuscript.

      (3) Supp. Figure 4.2H has no description in the figure legend. The Y-axes states % aggregate bound to chaperone. How was this measured? See the above comments for Figures 4E and 4F.

      We apologize for the omission and have added the description to the figure legend. The determination of % aggregate-bound chaperone is based on the quantification of chaperones present in the supernatant and pellet fractions after sample centrifugation. Background levels of chaperones in the pellet fractions in the absence of protein aggregates were subtracted. We added this information to the materials and methods section.
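      The quantification described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of the manuscript: it assumes that band intensities of pellet and supernatant are converted to a pellet fraction, and that the pellet fraction measured without aggregates is subtracted as background. The function name and all values are hypothetical.

```python
def percent_aggregate_bound(pellet, supernatant, background_fraction=0.0):
    """Return the percentage of chaperone recovered with aggregates.

    pellet, supernatant: band intensities of the chaperone (arbitrary units).
    background_fraction: pellet fraction measured without aggregates
    (assumed to be subtracted as a fraction of total chaperone).
    """
    total = pellet + supernatant
    if total <= 0:
        raise ValueError("total chaperone signal must be positive")
    bound_fraction = pellet / total - background_fraction
    # Clamp at zero so that noise in the background cannot yield negative binding.
    return 100.0 * max(bound_fraction, 0.0)


# Hypothetical intensities, for illustration only.
background = 5.0 / (5.0 + 95.0)  # chaperone pelleting without aggregates
print(percent_aggregate_bound(60.0, 40.0, background))
```

      Subtracting the background as a fraction rather than as a raw intensity keeps the correction independent of the amount of protein loaded.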

      Figure 6G:

      The authors observed reduced disaggregase activity and ATPase activity of mutant T355C under both oxidative and reducing conditions. While this observation under oxidative conditions supports the authors' hypothesis, under reducing conditions (+DTT) we would expect the enzyme to behave similarly to wild-type ClpL unless this mutation has other effects. Can the authors please comment on this and provide an explanation or hypothesis?

      The reviewer is correct: ClpL-T355C exhibits reduced disaggregation activity (Figure 6 – figure supplement 2B). We observe a similar reduction in disaggregation activity for the ClpL MD mutant F354A, pointing to an auxiliary function of the MD in protein disaggregation. We have made a respective comment in the discussion section of the revised manuscript. How exactly ClpL MDs support protein disaggregation is currently unclear and will be the subject of future analysis in the lab. We strongly believe that such an analysis should be part of an independent study.

      Discussion:

      In the fourth feature, it is discussed that one disaggregase feature of ClpL is that it does not cooperate with the ClpP protease. While a reference is provided for the canonical ClpB, no data in this paper, nor a reference, is provided demonstrating that ClpL does not interact with ClpP. As discussed, it is highly unlikely that ClpL interacts with ClpP given that ClpL does not contain the IGL/F loops that mediate the interaction of ClpP with cochaperones, such as ClpX, but data or a reference is needed to make such a factual statement.

      The absence of the IGL/F loop makes an interaction between ClpL and ClpP highly unlikely. However, the reviewer is correct: direct evidence for a ClpP-independent function of ClpL, though very likely, is not provided. We have therefore rephrased the respective statement: “Fourth, novel disaggregases lack the specific IGL/F signature motif, which is essential for cooperation of other Hsp100 proteins with the peptidase ClpP. This feature is shared with the canonical ClpB disaggregase [12], suggesting that protein disaggregation is primarily linked to protein refolding.”

      References

      (1) Katikaridis P, Simon B, Jenne T, Moon S, Lee C, Hennig J, et al. Structural basis of aggregate binding by the AAA+ disaggregase ClpG. J Biol Chem. 2023:105336.

      (2) Glover JR, Lindquist S. Hsp104, Hsp70, and Hsp40: A novel chaperone system that rescues previously aggregated proteins. Cell. 1998;94:73-82.

      (3) Krzewska J, Langer T, Liberek K. Mitochondrial Hsp78, a member of the Clp/Hsp100 family in Saccharomyces cerevisiae, cooperates with Hsp70 in protein refolding. FEBS Lett. 2001;489:92-6.

      (4) Seyffer F, Kummer E, Oguchi Y, Winkler J, Kumar M, Zahn R, et al. Hsp70 proteins bind Hsp100 regulatory M domains to activate AAA+ disaggregase at aggregate surfaces. Nat Struct Mol Biol. 2012;19:1347-55.

      (5) Haslberger T, Zdanowicz A, Brand I, Kirstein J, Turgay K, Mogk A, et al. Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments. Nat Struct Mol Biol. 2008;15:641-50.

      (6) Theyssen H, Schuster H-P, Bukau B, Reinstein J. The second step of ATP binding to DnaK induces peptide release. J Mol Biol. 1996;263:657-70.

      (7) Iljina M, Mazal H, Goloubinoff P, Riven I, Haran G. Entropic Inhibition: How the Activity of a AAA+ Machine Is Modulated by Its Substrate-Binding Domain. ACS chemical biology. 2021;16:775-85.

      (8) Rosenzweig R, Farber P, Velyvis A, Rennella E, Latham MP, Kay LE. ClpB N-terminal domain plays a regulatory role in protein disaggregation. Proc Natl Acad Sci U S A. 2015;112:E6872-81.

      (9) Barnett ME, Nagy M, Kedzierska S, Zolkiewski M. The amino-terminal domain of ClpB supports binding to strongly aggregated proteins. J Biol Chem. 2005;280:34940-5.

      (10) Beinker P, Schlee S, Groemping Y, Seidel R, Reinstein J. The N Terminus of ClpB from Thermus thermophilus Is Not Essential for the Chaperone Activity. J Biol Chem. 2002;277:47160-6.

      (11) Mogk A, Schlieker C, Strub C, Rist W, Weibezahn J, Bukau B. Roles of individual domains and conserved motifs of the AAA+ chaperone ClpB in oligomerization, ATP-hydrolysis and chaperone activity. J Biol Chem. 2003;278:15-24.

      (12) Weibezahn J, Tessarz P, Schlieker C, Zahn R, Maglica Z, Lee S, et al. Thermotolerance Requires Refolding of Aggregated Proteins by Substrate Translocation through the Central Pore of ClpB. Cell. 2004;119:653-65.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Strengths:

      The study raises an interesting point by excluding CERS1 inhibition as a therapeutic strategy for sarcopenia. Overall, the article is well written and clear.

      Weaknesses:

      Many of the experiments confirmed previously published data, which also show a decline of CERS1 in ageing and the generation and characterization of a muscle-specific knockout mouse line. The mechanistic insights into how the increased amount of long ceramides (cer c24) and the decrease in shorter ones (cer c18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed.

      We thank the reviewer for the assessment and would like to point out that Cers1 had not previously been studied in the context of aging. Moreover, our unbiased pathway analyses in human skeletal muscle implicate CERS1 in myogenic differentiation for the first time, which we validate in cell culture systems. To improve mechanistic insights, as suggested by Reviewer #1, we performed more experiments to gain insights into how Cers1-derived c18 and Cers2-derived c24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces c24:0/c24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain c24 ceramides might indeed be driving the effect we see upon Cers1 inhibition, because we observe an accumulation of c24 ceramides upon Cers1 (c18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition is dependent on Cers2, we knocked down Cers1 alone or in combination with Cers2. The results show that the reduced muscle cell maturation mediated by Cers1KD is rescued by the simultaneous knockdown of Cers2, as shown by gene expression analyses and immunohistochemical validation and quantification. Hence, we believe that reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition, sphingosine is forced towards the production of other ceramides that are potentially less toxic or less myogenesis-impairing.

      We added these new data to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wohlwend et al. investigates the implications of inhibiting ceramide synthase Cers1 on skeletal muscle function during aging. The authors propose a role for Cers1 in muscle myogenesis and aging sarcopenia. Both pharmacological and AAV-driven genetic inhibition of Cers1 in 18-month-old mice lead to reduced C18 ceramides in skeletal muscle, exacerbating age-dependent features such as muscle atrophy, fibrosis, and center-nucleated fibers. Similarly, inhibition of the Cers1 orthologue in C. elegans reduces motility and causes alterations in muscle morphology.

      Strengths:

      The study is well-designed, carefully executed, and provides highly informative and novel findings that are relevant to the field.

      Weaknesses:

      The following points should be addressed to support the conclusions of the manuscript.

      (1) It would be essential to investigate whether P053 treatment of young mice induces age-dependent features besides muscle loss, such as muscle fibrosis or regeneration. This would help determine whether the exacerbation of age-dependent features solely depends on Cers1 inhibition or is associated with other factors related to age-dependent decline in cell function. Additionally, considering the reported role of Cers1 in whole-body adiposity, it is necessary to present data on mice body weight and fat mass in P053-treated aged mice.

      We thank the reviewer for suggesting that we study Cers1 inhibition in young mice. In fact, a previous study shows that muscle-specific Cers1 knockout in young mice impairs muscle function (PMID: 31692231). Similar to our observation, these authors report reduced muscle fiber size and muscle force. Therefore, we do not believe that our observed effects of Cers1 inhibition in aged mice are specific to aging, although the phenotypic consequences are accentuated in aged mice. As requested by the reviewer, we have attached the body weights and fat mass of the mice (Author response image 1A-B). The reduced fat mass upon P053 treatment is in line with previously reported reductions in fat mass in young mice fed a chow diet or high-fat diet upon Cers1 inhibition (PMID: 30605666, PMID: 30131496), again suggesting that the effect of Cers1 inhibition might not be specific to aging.

      Author response image 1.

      (A-B) Body mass (A) and fat mass as % of body mass (B) were measured in 22mo C57BL/6J mice intraperitoneally injected with DMSO or P053 using EchoMRI (n=7-12 per group). (C-D) Grip strength measurements in all limbs (C) or only the forelimbs (D) in 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (n=8 per group). (E-F) Pax7 gene expression in P053 or AAV9 treated mice (n=6-7 per group) (E), or in mouse C2C12 muscle progenitor cells treated with 25nM scramble or Cers1 targeting shRNA (n=8 per group) (F). (G) Proliferation as measured by luciferase intensity in mouse C2C12 muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=24 per group). Each column represents one biological replicate. (H) Overlaid FACS traces of Annexin-V (BB515, left) and Propidium Iodide (Cy5, right) of mouse C2C12 muscle myotubes treated with 25nM scramble or Cers1 targeting shRNA (n=3 per group). Quantification right: early apoptosis (Annexin+-PI-), late apoptosis (Annexin+-PI+), necrosis (Annexin--PI+), viability (Annexin--PI-). (I) Normalized Cers2 gene expression in mouse C2C12 muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=6-7 per group). (J-K) Representative mitochondrial respiration traces of digitonin-permeabilized mouse C2C12 muscle cells treated with DMSO or P053 (J) with quantification of basal, ATP-linked, proton leak respiration as well as spare capacity and maximal capacity linked respiration (n=4 per group). (L) Reactive oxygen species production in mitochondria of mouse C2C12 muscle cells treated with DMSO or P053. (M) Enriched gene sets related to autophagy and mitophagy in 24mo C57BL/6J mouse muscles intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (left), or intraperitoneally injected with DMSO or P053 (right). Color gradient indicates normalized effect size. Dot size indicates statistical significance (n=6-8 per group). (N) Representative confocal Proteostat® stainings with quantifications of DMSO and P053 treated mouse muscle cells expressing APPSWE (top) and human primary myoblasts isolated from patients with inclusion body myositis (bottom). (O) Stillness duration during a 90-second interval in adult day 5 C. elegans treated with DMSO or 100 µM P053. (P) Lifespan of C. elegans treated with DMSO or P053 (n=144-147 per group; for method details see main manuscript page 10).

      (2) As grip and exercise performance tests evaluate muscle function across several muscles, it is not evident how intramuscular AAV-mediated Cers1 inhibition solely in the gastrocnemius muscle can have a systemic effect or impact different muscles. This point requires clarification.

      The grip strength measurements presented in the manuscript come from hindlimb grip strength, as pointed out in the Methods section. We also measured grip strength in all four limbs and in the forelimbs only (Author response image 1C-D). Forelimb strength did not change, whereas hindlimb grip strength was significantly different in AAV-Cers1KD compared to the scramble control AAV (Fig 3I), which is in line with the fact that we only injected the AAV in the hindlimbs. This is similar to our previous data, where we saw altered muscle function upon IM AAV delivery in the gastrocnemius (PMID: 34878822, PMID: 37118545). The gastrocnemius likely has the largest contribution to hindlimb grip strength given its size, and possibly even to overall grip strength, as suggested by a trend of reduced grip strength in all four limbs (Author response image 1C). We also suspect that the hindlimb muscles have the largest contribution to uphill running, as we could also see an effect on running performance. While we carefully injected a minimal amount of AAV into the gastrocnemius to avoid leakage, we cannot completely rule out that some AAV might have spread to other muscles. We added this information to the discussion of the manuscript as a potential limitation of the study.

      (3) To further substantiate the role of Cers1 in myogenesis, it would be crucial to investigate the consequences of Cers1 inhibition under conditions of muscle damage, such as cardiotoxin treatment or eccentric exercise.

      While it would be interesting to study Cers1 in the context of muscle regeneration, and possibly mouse models of muscular dystrophy, we think such work would go beyond the scope of the current manuscript.

      (4) It would be informative to determine whether the muscle defects are primarily dependent on the reduction of C18-ceramides or the compensatory increase of C24-ceramides or C24-dihydroceramides.

      To improve mechanistic insights, as suggested by Reviewer #2, we performed more experiments to gain insight into how Cers1-derived C18 and Cers2-derived C24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces C24:0/C24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain C24 ceramides might indeed be driving the effect we see upon Cers1 inhibition, because we observe an accumulation of C24 ceramides upon Cers1 (C18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition is dependent on Cers2, we knocked down Cers1 alone, or in combination with the knockdown of Cers2. Results show that the reduced muscle cell maturation mediated by Cers1KD is rescued by the simultaneous knockdown of Cers2, as shown by gene expression analyses and immunohistochemical validation and quantification. We added these data to the manuscript as new Fig 5D-E, new Fig S5G-I. These data, together with our previous results showing that Degs1 knockout reduces myogenesis (PMID: 37118545, Fig. 6s-x and Fig. 7), suggest that C24/dhC24 might contribute to the age-related impairments in myogenesis. We added the new results to the revised manuscript.

      (5) Previous studies from the research group (PMID 37118545) have shown that inhibiting the de novo sphingolipid pathway by blocking SPTLC1-3 with myriocin counteracts muscle loss and that C18-ceramides increase during aging. In light of the current findings, certain issues need clarification and discussion. For instance, how would myriocin treatment, which reduces Cers1 activity because of the upstream inhibition of the pathway, have a positive effect on muscle? Additionally, it is essential to explain the association between the reduction of Cers1 gene expression with aging (Fig. 1B) and the age-dependent increase in C18-ceramides (PMID 37118545).

      Blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore seems beneficial for muscle aging. While most enzymes in the ceramide pathway that we studied so far (SPTLC1, CERS2) revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects. This is also visible in the direction of CERS1 expression compared to the other enzymes in one of our previously published studies (PMID: 37118545, Fig. 1e and Fig. 1f). In the current study, we show that Cers1 inhibition indeed exacerbates age-related defects in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. As the reviewer points out, both C18- and C24-ceramides seem to accumulate upon muscle aging. We think this is due to an overall overactive ceramide biosynthesis pathway. Blocking C18-ceramides via Cers1 inhibition results in the accumulation of C24-ceramides and worsens muscle phenotypes (see reply to question #4). On the other hand, blocking C24-ceramides via Cers2 inhibition improves muscle differentiation. These observations, together with the finding that Cers1 mediated inhibition of muscle differentiation is dependent on proper Cers2 function (new Fig 5D-E, new Fig S5G-I), point towards C24-ceramides as the main culprit of reduced muscle differentiation. Hence, at least a significant part of the benefits of blocking SPTLC1 might have been related to reducing very long-chain ceramides. We believe that reduced Cers1 expression in skeletal muscle upon aging, observed by us and others (PMID: 31692231), might reflect a compensatory mechanism to make up for an overall overactive ceramide flux in aged muscles. Reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition (new Fig 5D-E, new Fig S5G-I), sphingosine is forced towards the production of other, potentially less toxic, or myogenesis-impairing ceramides. These data are now added to the revised manuscript (see page 7). Details were added to the discussion of the manuscript (see page 8).

      Addressing these points will strengthen the manuscript's conclusions and provide a more comprehensive understanding of the role of Cers1 in skeletal muscle function during aging.

      Reviewer #1 (Recommendations For The Authors):

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Even though many of the experiments only confirmed previously published data (ref 21, 11, 37, 38), which also show a decline of CERS1 in ageing and the generation and characterization of a muscle specific knockout mouse line, the study points out an interesting issue on excluding CERS1 inhibition as a therapeutic strategy for sarcopenia and opens new questions on understanding how inhibition of SPTLC1 (upstream of CERS1) has beneficial effects in healthy aging (ref 15 published by the same authors).

      Overall, the article is well written and clear. However, there is a major weakness. The mechanistic insights into how the increased amount of long ceramides (C24) and the decrease in shorter ones (C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed. At the present stage the manuscript is descriptive and confirmatory of CERS1 mediated function in preserving muscle mass. The authors should consider the following points:

      Comments:

      (1) Muscle data

      (a) The effect of CERS1 inhibition on myotube formation must be better characterized. Which step of myogenesis is affected? Is stem cell renewal or MyoD replication/differentiation, or myoblast fusion or an increased cell death the major culprit of the small myotubes? Minor points: Figure S1C: show C14:00 level at 200 h; text of Fig S2A and 1F: MRF4 and Myogenin are not early genes in myogenesis, please correct; Fig S2B and 2C: changes in transcript levels do not mean changes in protein or myotube differentiation, and therefore the authors must test myotube formation and myosin expression.

      Cers1 inhibition seems to affect differentiation and myoblast fusion. To test the other suggested effects, we performed more experiments as delineated. Inhibiting Cers1 systemically with the pharmacological inhibitor of Cers1 (P053) or with intramuscular delivery of AAV expressing a short hairpin RNA (shRNA) against Cers1 in mice did not affect Pax7 transcript levels (Author response image 1E). Moreover, we also did not observe an effect of shRNA targeting Cers1 on Pax7 levels in mouse C2C12 muscle progenitor cells (Author response image 1F). To characterize the effect of Cers1 inhibition on muscle progenitor proliferation/renewal, we used scramble shRNA, or shRNA targeting Cers1 in C2C12 muscle progenitors and measured proliferation using CellTiter-Glo (Promega). Results showed that Cers1KD had no significant effect on cell proliferation (Author response image 1G). Next, we assayed cell death in differentiating C2C12 myotubes deficient in Cers1 using FACS analysis of Annexin V (left) and propidium iodide (right). We found no difference in early apoptosis, late apoptosis, necrosis, or muscle cell viability, suggesting that cell death can be ruled out as an explanation for the smaller myotubes (Author response image 1H). These findings support the notion that the inhibitory effect of Cers1 knockdown on muscle maturation is primarily based on effects on myogenesis rather than on apoptosis. Our data in the manuscript also suggest that Cers1 inhibition affects myoblast fusion, as shown by reduced myonucleation upon Cers1KD (Fig S3H right, Fig S5I).

      (b) The phenotype of CERS1 knockdown is milder than that of P053-treated mice (Fig S5D and Figure 3F, 3H are not significant) despite similar changes of Cer18:0, Cer24:0, Cer24:1 concentration in muscles. Why?

      Increases in very long chain ceramides were in fact larger upon P053 administration compared to AAV-mediated knockdown. For example, Cer24:0 levels increased by >50% upon P053 administration, compared to 20% by AAV injections. Moreover, dhC24:1 increased by 6.5-fold vs 2.5-fold upon P053 vs AAV treatment, respectively. These differences might not only explain the slightly attenuated phenotypes in the AAV-treated mice but also underline the notion that very long chain ceramides might cause muscle deterioration. We believe inhibiting the enzymatic activity of Cers1 (P053) as compared to degrading Cers1 transcripts is a more efficient strategy to reduce ceramide levels. However, we cannot completely rule out multi-organ, systemic effects of P053 treatment beyond its direct effect on muscle. We added these details in the discussion of the revised manuscript (see page 8 of the revised manuscript).

      (c) The authors talk about a possible compensation of CERS2 isoform but they never showed mRNA expression levels or CERS2 protein levels after treatment. Is CERS2 higher expressed when CERS1 is downregulated in skeletal muscle?

      We appreciate the suggestion of the reviewer. We found no change in Cers2 mRNA levels upon Cers1 inhibition in mouse C2C12 myoblasts (Author response image 1I). We would like to point out that mRNA abundance might not be the optimal readout for enzymes, because transcript levels do not necessarily reflect enzymatic activity. Therefore, we think metabolite levels are a better proxy of enzymatic activity. It should also be pointed out that “compensation” might not be an accurate description, as sphingoid base substrate might simply be more available upon Cers1KD and hence more substrate might be present for Cers2 to synthesize very long chain ceramides. This “re-routing” has been previously described in the literature and hypothesized to serve to avoid toxic (dh)sphingosine accumulation (PMID: 30131496). Therefore, we changed the wording in the revised manuscript to be more precise.

      (d) Force measurement of AAV CERS1 downregulated muscles could be a plus for the study (assay function of contractility)

      In the current study we measured grip strength in mice, which had previously been shown to be a good proxy of muscle strength and general health (PMID: 31631989). Indeed, our results of reduced muscle grip strength are in line with previous work that shows reduced contractility in muscles of Cers1 deficient mice (PMID: 31692231).

      (e) How are degradation pathways affected by the downregulation of CERS1. Is autophagy/mitophagy affected? How is mTOR and protein synthesis affected? There is a recent paper that showed that CerS1 silencing leads to a reduction in C18:0-Cer content, with a subsequent increase in the activity of the insulin pathway, and an improvement in skeletal muscle glucose uptake. Could be possible that CERS1 downregulation increases mTOR signalling and decreases autophagy pathway? Autophagic flux using colchicine in vivo would be useful to answer this hypothesis

      Cers1 in skeletal muscle has indeed been linked to metabolic homeostasis (see PMID: 30605666). In line with their finding in young mice, we also find reduced fat mass upon P053 treatment in aged mice (Author response image 1A-B). We also looked into mitochondrial bioenergetics upon blocking Cers1 with P053 treatment using an O2k oxygraph (Author response image 1J-L). Results show that Cers1 inhibition in mouse muscle cells increases mitochondrial respiration, similar to what has been shown before (PMID: 30131496). However, we also found that reactive oxygen species production in mouse muscle cells is increased upon P053 treatment, suggesting the presence of dysfunctional mitochondria upon inhibiting Cers1 with P053. We next looked into the mitophagy/autophagy degradation pathways suggested by the reviewer and did not find convincing evidence that Cers1 has a major impact on autophagy or mitophagy derived gene sets in mice treated with shRNA against Cers1, or the Cers1 pharmacological inhibitor P053 (Author response image 1M).

      We then assessed the effect of Cers1 inhibition on transcript levels related to mTORC1/protein synthesis, as suggested by the reviewer. Cers1 knockdown in differentiating mouse muscle cells showed only a weak trend to reduce mTORC1 and its downstream targets (new Fig S4A). In line with this, there was no notable difference in protein synthesis in differentiating, Cers1 deficient mouse C2C12 myoblasts as assessed by L-homopropargylglycine (HPG) amino acid labeling using confocal microscopy (new Fig S4B) or FACS analyses (new Fig S4C). However, Cers1KD increased transcripts related to the myostatin-Foxo1 axis as well as the ubiquitin proteasome system (e.g. atrogin-1, MuRF1) (new Fig S4D), suggesting Cers1 inhibition increases protein degradation. We added these details to the revised manuscript on page 7. We recently implicated the ceramide pathway in regulating muscle protein homeostasis (PMID: 37196064). Therefore, we assessed the effect of Cers1 inhibition with the P053 pharmacological inhibitor on protein folding in muscle cells using the Proteostat dye that intercalates into the cross-beta spine of quaternary protein structures typically found in misfolded and aggregated proteins. Interestingly, inhibiting Cers1 further increased misfolded proteins in C2C12 mouse myoblasts expressing the Swedish mutation in APP and human myoblasts isolated from patients with inclusion body myositis (Author response image 1N). These findings suggest that deficient Cers1 might upregulate protein degradation to compensate for the accumulation of misfolded and aggregating proteins, which might contribute to impaired muscle function observed upon Cers1 knockdown. Further studies are needed to disentangle the underlying mechanisms.

      (f) The balances of ceramides have been found to play roles in mitophagy and fission with an impact on cell fate and metabolism. Did the authors check how are mitochondria morphology, mitophagy or how dynamics of mitochondria are altered in CERS1 knockdown muscles? (fission and fusion). There is growing evidence relating mitochondrial dysfunction to the contribution of the development of fibrosis and inflammation.

      Previously, CERS1 has been studied in the context of metabolism and mitochondria (for reference, please see PMID: 26739815, PMID: 29415895, PMID: 30605666, PMID: 30131496). In summary, these studies demonstrate that C18 ceramide levels are inversely related to insulin sensitivity in muscle and mitochondria, and that Cers1 inhibition improves insulin-stimulated suppression of hepatic glucose production and reduced high-fat diet induced adiposity. Moreover, improved mitochondrial respiration, citrate synthase activity and increased energy expenditure were reported upon Cers1 inhibition. Lack of Cers1 specifically in skeletal muscle was also reported to improve systemic glucose homeostasis. While these studies agree on the effect of Cers1 inhibition on fat loss, results on glucose homeostasis and insulin sensitivity differ depending on whether a pharmacologic or a genetic approach was used to inhibit Cers1. The current manuscript describes the effect of CERS1 on muscle function and myogenesis because these were the most strongly correlated pathways with CERS1 in human skeletal muscle (Fig 1C) and impact of Cers1 on these pathways is poorly studied, particularly in the context of aging. Therefore, we would like to refer to the mentioned studies investigating the effect of CERS1 on mitochondria and metabolism.

      (2) C.elegans data:

      (a) The authors checked maternal RNAi protocol to knockdown lagr-1 and showed alteration of muscle morphology at day 5. They also give pharmacological exposure of P053 drug at L4 stage. Furthermore, the authors also used a transgenic ortholog lagr-1 to perform the experiments. All of them were consistent showing a reduced movement. It would be important to show rescue of the muscle phenotype by overexpressing CERS1 ortholog in knockdown transgenic animals.

      We used RNAi to knockdown the Cers1 orthologue, lagr-1, in C.elegans. Therefore, we do not have transgenic animals. Overexpressing lagr-1 in the RNAi treated animals would also not be possible as the RNA from the overexpression would just get degraded.

      (b) The authors showed data about distance of C.elegans. It would be interesting to specify if body bends, reversals and stillness are affected in RNAi and transgenic Knockdown worms.

      As suggested by the reviewer, we measured thrashing and stillness and found reduced thrashing (new Fig S5B) and a trend towards an increase in stillness (Author response image 1O) in P053 treated worms on day 5 of adulthood, which is the day we observed significant differences in muscle morphology and movement (Fig 4D-E, Fig S5A). These data are now included in the revised manuscript.

      (c) Is there an effect on lifespan extension by knocking down CERS1?

      We performed two independent lifespan experiments in C.elegans treated with the Cers1 inhibitor P053 and found reduced lifespan in both replicate experiments (for second replicate, see Author response image 1P). We added these data to the revised manuscript as new Fig 4H.

      How do the authors explain the beneficial effect of sptlc1 inhibition on healthy aging muscle? Discuss more during the article if there is no possible explanation at the moment.

      We believe that blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore is more beneficial for muscle aging. Our current work suggests that at least a significant part of Sptlc1-KD benefits might stem from blocking very long chain ceramides. While SPTLC1 and CERS2 revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects, which is also visible in Fig 1e and Fig 1f of PMID: 37118545. In the current study, we show that Cers1 inhibition indeed exacerbates aging defects in myogenesis and inflammation as opposed to the inhibition of Sptlc1 or Cers2. The fact that the effect of Cers1 on inhibiting muscle differentiation is dependent on the clearance of Cers2-derived C24-ceramides suggests that reducing very long chain ceramides might be crucial for healthy muscle aging. We added details to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports a novel measurement for the chemotactic response to potassium by Escherichia coli. The authors convincingly demonstrate that these bacteria exhibit an attractant response to potassium and connect this to changes in intracellular pH level. However, some experimental results are incomplete, with additional controls/alternate measurements required to support the conclusions. The work will be of interest to those studying bacterial signalling and response to environmental cues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper shows that E. coli exhibits a chemotactic response to potassium by measuring both the motor response (using a bead assay) and the intracellular signaling response (CheY phosphorylation level via FRET) to step changes in potassium concentration. They find that an increase in potassium concentration induces a considerable attractant response, with an amplitude larger than aspartate, and cells can quickly adapt (but possibly imperfectly). The authors propose that the mechanism for potassium response is through modifying intracellular pH; they find both that potassium modifies pH and that other pH modifiers induce similar attractant responses. It is also shown, using Tar- and Tsr-only mutants, that these two chemoreceptors respond to potassium differently. Tsr has a standard attractant response, while Tar has a biphasic response (repellent-like then attractant-like). Finally, the authors use computer simulations to study the swimming response of cells to a periodic potassium signal secreted from a biofilm and find a phase delay that depends on the period of oscillation.

      Strengths:

      The finding that E. coli can sense and adapt to potassium signals and the connection to intracellular pH is quite interesting and this work should stimulate future experimental and theoretical studies regarding the microscopic mechanisms governing this response. The evidence (from both the bead assay and FRET) that potassium induces an attractant response is convincing, as is the proposed mechanism involving modification of intracellular pH.

      Weaknesses:

      The authors show that changes in pH impact fluorescent protein brightness and modify the FRET signal; this measurement explains the apparent imprecise adaptation they measured. However, this effect reduces confidence in the quantitative accuracy of the FRET measurements. For example, part of the potassium response curve (Fig. 4B) can be attributed to chemotactic response and part comes from the pH modifying the FRET signal. Measuring the full potassium response curve of the no-receptor mutants as a control would help quantify the true magnitude of the chemotactic response and the adaptation precision to potassium.

      Response: We thank the reviewer for the suggestion. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.
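
      The channel-wise correction described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the reference values, and the log-linear interpolation between reference concentrations are all assumptions introduced here. Only the division of each channel by the post/pre-KCl ratio measured in the no-receptor strain comes from the response.

```python
import math
from bisect import bisect_left


def interp_ratio(kcl_mM, ref_kcl_mM, ref_ratio):
    """Post/pre-KCl intensity ratio of the no-receptor strain at kcl_mM.

    Assumption: linear interpolation in log10(concentration), clamped to the
    measured range. ref_kcl_mM must be sorted in increasing order.
    """
    x = math.log10(kcl_mM)
    xs = [math.log10(c) for c in ref_kcl_mM]
    if x <= xs[0]:
        return ref_ratio[0]
    if x >= xs[-1]:
        return ref_ratio[-1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ref_ratio[i - 1] + t * (ref_ratio[i] - ref_ratio[i - 1])


def correct_channels(cfp_post, yfp_post, kcl_mM, ref_kcl_mM, cfp_ref, yfp_ref):
    """Divide each post-KCl channel signal by its pH-induced intensity ratio."""
    cfp_corr = cfp_post / interp_ratio(kcl_mM, ref_kcl_mM, cfp_ref)
    yfp_corr = yfp_post / interp_ratio(kcl_mM, ref_kcl_mM, yfp_ref)
    return cfp_corr, yfp_corr


# Hypothetical reference data (not from the paper): ratios at 1, 10, 30 mM KCl.
REF_KCL = [1.0, 10.0, 30.0]
CFP_REF = [1.00, 0.95, 0.90]
YFP_REF = [1.00, 0.90, 0.80]

cfp, yfp = correct_channels(90.0, 80.0, 30.0, REF_KCL, CFP_REF, YFP_REF)
print(cfp, yfp, yfp / cfp)
```

      In this toy case the raw drop in both channels is entirely explained by the reference ratios, so the corrected YFP/CFP ratio returns to its pre-stimulus value. Any residual change after correction would then be attributed to the chemotactic response.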

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.

      The measured response may also be impacted by adaptation. For other strong attractant stimuli, the response typically shows a low plateau before it recovers (adapts). However, in the case of potassium, the FRET signal does not have an obvious plateau following the stimulus. Do the authors have an explanation for that? One possibility is that the cells may have already partially adapted when the response reaches its minimum, which could indicate a different response and/or adaptation dynamics from that of a regular chemo-attractant. In any case, directly measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels would shed more light on the problem.

      Response: We appreciate the reviewer’s insightful questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).
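
      The "less than 10%" estimate can be checked with a short calculation. The sketch below assumes, purely for illustration, that adaptation follows a single-exponential recovery with the stated 80 s half-time; the response does not specify the functional form, and the function name is hypothetical.

```python
def fraction_adapted(elapsed_s, half_time_s):
    """Fraction of the response already recovered after elapsed_s seconds.

    Assumption: single-exponential recovery with the given half-time.
    """
    return 1.0 - 0.5 ** (elapsed_s / half_time_s)


# Worst case: the full 10 s medium exchange elapses before the readout.
loss = fraction_adapted(10.0, 80.0)
print(f"{loss:.3f}")
```

      Under this assumption roughly 8% of the response would have recovered by the end of a 10 s medium exchange, which is consistent with the bound given above.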

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (Sourjik & Berg, PNAS 99:123, 2002), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1 below, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.

      The relevant text was added at line 413-424.

      Author response image 1.

      The response of the cheRcheB mutant (HCB1382-pVS88) to different concentrations of KCl. The blue solid line denotes the original signal, while the red dots represent the pH-corrected signal. The vertical purple (green) dashed lines indicate the moment of adding (removing) 0.01 mM, 0.1 mM, 0.3 mM, 1 mM, 3 mM, 10 mM and 30 mM KCl, in chronological order.

      There seems to be an inconsistency between the FRET and bead assay measurements, the CW bias shows over-adaptation, while the FRET measurement does not.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We have now clarified this at line 315.

      The small hill coefficient of the potassium response curve and the biphasic response of the Tar-only strain, while both very interesting, require further explanation since these are quite different than responses to more conventional chemoattractants.

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5) and the biphasic response of the Tar-only strain (Fig. 5C). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the different responses of Tar and Tsr receptors to potassium.
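
      As a purely illustrative sketch of why a mixture of receptor populations can lower the apparent Hill coefficient, consider two populations that each respond with a Hill coefficient of 1 but with different half-maximal concentrations. This toy model and all its names and parameter values are assumptions introduced here. It is not the authors' model, and it ignores the repellent-like component of the Tar response.

```python
import math


def hill(c, k_half, n=1.0):
    """Standard Hill function: fractional response at concentration c."""
    return c**n / (c**n + k_half**n)


def mixed_response(c, k1, k2, weight=0.5):
    """Weighted sum of two non-cooperative (n = 1) populations."""
    return weight * hill(c, k1) + (1.0 - weight) * hill(c, k2)


def effective_hill(response, c, eps=1e-4):
    """Local Hill coefficient: slope of logit(response) against ln(c).

    Estimated with a central finite difference in ln(c).
    """
    def logit(x):
        f = response(x)
        return math.log(f / (1.0 - f))

    return (logit(c * math.exp(eps)) - logit(c * math.exp(-eps))) / (2.0 * eps)


# Identical populations: the apparent Hill coefficient stays at 1.
single = effective_hill(lambda c: mixed_response(c, 1.0, 1.0), 1.0)
# Hypothetical populations with a 100-fold difference in K0.5, evaluated
# at the geometric mean of the two K0.5 values (the half-response point).
mixed = effective_hill(lambda c: mixed_response(c, 1.0, 100.0), 10.0)
print(single, mixed)
```

      With identical populations the estimate stays at 1, whereas the heterogeneous mixture gives a value well below 1 at its midpoint. The 100-fold separation is deliberately exaggerated; a smaller difference in sensitivity would produce a correspondingly smaller reduction.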

      The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al investigated the biophysical mechanism of potassium-mediated chemotactic behavior in E coli. Previously, it was reported by Humphries et al that the potassium waves from oscillating B subtilis biofilm attract P aeruginosa through chemotactic behavior of motile P aeruginosa cells. It was proposed that K+ waves alter PMF of P aeruginosa. However, the mechanism of this behaviour remained elusive. In this study, Zhang et al demonstrated that motile E coli cells accumulate in regions of high potassium levels. They found that this behavior likely results from the chemotaxis signalling pathway, mediated by an elevation of intracellular pH. Overall, a solid body of evidence is provided to support the claims. However, the impacts of pH on the fluorescent proteins need to be better evaluated. In its current form, the evidence is insufficient to say that the fluorescence intensity ratio results from FRET. It may well be an artefact of pH change. Nevertheless, this is an important piece of work. The text is well written, with a good balance of background information to help the reader follow the questions investigated in this research work.

      In my view, the effect of pH on the FRET between CheY-eYFP and CheZ-eCFP is not fully examined. The authors demonstrated in Fig. S3 that CFP intensity itself changes by KCl, likely due to pH. They showed that CFP itself is affected by pH. This result raises a question of whether the FRET data in Fig3-5 could result from the intensity changes of FPs, but not FRET. The measured dynamics may have nothing to do with the interaction between CheY and CheZ. It should be noted that CFP and YFP have different sensitivities to pH. So, the measurement is likely confounded by the change in intracellular pH. Without further experiments to evaluate the effect of pH on CFP and YFP, the data using this FRET pair is inconclusive.

      Response: We thank the reviewer for pointing this out. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
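
      The correction described in the response above (divide each channel's post-KCl signal by the no-receptor post/pre ratio at the same KCl concentration) can be sketched as follows. This is a minimal illustration only: the function name, the numerical values, and the use of a YFP/CFP ratio as the FRET readout are assumptions for the example and are not taken from the manuscript.

```python
import numpy as np

def ph_correct(signal_post, no_receptor_ratio):
    """Divide the post-KCl signal by the no-receptor post/pre ratio.

    signal_post       : channel counts recorded after KCl addition
    no_receptor_ratio : post/pre intensity ratio of the same channel measured
                        in the no-receptor strain at the same KCl concentration
    """
    return np.asarray(signal_post, dtype=float) / no_receptor_ratio

# Hypothetical numbers, for illustration only (not data from the paper).
cfp_post = np.array([950.0, 940.0, 945.0])     # CFP counts after KCl addition
yfp_post = np.array([1210.0, 1200.0, 1205.0])  # YFP counts after KCl addition
cfp_ratio = 0.95  # assumed no-receptor CFP post/pre ratio
yfp_ratio = 1.00  # assumed no-receptor YFP post/pre ratio

cfp_corr = ph_correct(cfp_post, cfp_ratio)
yfp_corr = ph_correct(yfp_post, yfp_ratio)

# Assumed readout: ratio of the corrected yellow and cyan channels.
fret_ratio = yfp_corr / cfp_corr
```

      Because each channel is corrected independently, a pH-induced brightness change that differs between CFP and YFP no longer propagates into the ratio.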

      The data in Figure 1 is convincing. It would be helpful to include example videos. There is also ambiguity in the method section for this experiment. It states 100mM KCl was flown to the source channel. However, it is not clear if 100 mM KCl was prepared in water or in the potassium-depleted motility buffer. If KCl was prepared with water, there would be a gradient of other chemicals in the buffer, which confound the data.

      Response: We apologize for the ambiguity. The KCl solution used in this work was prepared in the potassium-depleted motility buffer. We have now clarified this at both lines 116 and 497. We have now provided an example video, Movie S1, with the relevant text added at line 123.

      The authors show FRET data with both KCl and K2SO4, and conclude that the chemotactic response mainly results from potassium ions. However, this was only measured by FRET. It would be more convincing if the motility assay in Fig1 were also performed with K2SO4.

      Response: We thank the reviewer for the suggestion. The aim of comparing the responses to KCl and K2SO4 was to determine the role of chloride ions in the response and to prove that the chemotactic response of E. coli to KCl comes primarily from its response to potassium ions. It is more sensitive to compare the responses to KCl and K2SO4 by using the FRET assay. In contrast, the microfluidic motility assay is less sensitive in revealing the difference in the chemotactic responses, making it difficult to determine the potential role of chloride ions.

      Methods:

      • Please clarify the promoters used for the constitutive expression of FliCsticky and LacI.

      Response: The promoters used for the constitutive expression of LacIq and FliCsticky were the Iq promoter and the native promoter of fliC, respectively (ref. 57).

      Now these have been clarified at line 471.

      • Fluorescence filters and imaging conditions (exposure time, light intensity) are missing.

      Response: Thank you for the suggestion. We have now added more descriptions at lines 535-546: The FRET setup was based on a Nikon Ti-E microscope equipped with a 40× 0.60 NA objective. The illumination light was provided by a 130-W mercury lamp, attenuated by a factor of 1024 with neutral density filters, and passed through an excitation bandpass filter (FF02-438/24-25, Semrock) and a dichroic mirror (FF458-Di02-25x36, Semrock). The epifluorescent emission was split into cyan and yellow channels by a second dichroic mirror (FF509-FDi01-25x36, Semrock). The signals in the two channels were then filtered by two emission bandpass filters (FF01-483/32-25 and FF01-542/32-25, Semrock) and collected by two photon-counting photomultipliers (H7421-40, Hamamatsu, Hamamatsu City, Japan), respectively. Signals from the two photomultipliers were recorded at a sampling rate of 1 Hz using a data-acquisition card installed in a computer (USB-1901(G)-1020, ADlink, New Taipei, Taiwan).

      • Please clarify if the temperature was controlled in motility assays.

      Response: All measurements in our work were performed at 23 ℃. It was clarified at line 496.

      • L513. It is not clear how theta was selected. Was theta set to be between 0 and pi? If not, P(theta) can be negative?

      Response: The θ was set to be between 0 and π. This has now been added at line 581.

      • Typo in L442 (and) and L519 (Koff)

      Response: Thank you. Corrected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) From the motor measurements the authors find that the CW bias over-adapts to a level larger than prestimulus, but this is not seen in the FRET measurements. What causes this inconsistency? Fig. 2D seems to rule out any change in CheY binding to the motor.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We have now clarified this at line 315.

      (2) It would be useful to compare the response amplitude for potassium (Fig. 3C) to a large concentration of both MeAsp and serine. This is a fairer comparison since your work shows potassium acts on both Tar and Tsr. Alternatively, testing a much larger concentration (~10^6 micromolar) at which MeAsp also binds to Tsr would also be useful.

      Response: We thank the reviewer for pointing this out. We have now recalculated the response to potassium by correcting the pH-induced effects on fluorescence intensity of CFP and YFP. The response to 30 mM KCl was 1.06 ± 0.10 times as large as that to 100 μM MeAsp. The aim of the comparison between the responses to potassium and MeAsp was to provide an idea of the magnitude of the chemotactic response to potassium. The stimulus of 100 μM MeAsp is already a saturating amount of attractant and induces zero-kinase activity, thus using a higher stimulus (adding serine or a larger concentration of MeAsp) is probably not needed. Moreover, a larger concentration (~10^6 micromolar) of MeAsp would also induce an osmotactic response.

      (3) The fitted Hill coefficient (~0.5) to the FRET response curve is quite small and the authors suggest this indicates negative cooperativity. Do they have a proposed mechanism for negative cooperativity? Have similar coefficients been measured for other responses?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the differing responses of Tar and Tsr receptors to potassium.
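
      To make the role of the Hill coefficient concrete, the sketch below evaluates a generic Hill function and the concentration span needed to go from 10% to 90% of the maximal response, which equals 81^(1/n) for Hill coefficient n. The functional form is the standard Hill equation; the K0.5 value is a hypothetical placeholder, and nothing here reproduces the authors' fitting procedure.

```python
import math

def hill(c, k_half, n):
    """Fractional response at concentration c for a Hill curve."""
    return c**n / (k_half**n + c**n)

def span_10_to_90(n):
    """Ratio c90/c10 of concentrations giving 90% and 10% response.

    Solving hill(c) = r gives c = k_half * (r / (1 - r))**(1/n),
    so c90/c10 = (9 / (1/9))**(1/n) = 81**(1/n), independent of k_half.
    """
    return 81.0 ** (1.0 / n)

k_half = 1.0  # hypothetical K0.5 in arbitrary units (assumption)

# Compare the two Hill coefficients quoted in the response.
span_potassium = span_10_to_90(0.88)
span_measp = span_10_to_90(1.2)
```

      A smaller n widens the concentration range over which the response changes, which is why a low K0.5 (sensitivity in the sense of a low threshold) and a shallow slope can coexist.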

      (3a) The authors state a few times that the response to potassium is "very sensitive", but the low Hill coefficient indicates that the response is not very sensitive (at least compared to aspartate and serine responses).

      Response: We apologize for the confusion. We described the response to potassium as “very sensitive” due to the small value of K0.5. This has now been clarified at line 236.

      (3b) Since the measurements are performed in wild-type cells the response amplitude following the addition of potassium may be biased if the cell has already partially adapted. This seems to be the case since the FRET time series does not plateau after the addition of the stimulus. The accuracy of the response curve and hill coefficient would be more convincing if the experiment was repeated with a cheR cheB deficient mutant.

      Response: We thank the reviewer for raising these questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).
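
      The "less than 10%" estimate above can be checked with a rough calculation. The sketch assumes, purely for illustration, that adaptation recovers as a single exponential with the quoted half-time of 80 s and that the full 10 s medium-exchange time elapses before the response is read; neither assumption comes from the manuscript.

```python
HALF_TIME_S = 80.0   # adaptation half-time quoted in the response
EXCHANGE_S = 10.0    # upper bound on medium-exchange time quoted in the response

def fraction_adapted(t, half_time):
    """Fraction of the response lost to adaptation after time t,
    assuming single-exponential recovery (an assumption of this sketch)."""
    return 1.0 - 2.0 ** (-t / half_time)

loss = fraction_adapted(EXCHANGE_S, HALF_TIME_S)  # roughly 0.083
```

      Under these assumptions about 8% of the response would be lost during medium exchange, in line with the stated bound.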

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (ref. 46), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.

      The relevant text was added at lines 413-424.

      (4) The authors show that the measured imprecise adaptation can be (at least partially) attributed to pH impacting the FRET signal by changing eCFP and eYFP brightness.

      (4a) Comparing Fig. 5C and D, the chemosensing and pH response time scales look similar. Therefore, does the pH effect bias the measured response amplitude (just as it biases the adapted FRET level)?

      Response: We agree with the reviewer that the pH effect on CFP and YFP biases the measured response amplitude. We have now performed the measurement of dose-response curve to potassium for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. The pH effects on CFP and YFP were corrected. The dose-response curve and adaptation curve were recalculated and plotted in Fig. S5.

      (4b) It would help to measure a full response curve (at many concentrations) for the no-receptor strain as a control. This would help distinguish, as a function of concentration, how much response can be attributed to pH impacting the FRET signal versus the true chemotactic response.

      Response: We thank the reviewer for the suggestion. We have now performed the measurements for the no-receptor strain. The impact of pH on CFP and YFP has been corrected. The pH-corrected results, previously in Fig.3-5, are now presented in Fig. 3, Fig. S5 and Fig. 5, respectively.

      (5) The biphasic response of Tar is strange and warrants further discussion. Do the authors have any proposed mechanisms that lead to this behavior? For the 10mM and 30mM KCl measurements there is a repellent response followed by an attractant response for both adding and removing the stimuli, why is this?

      Response: We thank the reviewer for pointing this out. The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      (5a) The fact that Tar and Tsr are both attractant (after the initial repellant response in Tar) appears to be inconsistent with previous work on pH response (Ref 52, Yang and Sourjik Molecular Microbiology (2012) 86(6), 1482-1489). This study also didn't see any biphasic response.

      Response: We thank the reviewer for pointing this out. The Tar-only strain shows a repellent response to stepwise addition of low concentrations of potassium, specifically less than 10 mM. This is consistent with previous observations of the response of Tar to changes in intracellular pH (refs. 44,45) and also with the work of Yang and Sourjik (new ref. 53), although the work in ref. 53 dealt with the response to external pH change, and bacteria were known to maintain a relatively stable intracellular pH when external pH changes (Chen & Berg, Biophysical Journal (2000) 78:2280-2284). Interestingly, the Tar-only strain exhibits a biphasic response to high potassium concentrations of 10 mM and above. This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA (ref. 56), which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      (5b) The response of Tar to the removal of sodium benzoate (Fig. S2) seems to be triphasic, is there any explanation for this?

      Response: We thank the reviewer for pointing this out. We have now acknowledged in the legend of Fig. S2 that this response is interesting and warrants further exploration: “The response to the removal of sodium benzoate seems to be a superposition of an attractant and a repellent response, the reason for which deserves to be further explored.”

      (6) Fitting the MWC model leads to N=0.35<1. It is fine to use this as a phenomenological parameter, but can the authors comment on what might be causing such a small effective cluster size for potassium response?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to the response to MeAsp (1.2) (ref. 46). We have now refit the MWC model to the pH-corrected dose-response curve, obtaining N of 0.85. We think the small N is due partly to the fact that we are fitting the curve with four parameters (N, Kon, Koff, and fm), while only three features of the sigmoid dose-response curve are relevant (the vertical scale, the midpoint concentration, and the slope of the sigmoid). Future experiments may determine these parameters more accurately, but they should not significantly affect the simulation results as long as the wild-type dose-response curve is accurate.

      (7) The results of the modeling are closely related to Zhu et. al. Phys. Rev. Lett. 108, 128101. Is the lag time for large T related to the adaptation time?

      Response: We thank the reviewer for pointing this out. We used a modeling framework similar to that of Zhu et al. The potassium response was also analogous to the chemotactic response to MeAsp. Thus, the results are closely related to those of Zhu et al. We have now cited Zhu et al. (Ref. 52) and noted this at line 366.

      The lag time for large T is related to the adaptation time. We have now simulated the chemotaxis to potassium for large T with different adaptation times by varying the methylation rate kR. The results are shown in Fig. S8. The simulated lag time decreases with the methylation rate kR, but levels off at high values of kR. This has now been added at line 603.

      Minor issues:

      • Fig. 1C: should the axis label be y?

      Response: Yes, thank you. Now corrected.

      • Line 519: Koff given twice, the second should be Kon.

      Response: Thank you. Corrected.

      • When fitting the MWC model (Eq. 3 and Fig. 6B) did you fix a particular value for m?

      Response: m was treated as a fitting parameter, grouped in the parameter fm.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      • I suggest explaining the acronyms when they first appear in the text (eg CMC, CW, CCW).

      Response: Thank you. Now they have been added.

      • L144. L242. "decrease" is ambiguous since membrane potential is negative. I understand the authors meant less negative (which is an increase). I suggest to avoid this expression.

      Response: Thank you for the suggestion. Now they have been replaced by “The absolute value of the transmembrane electrical potential will decrease”.

      • For Fig 1b - it says the shaded area is SEM in the text, but SD in the legend. Please clarify.

      Response: Thank you. The annotation in the legend has now been revised as SEM.

      • Fig 1C label of x axis should be "y" instead of "x" to be consistent with Fig 1A.

      Response: Thank you. It has now been revised.

      • In Figure 2, the number of independent experiments as well as the number of samples should be included.

      Response: Thank you. The response in Fig. 2C is the average of 83 motors from 5 samples for wild-type strain (JY26-pKAF131). The response in Fig. 2D is the average of 22 motors from 4 samples for the chemotaxis-defective strain (HCB901-pBES38). They have now been added to the legend.

      • Regarding the attractant or repelling action of potassium and sucrose, it would be important to have a movie showing the cells' behaviours.

      Response: We thank the reviewer for the suggestion. We have now provided Movie S1 to show the cells’ behavior in response to potassium. As shown in Fig. 3B, the chemotactic response to 60 mM sucrose is very small compared to the response to 30 mM KCl. This implies that a noticeable response to sucrose requires higher stimulus concentrations. However, Rosko et al. [Rosko, J., Martinez, V. A., Poon, W. C. K. & Pilizota, T. Proc. Natl Acad. Sci. USA 114, E7969-E7976 (2017).] have shown that high concentrations of sucrose lead to a significant reduction in the speed of the flagellar motor. Thus, in a motility assay for sucrose, the osmolarity-induced motility effect may overwhelm the minor repellent-like response.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The weaknesses are the brevity of the simulations, the concomitant lack of scope of the simulations, the lack of depth in the analysis, and the incomplete relation to other relevant work.

      A 1 µs simulation of CCh (Video 1, part 2) shows that m3 (ACHA) is stable throughout. The ΔG comparisons, in silico versus in vitro, indicate that 200 ns simulations are sufficient to identify LA versus HA conformational populations. Figure 6-table supplement 1 shows distances. New citations have been added.

      Reviewer #2 (Public Review):

      Weaknesses:

      After carrying out all-atom molecular dynamics, the authors revert to a model of binding using continuum Poisson-Boltzmann, surface area, and vibrational entropy. The motivations for and limitations associated with this approximate model for the thermodynamics of binding, rather than using modern atomistic MD free energy methods (that would fully incorporate configurational sampling of the protein, ligand, and solvent) could be provided. Despite this, the authors report a correlation between their free energy estimates and those inferred from the experiment. This did, however, reveal shortcomings for two of the agonists. The authors mention their trouble getting correlation to experiment for Ebt and Ebx and refer to up to 130% errors in free energy. But this is far worse than a simple proportional error, because -24 Vs -10 kcal/mol is a massive overestimation of free energy, as would be evident if the authors were to instead express results in terms of KD values (which would have an error exceeding a billion fold). The MD analysis could be improved with better measures of convergence, as well as a more careful discussion of free energy maps as a function of identified principal components, as described below. Overall, however, the study has provided useful observations and interpretations of agonist binding that will help understand pentameric ligand-gated ion channel activation.

      The objective of the calculations was to identify structural populations, not to estimate binding free energies. We knew the actual LA and HA energies (for all 4 agonists) from real-world electrophysiology experiments. We conclude that the simple PBSA method worked as a tool for identification because the calculated efficiencies match those from experiments (Figure 4B, Figure 4-Source Data 1). We discuss the mismatches in absolute ΔG in the Results and Discussion. Methods for estimating experimental binding free energies are described in a cited, eLife companion paper. The ΔG ratio relates to agonist efficiency.
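
      The reviewer's point that a -24 versus -10 kcal/mol discrepancy corresponds to a more than billion-fold error in KD can be checked with the standard relation ΔG = RT ln KD. The sketch below is an illustration only; the temperature of 298.15 K is an assumption, and the two free energies are the round numbers quoted by the reviewer.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15      # assumed temperature (not stated in the text)

def kd_fold_error(dg_calc, dg_exp, temperature=T_KELVIN):
    """Fold difference in KD implied by two binding free energies (kcal/mol).

    From dG = RT ln(KD): KD_exp / KD_calc = exp((dG_exp - dG_calc) / RT).
    """
    rt = R_KCAL * temperature
    return math.exp((dg_exp - dg_calc) / rt)

# Round values quoted in the review: calculated -24, experimental -10 kcal/mol.
fold = kd_fold_error(-24.0, -10.0)  # on the order of 1e10
```

      This is why a fractional error in ΔG understates the discrepancy: the error in KD grows exponentially with the absolute difference in free energy.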

      Main points:

      Regarding the choice of model, some further justification of the reduced 2 subunit ECD-only model could be given. On page 5 the authors argue that, because binding free energies are independent of energy changes outside the binding pocket, they could remove the TMD and study only an ECD subunit dimer. While the assumption of distant interactions being small seems somewhat reasonable, provided conformational changes are limited and localised, how do we know the packing of TMD onto the ECD does not alter the ability of the alpha-delta interface to rearrange during weak or strong binding? They further write that "fluctuations observed at the base of the ECD were anticipated because the TMD that offers stability here was absent.". As the TMD-ECD interface is the "gating interface" that is reshaped by agonist binding, surely the TMD-ECD interface structure must affect binding. It seems a little dangerous to completely separate the agonist binding and gating infrastructure, based on some assumption of independence. Given the model was only the alpha and delta subunits and not the pentamer with TMD, I am surprised such a model was stable without some heavy restraints. The authors state that "as a further control we carried out MD simulation of a pentamer docked with ACh and found similar structural changes at the binding pocket compared to the dimer." Is this sufficient proof of the accuracy of the simplified model? How similar was the model itself with and without agonist in terms of overall RMSD and RMSD for the subunit interface and the agonist binding site, as well as the free energy of binding to each model to compare?

      The statement that distant interactions are small is not an "assumption", but rather a conclusion based on data. Mutant cycle analysis of 83 pairs shows that (with a few exceptions) non-additivity of free energy change prevails only with separations <~15 Å (Fig. 3 in Gupta et al 2017). Regardless, the adequacy of dimers and convergence by 200 ns are supported by the match between calculated and experimental agonist efficiencies (Figure 4B) and by the 1 µs simulation (Video 1 part 2). An apo 200 ns simulation of the ECD dimer is now added (Figure 2-figure supplement 2), and the dimer interface seems to be adequate (stable).

      Although the authors repeatedly state that they have good convergence with their MD, I believe the analysis could be improved to convince us. On page 8 the authors write that the RMSD of the system converged in under 200 ns of MD. However, I note that the graph is of the entire ECD dimer, not a measure for the local binding site region. An additional RMSD of local binding site would be much more telling. You could have a structural isomerisation in the site and not even notice it in the existing graph. On page 9 the authors write that the RMSF in Figure S2 showed instability mainly in loops C and F around the pocket. Given this flexibility at the alpha-delta interface, this is why collecting those regions into one group for the calculation of RMSD convergence analysis would have been useful. They then state "the final MD configuration (with CCh) was well-aligned with the CCh-bound cryo-EM desensitized structure (7QL6)... further demonstrating that the simulation had converged." That may suggest a change occurred that is in common with the global minimum seen in cryo EM, which is good, but does not prove the MD has "converged". I would also rename Figure S3 accordingly.

      The description is now changed to “aligns well with desensitized structure (7QL6.PDB)”. The RMSD of not just the binding pocket but the whole ECD dimer aligns well, first with apo (m1) and then with the desensitized state (m3).

      The authors draw conclusions about the dominant states and pathways from their PCA component free energy projections that need clarification. It is important first to show data to demonstrate that the two PCA components chosen were dominant and accounted for most of the variance. Then when mapping free energy as a function of those two PCA components, to prove that those maps have sufficient convergence to be able to interpret them. Moreover, if the free energies themselves cannot be used to measure state stability (as seems to be the case), that the limitations are carefully explained. First, was PCA done on all MD trajectories combined to find a common PC1 & PC2, or were they done separately on each simulation? If so, how similar are they? The authors write "the first two principal components (PC-1 and PC-2) that capture the most pronounced Cα displacements". How much of the total variance did these two components capture? The authors write the changes mostly concern loop C and loop F, but which data proves this? e.g. A plot of PC1 and PC2 over residue number might help.

      The PCA analyses have been enriched. Figure 3-Source Data 1 shows the dominance of PC1 and PC2. Because the binding energy match was sufficient to identify affinity states, we did not explore additional PCs. Residue-wise PC1 and PC2 analysis and comparison with RMSF are in Figure 2-figure supplement 2. PC1 and PC2 both correlate with fluctuations in loops C and F. Overlap analysis in different runs is shown in Figure 3-figure supplement 1. Lower variance in a particular region of the PCA landscape indicates that the system frequently visits these states, suggesting stability (a preference for these conformations).
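
      The question of how much variance PC1 and PC2 capture can be answered with a short calculation on the coordinate matrix. The sketch below uses a synthetic array as a stand-in for aligned trajectory coordinates; the array shape, the random data, and the function name are all assumptions for illustration and do not reproduce the authors' analysis.

```python
import numpy as np

def explained_variance_ratio(coords):
    """Fraction of total variance carried by each principal component.

    coords : (n_frames, n_features) array, e.g. flattened, aligned
             C-alpha coordinates with one row per trajectory frame.
    """
    centered = coords - coords.mean(axis=0)
    # Singular values come back in descending order; their squares are
    # proportional to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

# Synthetic stand-in for a trajectory: 200 frames, 12 coordinates,
# with two directions given much larger fluctuations than the rest.
rng = np.random.default_rng(1)
scales = np.array([5.0, 3.0] + [0.5] * 10)
frames = rng.normal(size=(200, 12)) * scales

ratio = explained_variance_ratio(frames)
top_two = ratio[:2].sum()  # share of variance in PC1 + PC2
```

      Reporting `top_two` for each trajectory, alongside the overlap between components from different runs, addresses whether a common PC1/PC2 space is meaningful.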

      The authors map the -kTln rho as a free energy for each simulation as a function of PC1 & PC2. It is important to reveal how well that PC1-2 space was sampled, and how those maps converged over time. The shapes of the maps and the relative depths of the wells look very different for each agonist. If the maps were sampled well and converged, the free energies themselves would tell us the stabilities of each state. Instead, the authors do not even mention this and instead talk about "variance" being the indicator of stability, stating that m3 is most stable in all cases. While I can believe 200ns could not converge a PC1-2 map and that meaningful delta G values might not be obtained from them, the issue of lack of sampling must be dealt with. On page 12 they write "Although the bottom of the well for 3 energy minima from PCA represent the most stable overall conformation of the protein, they do not convey direct information regarding agonist stability or orientation". The reasons why not must be explained; as they should do just that if the two order parameters PC1 and PC2 captured the slowest degrees of freedom for binding and sampling was sufficient. The authors write that "For all agonists and trajectories, m3 had the least variance (was most stable), again supporting convergence by 200 ns." Again the issue of actual free energy values in the maps needs to be dealt with. The probabilities expressed as -kTln rho in kcal/mol might suggest that m2 is the most stable. Instead, the authors base stability only on variance (I guess breadth of the well?), where m3 may be more localised in the chosen PC space, despite apparently having less preference during the MD (not the lowest free energy in the maps).
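
      The -kT ln ρ map discussed above, and a crude check of how well the PC1-PC2 plane was sampled, can be sketched as follows. The projections here are random numbers standing in for real trajectory data; the bin count, the plotting range, the value of kT, and the function name are assumptions for illustration only.

```python
import numpy as np

KT_KCAL = 0.593  # kT in kcal/mol near 298 K (assumed temperature)

def free_energy_map(pc1, pc2, bins=30, extent=(-4.0, 4.0)):
    """Return G = -kT ln(rho) on a fixed PC1-PC2 grid, shifted so min(G) = 0.

    Unsampled bins have rho = 0 and are reported as +inf.
    """
    rho, _, _ = np.histogram2d(
        pc1, pc2, bins=bins, range=[extent, extent], density=True
    )
    with np.errstate(divide="ignore"):
        g = -KT_KCAL * np.log(rho)
    g -= g[np.isfinite(g)].min()
    return g

# Synthetic projections standing in for a trajectory (assumption).
rng = np.random.default_rng(0)
pc1 = rng.normal(size=5000)
pc2 = rng.normal(size=5000)

g_full = free_energy_map(pc1, pc2)
g_half = free_energy_map(pc1[:2500], pc2[:2500])

# Fraction of grid cells visited: a simple sampling diagnostic.
coverage_full = np.isfinite(g_full).mean()
coverage_half = np.isfinite(g_half).mean()
```

      Comparing maps built from the first half and from the full trajectory on the same grid, as done here, is one simple way to show whether well depths and positions have stopped changing.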

      The motivations and justifications for the use of approximate PBSA energetics instead of atomistic MD free energies should be dealt with in the manuscript, with limitations more clearly discussed. Rather than using modern all-atom MD free energy methods for relative or absolute binding free energies, the author selects clusters from their identified states and does Poisson-Boltzmann estimates (electrostatic, vdW, surface area, vibrational entropy). I do believe the following sentence does not begin to deal with the limitations of that method: "there are limitations with regard to MM-PBSA accurately predicting absolute binding free energies (Genheden & Ryde, 2015; Hou et al., 2011) that depends on the parameterization of the ligand (Oostenbrink et al., 2004)." What are the assumptions and limitations in taking continuum electrostatics (presumably with parameters for dielectric constants and their assignments to regions after discarding solvent), surface area (with its assumptions and limitations), and of course assuming vibration of a normal mode can capture entropy. On page 30, regarding their vibrational entropy estimate, they write that the "entropy term provides insights into the disorder within the system, as well as how this disorder changes during the binding process". It is important that the extent of disorder captured by the vibrational estimate be discussed, as it is not obvious that it has captured entropy involving multiple minima on the system's true 3N-dimensional energy surface, and especially the contribution from solvent disorder in bound Vs dissociated states.

      As discussed above, errors in the free energy estimates need to be more faithfully represented, as fractional errors are not meaningful. On page 21 the authors write "The match improved when free energy ratios rather than absolute values were compared." But a ratio of free energies is not a typical or expected measure of error in delta G. They also write "For ACh and CCh, there is good agreement between ΔGm1 and GLA and between ΔGm3 and GHA. For these agonists, in silico values overestimated experimental ones only by ~8% and ~25%. The agreement was not as good for the other 2 agonists, as calculated values overestimated experimental ones by ~45% (Ebt) and ~130% (Ebt). However, the fractional overestimation was approximately the same for GLA and GHA." See the above comment on how this may misrepresent the error. On page 21 they write, in relation to their large fractional errors, that they "do not know the origin of this factor but speculate that it could be caused by errors in ligand parameterization". However the estimates from the PBSA approach are, by design, only approximate. Both errors in parameterisation (and their likely origin) and the approximate model used, need discussion.
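
      To illustrate why a fractional error in delta G can mislead, here is a small sketch under stated assumptions: the relation delta G = -RT ln K, a temperature of 298.15 K, and hypothetical free energy values that are not taken from the manuscript. The same percentage overestimate gives different absolute errors in kcal/mol, and the implied fold error in the equilibrium constant grows exponentially with the absolute error.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol, assuming a temperature of 298.15 K


def absolute_error(dg_experiment, fractional_overestimate):
    """Absolute error (kcal/mol) implied by a fractional overestimate of delta G."""
    return dg_experiment * fractional_overestimate


def fold_error_in_k(ddg):
    """Fold error in the equilibrium constant implied by an error ddg in delta G = -RT ln K."""
    return math.exp(abs(ddg) / RT)


# Hypothetical experimental values (kcal/mol), both overestimated by 25%.
for dg in (-5.0, -10.0):
    ddg = absolute_error(dg, 0.25)
    print(f"dG = {dg:6.1f}  error = {ddg:5.2f} kcal/mol  fold error in K = {fold_error_in_k(ddg):.1f}")
```

      Doubling the absolute error squares the fold error in K, so two agonists with the same fractional overestimate can differ greatly in how wrong the implied affinities are.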

      Again, the goal of calculating binding free energy was to identify structural correspondence to LA and HA and not to obtain absolute binding free energy values. Along with the least variance (distribution) for the principal component for m3, it also had the highest binding free energy. An association of m1 to LA and m3 to HA was done after comparing them to experimental values (efficiencies). This comparison not only validates our approach but also underscores the utility of PBSA in supplementing MD and PCA analyses with broader energetics perspectives.

      Reviewer #3 (Public Review):

      Weaknesses:

      Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for two other ligands were significantly different than the experiment. It is unclear to what extent the choice of method for the energy calculations influenced the results. See above.

      A control simulation, such as for an apo site, is lacking. Figure 2-figure supplement 2. shows the results of 200 ns MD simulations of the apo structure (n=2).

      Reviewer #4 (Public Review):

      Weaknesses:

      Timescales (200 ns) do not capture global rearrangements of the extracellular domain, let alone gating transitions of the channel pore, though this work may provide a launching point for more extended simulations. A more general concern is the reproducibility of the simulations, and how representative states are defined. It is not clear whether replicates were included in principal component analysis or subsequent binding energy calculations, nor how simulation intervals were associated with specific states.

      We are interested eventually in using MD to study the full isomerization, but these investigations are for the future and likely will involve full-length pentamers and longer timescales. However, in response to this query we have in the Discussion raised this issue and offer speculations. See above; PCA has been compared between replicates (Figure 3-figure supplement 1).

      Structural analysis largely focuses on snapshots, with limited direct evidence of consistency across replicates or clusters. Figure legends and tables could be clarified.

      Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories. Incorporated in the legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study gives interesting insights into the possible dynamics of ligand binding in ACh receptors and establishes some prerequisites for necessary and urgent further work. The broad interest in this receptor class means this work will have some reach.

      Suggestions:

      (1) I found the citation of relevant literature to be rather limited. In the following paper, the agonist glutamate was shown to bind in two different orientations, and also to convert. These are much longer simulations than what is presented here (nearly 50 µs), which allowed a richer view of conformational changes and ligand binding dynamics in the AMPA Receptor. Albert Lau has published similar work on NMDA, delta, and kainate receptors, including some of it in eLife. Perhaps the authors could draw some helpful comparisons with this work.

      Yu A et al. (2018) Neurotransmitter Funneling Optimizes Glutamate Receptor Kinetics. Neuron

      Likewise, the comparison to a similar piece of work on glycine receptors (not cited, https://pubs.acs.org/doi/10.1021/bi500815f) could be instructive. Several similar computational techniques were used, and interactions observed (in the simulations) between the agonist and the receptor were tested in the context of wet experiments. In the absence of an equivalent process in this paper (no findings were tested using an orthogonal approach, only compared against known results, from perhaps a narrow spectrum of papers), we have to view the major findings of the paper (docking in cis that leads to a ligand somersault) with some hesitancy.

      The Gharpure 2019 paper is cited in the context of the delta subunit but this paper was about a3b4 neuronal nicotinic receptors. This could be tidied up. Also, the simulations from that paper could be used as an index of the stability of the HA state (if ligand orientation is being cited as transferrable, other observations could be too).

      New citations have been added. It is difficult to generalize from Yu A et al. and Yu R et al., because in neither study was the ligand orientation associated with LA versus HA binding energy.

      (2) "To start, we associated the agonist orientation in the hold end states as cis in AC-LA versus trans in AC-HA."

      I think this is a valid start, but one is left with the feeling that this is all we have and the validity of the starting state is not tested. What was really shown here? Is the docking reliable? What evidence can the authors summon for the ligand orientation that they use as a starting structure? In addition to docking energies, the match between PBSA and electrophysiology Gs and temporal sequence (m1-m2-m3) support the assignment.

      Given that these simulations cover a circumscribed part of the binding process, I think the limitations should be acknowledged. Indeed the authors do mention a number of remaining open questions.

      Paragraphs regarding 'catch' have been added to the Discussion.

      (3) Results around line 90. Hypothetical structures and states that were determined from Markov analyses are discussed as if they are well understood and identified. Plausible though these are, I think the text should underline at least the source of such information. In these simulations, a further intermediate has been identified.

      The model in Figure 1B was first published in 2012 and has been used and extended over the intervening years. In our lab, catch-and-hold is standard. We have published many papers (in top journals), plus reviews, regarding this scheme. We made presentations that are on Youtube. Here, at the end of the Introduction we now cite a new review article (Biophysical Journal, 2024). I am not sure what more we can do to raise awareness regarding catch and hold.

      (4) The figures are dense and could be better organised. Figure 2 is key but has a muddled organization. The placement of the panel label (C) makes it look like the top row (0 ns) is part of (A). Panel B- what is shown in the oval inset (not labeled or in legend). Why not show more than one view, perhaps a sequence of time points? It is confusing to change the colour of the loops in (C). Please show the individual values in D.

      Figure 2 has been redone.

      (5) A lot is made of the aK145 salt bridge with aD200 and the distances - but I didn't see any measurements, or time course. This part is vague to the point of having no meaning ("bridge tightening").

      We present a Table of distance measurements in the SI (Figure 6-table supplement 1).

      Reviewer #2 (Recommendations For The Authors):

      All main comments have been given in the above review. There are a few other minor comments below.

      The 4 agonists examined were acetylcholine (ACh), carbamylcholine (CCh), epibatidine (Ebt), and epiboxidine (Ebx). Could the choices be motivated for the reader?

      New in Methods: the agonists are about the same size yet represent different efficiency classes (citation to companion eLife paper). One of our (unmet) objectives was to understand the structural correlates of agonist efficiency.

      The authors write that state structures generated in the MD simulation were identified by aligning free energy values with those from experiments. It would be good to explain to the reader, in the introduction, how LA and HA free energies were extracted from experiments, rather than relying on them to read older papers.

      In the Introduction, we say that to get G, just measure an equilibrium constant and take the log. We think it is excessive to explain in detail in this paper how to measure the equilibrium binding constants (several methods suffice). However, we have added in Methods our basic approach: measure KLA and L2 by using electrophysiology, and compute KHA from the thermodynamic cycle using L0. We think this paper is best understood in the context of its companion, also in eLife.
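
      The approach described above can be sketched as follows. The exact form of the thermodynamic cycle is not given in this response; the sketch assumes the common case of two equivalent, independent binding sites, for which L2 / L0 = (KLA / KHA)^2, and assumes that free energy is computed as RT ln of a dissociation constant at 298.15 K. The function names and the numbers in the usage line are hypothetical.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol, assuming a temperature of 298.15 K


def k_ha_from_cycle(k_la, l2, l0, n_sites=2):
    """High-affinity dissociation constant from the assumed cycle L2 / L0 = (K_LA / K_HA) ** n_sites."""
    coupling = (l2 / l0) ** (1.0 / n_sites)  # K_LA / K_HA per site
    return k_la / coupling


def binding_free_energy(kd_molar):
    """Binding free energy (kcal/mol) from a dissociation constant in molar units."""
    return RT * math.log(kd_molar)


# Hypothetical inputs: K_LA = 100 uM, diliganded gating constant 25, unliganded gating constant 1.
k_ha = k_ha_from_cycle(1e-4, 25.0, 1.0)
print(binding_free_energy(1e-4), binding_free_energy(k_ha))
```

      Under these assumptions the difference between the HA and LA free energies depends only on the ratio L2 / L0, which is why measuring KLA and L2 and knowing L0 is enough to obtain KHA.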

      In all equilibrium equations of the type A to B (e.g. on page 5), rather than using "=" signs it would be much better to use equilibrium reversible arrow symbols.

      It is incorporated.

      Reviewer #3 (Recommendations For The Authors):

      (1) Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for Ebt and Ebx were significantly different than the experiment. Are there any alternative methods for calculating binding energies from the MD simulations that could be readily compared to?

      See above. We did not use more sophisticated energy calculations because we already knew the answers. Our objective was to identify states, not to calculate energies.

      (2) It would be nice to see control simulations of an apo site to ensure that the conformational changes during the MD are due to the ligands and not an artifact of the way the system is set up. I am primarily asking about this as the simulation of the isolated ECDs for the binding site interface seems like it may be unhappy without the neighboring domains that would normally surround it. On that note, was the protein constrained in any way during the MD?

      Apo simulation results are presented in Figure 2-figure supplement 2. The dimer interface seems to be adequate (stable).

      (3) Figure 4A-B: Should the colors for m1 and m3 be reversed?

      Colors have been changed and a bar chart has been added.

      Reviewer #4 (Recommendations For The Authors):

      (1) Although simulations are commendably run in triplicate, it is difficult in some places to discern their consistency.

      (1a) Table S1 provides important quantification of deviations in different replicates and with different agonists. Please confirm that the reported values are accurate. All values reported for the epibatidine system are identical to those reported for carbamylcholine, which seems statistically improbable. Similarly, runs 1 and 3 with epiboxidine seem identical to one another, and runs 1 and 2 with acetylcholine are nearly the same.

      Figure 2-Source Data 1 has been corrected.

      (1b) In reference to Figure S3, the authors comment that the simulated system (one replicate with carbamylcholine) converges within 0.5 Å RMSD of a desensitized experimental structure. This seems amazing; please specify over what atoms this deviation was calculated and with reference to what alignment. It would be interesting to know the reproducibility of this remarkable convergence in additional replicates or with other ligands; for example, Figure 5 indicates that loop C transitions to a lesser extent in the context of epibatidine than other agonists.

      The comparison was for the entire dimer ECD; 0.5 Å is the result. It may be worthwhile to pursue this remarkable convergence, but not in this paper. Here, we are concerned with identifying ACLA and ACHA. Similarity between ACHA and AD structures is for a different study.

      (1c) For principal-component and subsequent analyses, it appears that only one trajectory was considered for each system. Please clarify whether this is the case; if so, a rationale for the selection would be helpful, and some indication of how reproducible other replicates are expected to be.

      We have added new PCA results (Results, Figure 3-figure supplement 1) that show comparable principal components in other replicates.

      (2) Figure 3 shows free energy landscapes defined by principal components of fluctuation in Cα positions.

      (2a) Do experimental structures (e.g. PDB IDs 6UWZ, 7QL6) project onto any of these landscapes in informative ways?

      6UWZ.pdb matches well with the apo (7QKO.pdb), comparable to m1, and 7QL6.pdb with the m3.

      (2b) Please indicate the meaning of colored regions in the righthand panels.

      The color panels in the top left panel indicate the colored regions in the righthand panel also, which is indicative of direction and magnitude of changes with PC1 and PC2.

      (2c) Please also check the legend; do the porcupine plots really "indicate the direction and magnitude of changes between PC1 and PC2," or rather between negative and positive values of each principal component?

      It indicates the direction and magnitude of changes with PC1 and PC2.

      (3) It would be helpful to clarify how trajectory segments were assigned to specific minima, particularly m2 and m3.

      (3a) Please verify the timeframes associated with the m2 minima, reported as "20-50 ns [with acetylcholine], 50-60 ns [with carbamylcholine], 60-100 ns [with epibatidine, and] 100-120 ns [with epiboxidine]." It seems improbable that these intervals would interleave so precisely in independent systems. Furthermore, the intervals associated with acetylcholine and epiboxidine do not appear to correspond to the m2 regions indicated in Figure S8.

      Times are given in Figure 4-Source Data 1 and Figure 3-figure supplement 2. The m2 classification is based on loop displacement as well as agonist orientation. For all agonists, the selection was strictly from PCA and cluster analysis.

      (3b) The text (and legend to Figure 3) indicate that 180+ ns of each trajectory was assigned to m3, which seems surprisingly consistent. However, Figure S5 indicates this minimum is more variable, appearing at 160 ns with acetylcholine but at 186 ns with carbamylcholine. Please clarify.

      See above: the selection was from PCA and cluster analysis. Times are in Figure 3-figure supplement 2 and also in Figure 4-Source Data 1 (none in Fig. 3 legend).

      (3c) Figures 5, 6, S6, and S7 illustrate structural features of free-energy minima in each ligand system. Please clarify what is shown, e.g. a representative snapshot, centroid, or average structure from a particular prominent cluster associated with a given minimum.

      They are all representative snapshots (now in Methods). Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories.

      (4) Figure S4 helpfully shows the behavior of a pentameric control system; however, some elements are unclear.

      (4a) The 2.5-6.5 Å jump in RMSD at ~40 ns seems abrupt; can it be clarified whether this corresponds to a transition to either m2 or m3 poses, or to another feature of e.g. alignment?

      Figure 2-figure supplement 4 left bottom is just the ligand. The jump is the flip, m1 to m2.

      (4b) It seems difficult to reconcile the apparently bimodal distribution of states with the proposed 3-state model. Into which RMSD peak would the m2 intermediate fall?

      The simulations are only to 100 ns, where we found a complete flip of the agonist represented in the histograms. This confirmed that the dimer showed a similar pattern to the pentamer. In-depth analysis was done only on dimers.

      (4c) The top panel is labeled "Com" with a graphical legend indicating "ACh." Does this indicate the ligand or, as described in the text legend, "the pentamer" (i.e. the receptor)? For both panels, please verify whether they are calculated on the basis of center-of-mass, heavy atoms, Cα, etc.

      "Com" (for complex) has been changed to system (protein+ligand).

      (5) Minor concerns:

      (5a) In Figures 1 and S3, correct the PDB references (6UWX and 7QL7 are not nAChRs).

      They are now corrected.

      (5b) In Figure 4, do all panels represent mean ± standard deviation calculated across all cluster-frames reported in Table 1?

      Yes.

      Also check the graphical legend in panel A: presumably the red bars correspond to m1/LA, and the blue to m3/HA?

      Corrected

      (5c) In the legend to Figure S1, please clarify that panel B is reproduced from Indurthi & Auerbach 2023.

      This figure has been deleted.

      (5d) As indicated in Figure S2, it seems surprising that the RMSF is so apparently low at the periphery, where the subunits should contact neighbors in the extracellular domain; how might the authors account for this? Specify whether these results apply to all replicates of each system.

      The redness in the periphery for all four systems indicates the magnitude of fluctuation. As we focus on the orthosteric site, we highlight the loops around the agonist binding pocket and kept other regions 75% transparent. We now include Apo simulations and the dimer appears to be stable even without an agonist present.

      (5e) Within each minimum in Figure S5, three "prominent" clusters appear to be colored (by heteroatom) with carbons in cyan, pink, and yellow respectively. If this is correct, note these colors in the text legend.

      Colors have been added to the legend.

      (5f) In Figure S6, note in the legend that key receptor sidechains are shown as spheres, with the ligand as balls-and-sticks, and that ligand conformations in both low- and high-affinity complexes are shown in both receptor states for comparison.

      This is now added in the legend.

      (5g) The legend to Figure S6 also notes "The agonists are as in Fig S4," but that figure contains a single replicate of a different system; please check this reference.

      This has been updated to Figure 5.

      (5h) In Figure S8, the colors in the epibatidine system appear different from the others.

      The colors are the same for m1, m2 and m3 in all systems including epibatidine.

      (5i) In Table 1, does "n clusters" indicate the number of simulation frames included in the three prominent clusters chosen for MM-PBSA analysis? Perhaps "n frames" would be more clear.

      It was a good suggestion. It has now been changed to ‘n frames’.

      (5j) Pg 24-ln 453 presumably should read "...that separate it from m1 and m3..."

      This sentence is now changed in the discussion.

    1. Thus the worth of any object to be acquired by our action is always conditional. Beings the existence of which rests not on our will but on nature, if they are beings without reason, still have only a relative worth, as means, and are therefore called things, whereas rational beings are called persons because their nature already marks them out as an end in itself, that is, as something that may not be used merely as a means, and hence so far limits all choice (and is an

      I think he is saying that we as humans are not to be used as a means to an end. To do so would be ethically wrong.

  6. small-tech.org small-tech.org
    1. Personal
       Small Technology are everyday tools for everyday people. They are not tools for startups or enterprises.

       Easy to use
       Personal technology are everyday things that people use to improve the quality of their lives. As such, in addition to being functional, secure, and reliable, they must be convenient, easy to use, and inclusive. If possible, we should aim to make them delightful.
       Related aspects: inclusive

       Non-colonial
       Small technology is made by humans for humans. They are not built by designers and developers for users. They are not built by Western companies for people in African countries. If our tools specifically target a certain demographic, we must ensure that our development teams reflect that demographic. If not, we must ensure people from a different demographic can take what we make and specialise it for their needs.
       Related aspects: share alike, non-commercial, interoperable

       Private by default
       A tool respects your privacy only if it is private by default. Privacy is not an option. You do not opt into it. Privacy is the right to choose what you keep to yourself and what you share with others. “Private” (i.e., for you alone) is the default state of small technologies. From there, you can always choose who else you want to share things with.
       Related aspects: zero knowledge, peer to peer

       Zero knowledge
       Zero-knowledge tools have no knowledge of your data. They may store your data, but the people who make or host the tools cannot access your data if they wanted to. Examples of zero-knowledge designs are end-to-end encrypted systems where only you hold the secret key, and peer-to-peer systems where the data never touches the devices of the app maker or service provider (including combinations of end-to-end encrypted and peer-to-peer systems).
       Related aspects: private by default, peer to peer

       Peer to peer
       Peer-to-peer systems enable people to connect directly with one another without a person (or more likely a corporation or a government) in the middle. They are the opposite of client/server systems, which are centralised (the servers are the centres). On peer to peer systems, your data – and the algorithms used to analyze and make use of your data – stay in spaces that you own and control. You do not have to beg some corporation to not abuse your data because they don’t have it to begin with.
       Related aspects: zero knowledge, private by default

       Share alike
       Most people’s eyes cloud over when technology licenses are mentioned but they’re crucial to protecting your freedom. Small Technology is licensed under Copyleft licenses. Copyleft licenses stipulate that if you benefit from technology that has been put into the commons, you must share back (“share alike”) any improvements, changes, or additions you make. If you think about it, it’s only fair: if you take from the commons, you should give back to the commons. That’s how we cultivate a healthy commons.
       Related aspects: interoperable, non-colonial, non-commercial

       Interoperable
       Interoperable systems can talk to one another using well-established protocols. They’re the opposite of silos. Interoperability ensures that different groups can take a technology and evolve it in ways that fit their needs while still staying compatible with other tools that implement the same protocols. Interoperability, coupled with share alike licensing, helps us to distribute power more equally as rich corporations cannot “embrace and extend” commons technology, thereby creating new silos. Interoperability also means we don’t have to resort to colonialism in design: we can design for ourselves and support other groups who design for themselves while allowing all of us to communicate with each other within the same global network.
       Related aspects: share alike, non-colonial

       Non-commercial
       The primary purpose for Small Technology is not to make a profit but to increase human welfare. As such, they are built by not-for-profit organisations. Eventually, we hope that small technologies will be recognised for their contribution to the common good and therefore supported from the commons (e.g., from our taxes). In the interim, some methods for monetising Small Technology include:
          Charging for hosting and maintenance services
          Sales on App Stores (for native apps)
          Donations and patronage
          Grants and awards
       Equity-based / Venture Capital investment is incompatible with Small Technology as the success criterion is the sale of the organisation (either to a larger organisation or to the public at large via an IPO). Small Technology is not about startups (temporary companies designed to either fail fast or grow exponentially and get sold), it’s about stayups (sustainable organisations that contribute to the common good).
       Related aspects: non-colonial, share alike, interoperable

       Inclusive
       Being inclusive in technology is ensuring people have equal rights and access to the tools we build and the communities who build them, with a particular focus on including people from traditionally marginalised groups. Accessibility is the degree to which technology is usable by as many people as possible, especially disabled people. Small Technology is inclusive and accessible. With inclusive design, we must be careful not to assume we know what’s best for others, despite us having differing needs. Doing so often results in colonial design, creating patronising and incorrect solutions.

      Small Technology Small Technology are everyday tools for everyday people designed to increase human welfare, not corporate profits.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary Maintenance of the histone H3 variant CENP-A at centromeres is necessary for proper kinetochore assembly and correct chromosome segregation. The Mis18 complex recruits the CENP-A chaperone HJURP to centromeres to facilitate CENP-A replenishment. Here the authors characterise the Mis18 complex using hybrid structural biology, and determine complex interface separation-of-function mutants.

      Major Comments The SAXS and EM data on the full-length Mis18 components must be included in the main Figures, either as an additional figure or by merging/rearranging the existing figures. The authors discuss these results in three whole paragraphs, which are a very important part of the paper.

      We thank the reviewer for this constructive suggestion. We have now included an additional figure (new Fig. 2, attached below), that highlights the fit of the integrative model against the SAXS and EM data.

      Could the authors also compare the theoretical SAXS scattering curves generated by their final model(s) with the experimental SAXS curves? This would provide some additional evidence for the overall shape of their complex model beyond the consistency with the Dmax/Rg.

      We acknowledge the importance of this suggestion. We have now compared the theoretical SAXS scattering curve of the Mis18a/b core complex (named Mis18a/b DN), which lacks the flexible elements (disordered regions and the helical region flexibility connected to the Yippee domains), with the experimental data. The theoretically calculated SAXS scattering curve of the model matches the experimental data well, with a χ² value of 1.36. This data is now included in new Fig. 2 (Fig. 2f) and is referenced on page 9 line 21.
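
      For readers unfamiliar with how such a fit statistic is obtained, here is a minimal sketch. The response does not name the program or the exact formula used, so the definition below is an assumption: a reduced chi-square between experimental and model intensities after a least-squares scale factor is fitted, weighted by the experimental errors. The function names are hypothetical.

```python
def fitted_scale(i_exp, i_calc, sigma):
    """Least-squares scale factor that best maps model intensities onto experimental ones."""
    num = sum(e * c / s**2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s**2 for c, s in zip(i_calc, sigma))
    return num / den


def reduced_chi_square(i_exp, i_calc, sigma):
    """Sum of squared, error-weighted residuals after scaling, divided by (N - 1)."""
    scale = fitted_scale(i_exp, i_calc, sigma)
    residuals = sum(
        ((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma)
    )
    return residuals / (len(i_exp) - 1)
```

      With this definition, a value near 1 means the model deviates from the data by about as much as the experimental errors would predict, while much larger values point to systematic differences in shape.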

      Minor Comments

      While the introduction is clearly written, an additional cartoon schematic, representing the system/question would be helpful to a non-specialist reader to interpret the context of the study.

      We have now included a cartoon in the revised Fig. 1 to support the introduction on centromere maintenance and the central role of the Mis18a/b/BP1 complex in this process. Please find the new Fig. 1 below.

      No doubt the authors had a reason for choosing their figure allocation, but I wonder if more material couldn't be brought from the supplementary into the main figures?

      As addressed in our response to one of the major comments, we have now moved key CLMS, SAXS and EM data from the supplemental figure into the main figure, new Fig. 2.

      Page 6 "Mis18-alpha possesses an additional alpha-helical domain" - please make it clear in addition to what (I assume it's in addition to Mis18-beta).

      Apologies for the lack of clarity. We have now rephrased this sentence to highlight that this difference is in comparison with Mis18b on page 6 line 15.

      Page 7 - Report the RMSD of the S. pombe vs. human Mis18-alpha Yippee structures?

      The S. pombe Mis18 Yippee structure superposes onto the human Mis18a Yippee domain with an RMSD of 0.92 angstroms, which is now mentioned on page 7 line 9.

      Page 7 - "We generated high-confidence structural models...." is there a metric for the confidence as reported by RaptorX? Perhaps including the PAE plots in the supplementary for the AlphaFold generated models would be useful?

      We thank the reviewer for the valid suggestion. We have now included the PAE plot corresponding to the AlphaFold model in the supplementary Fig. S1d and referenced it on page 7 line 18. RaptorX ranks models based on estimated error. We have now included this information in the new figure legend for Supplementary Fig. S1.

      Figure 1 - Perhaps label figure 1b as being experimentally determined, with the R values (as for Figure 1d), and 1c being a predicted model.

      We have included Rfree and Rwork values for the Mis18a Yippee homo dimer structure and labelled Mis18a/b Yippee hetero-dimer as the predicted model in Fig. 1c and 1d.

      Page 8 "This observation is consistent with the theoretically calculated pI of the Mis18alpha helix" This is a circular argument, of course this region has a low pI due to the amino acid composition. Please remove this statement.

      We have now removed this statement as suggested.

      Page 8 "...reveals tight hydrophobic interactions" these are presumably shown in Figure 1d rather than in the referenced 1e.

      We apologise for the oversight. We have now referred to the correct figure (Fig. 1f in the revised Fig. 1).

      Page 8 - The authors should briefly somewhere discuss why there is a difference between their results and those in Pan et al 2009. As I understand it, the Pan et al paper was based in part on modelling with CLMS data as restraints.

      We thank the reviewer for this suggestion. According to Pan et al., 2019, the model shown by them was generated using CCBuilder, and their CLMS data could not differentiate the two models with the 2nd Mis18a C-terminal helix in either parallel or anti-parallel orientation. We now briefly discuss this on page 8, line 22 as follows: "Although the Pan et al., 2019 model presented the 2nd Mis18a in a parallel orientation, they did not rule out the possibility of this assembling in an anti-parallel orientation within the Mis18a/b C-terminal helical assembly (Pan et al., 2019)."

      Figure 1 - The labelling of the residues for Mis18-alpha in Figure 1d is problematic, they are black on dark purple (might be my printer/screen/eyes) suggest amending.

      We have now rearranged the label positions to overcome this issue. For clarity, the labels that could not be moved appropriately are shown in white.

      Figure S3a - Do the authors have some data to show the mass of the cross-linked complex that was loaded onto grids is consistent with what is expected?

      Unfortunately, the amount of material that we recover after performing GraFix is not sufficient to determine the molecular weight of the crosslinked sample by techniques such as SEC-MALS. However, GraFix fractions were analysed by SDS-PAGE, and fractions that ran around the expected molecular weight were selected for EM analysis. We have now included the corresponding SDS-PAGE showing the migration of the crosslinked sample analysed by EM (Supplementary Fig. S3a).

      Figure S3b - scale bar

      Revised Fig. 2d now includes a scale bar.

      Figure S3c - Could the authors show or explain the differences between these different 3D reconstructions?

      The models mainly differ in the relative orientations of the bulkier structural features that are referred to as 'ear' and 'mouth' pieces of a telephone handset. This has been mentioned in the text, but we note that the figure is not referenced right next to this statement. We have now amended this (Page 9 line 19), and to make it clear, we have also highlighted the difference using an arrowhead in Fig. 2e and S3b. The different orientations are also stated in the corresponding figure legends.

      Page 9 - The use of "AFM" for AlphaFoldMultimer" is a little confusing since AFM is the established acronym for Atomic Force Microscopy. Perhaps AF2M?

      We have now replaced AFM with AF2M on page 9 to avoid confusion.

      Figure S4a - Control missing for Mis18-alpha wild-type

      We apologise for the confusion; this control is present in Fig. 4a. We have now stated this in the figure legend of S4a for clarity.

      Figure S4 d and e - The contrast between the bands and the background is very bad (at least in my copy).

      We have now adjusted the contrast of the blots in Fig. S4d and S4e in response to this comment.

      Page 13 "Our structural analysis suggests that two Mis18BP1 fragments.....". How did you arrive at this conclusion? Is this based on the AlphaFold/RaptorX model? What additional evidence do you have that the positioning of the Mis18BP1 is correct? Does the CLMS data support this?

      We confirm that this statement is based on the AlphaFold model. We have now explicitly highlighted this on page 14, line 5. As noted in the same paragraph (page 14, line 19), this model agrees with the contacts suggested by the cross-linking mass spectrometry data presented here.

      Figure 4a - Would the authors like to consider using a different colour for Mis18BP1? The contrast is not great, especially in the electrostatic surface inset.

      In response to this suggestion, the Mis18BP1 helix is now shown in grey in the inset of Fig. 5a.

      Reviewer #1 (Significance (Required)):

      General Assessment The paper is extremely clearly written. Likewise, the figures are beautifully presented and the data extremely clean and fully supportive of the authors' conclusions. Indeed, it is seldom that one sees the depth of the structural approaches (X-ray, CLMS, EM, SAXS) in one paper, which is a huge strength of the manuscript. In addition, the translation of this data into very clean cell biological experiments makes the paper truly outstanding.

      Advance The authors provide the first model of the Mis18 complex, with extensive evidence to back up this model. The authors provide additional evidence as to how the deposition/renewal of CENP-A might be mediated by the Mis18 complex. The advance comes from both the level of clarity, detail, and scope achieved in this paper.

      Audience This will likely be of great interest to anyone with an interest in chromosome biology, plus be of interest to structural biologists as an outstanding example of hybrid structural biology.

      Expertise I am a biochemist with a background in structural biology with some familiarity with centromere biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The manuscript "structural basis for Mis18 complex assembly: implications for centromere maintenance" by Thamkachy and colleagues describes a study that uses structural analysis to test essential candidate residues in Mis18 complex components in CENP-A loading. For chromosomes to faithfully segregate during cell division, CENP-A levels must be maintained at the centromere. How CENP-A levels are maintained is therefore important to understand at the mechanistic level. The Mis18 complex has been found to be important, but how exactly the various Mis18 complex components interact and how they regulate new CENP-A loading remains not fully understood. This study set out to characterize the critical residues using X-ray crystallography, negative staining EM, SEC analysis, and molecular modeling (RaptorX, AlphaFold2, and AlphaFold-multimer) to identify the residues of Mis18a and Mis18b that are critical for the formation of the Mis18a/b hetero-hexamer and which residues are important for Mis18a and Mis18BP1 interactions. A complex beta-sheet interface dictates the Mis18a and Mis18b interactions. Mutating the Mis18a residues that are important for the Mis18a/b interactions resulted in impaired pull-down of Mis18b and reduced centromeric levels of mutated Mis18a. The functional consequence of mutating residues that impair Mis18a/b interactions is that the reduced centromeric levels of Mis18a also impair new CENP-A loading. Interestingly, mutated Mis18b did not impact centromeric Mis18a levels and only modestly impaired new CENP-A loading. These data were interpreted to mean that Mis18a is critical for new CENP-A loading, whereas Mis18b might be involved in fine-tuning how much new CENP-A is loaded. Overall, it is a very well described and well written study with exciting data.

      Major comments:

      • Overall, the structural data and the IF data support the conclusion that Mis18a residues 103-105 are critical for centromeric localization and new CENP-A loading, whereas Mis18b residues L199 and I203 are critical for centromeric localization, but only very modestly impair centromeric Mis18a localization and new CENP-A loading. In the discussion the authors argue that the N-terminal helical region of Mis18a mediates HJURP binding. The latter is postulated based on published work, but not tested in this work. This should be clarified as such.

      We thank the reviewer for this comment. Our very recent study aimed at understanding the licencing role of Plk1, independent of the work reported here, serendipitously has now validated this suggestion and demonstrates that a Plk1-mediated phosphorylation cascade activates the Mis18a/b complex via a conformational switch of the N-terminal helical region of Mis18a, which facilitates a robust HJURP-Mis18a/b interaction (Parashara et al. bioRxiv 2024). An independent study from the Musacchio lab (Conti et al. bioRxiv, 2024) also reports similar findings, mutually strengthening our independent conclusions. Overall, these studies highlight the importance of the critical structural insights into the Mis18 complex this study reports. We now explicitly discuss the validation of our original hypothesis by citing our recent work along with that of the Musacchio lab. The corresponding section of the last paragraph now reads as follows (page 17 line 10): "Previously published work identified amino acid sequence similarity between the N-terminal region of Mis18a and R1 and R2 repeats of the HJURP that mediates Mis18a/b interaction (Pan et al., 2019). Deletion of the Mis18a N-terminal region enhanced HJURP interaction with the Mis18 complex (Pan et al., 2019). Here, we show that the N-terminal helical region of Mis18a makes extensive contact with the C-terminal helices of Mis18a and Mis18b, which had previously been shown to mediate HJURP binding by Pan et al., 2019. Collectively these observations suggest that the N-terminal region of Mis18a might directly interfere with HJURP - Mis18 complex interaction. 
Two independent recent studies (Parashara et al., 2024, Conti et al., 2024) reveal that this is indeed the case and a Plk1-mediated phosphorylation cascade involving several phosphorylation and binding events of the Mis18 complex subunits relieves the intramolecular interactions between the Mis18a N-terminal helical region and the HJURP binding surface of the Mis18a/b C-terminal helical bundle. This facilitates robust HJURP-Mis18a/b interaction in vitro and efficient HJURP centromere recruitment and CENP-A loading in cells. Overall, these studies also highlight the importance of the critical structural insights into the Mis18 complex we report here."

      • Overall, the authors clearly describe their data and methodology and use adequate statistical analyses. The structural data of the Mis18a/b complex being a hetero-hexamer is convincing, but the validation in vivo is missing. As structural experiments are not performed under physiological conditions, it is important to establish the stoichiometry in vivo to further support the totality of the findings of the structural experiments and modeling. The data for the hierarchical assembly of Mis18a and Mis18b at the centromere and its importance in new CENP-A loading is convincing. An additional open question is whether "old" centromeric CENP-A or HJURP:new CENP-A complex is needed to recruit Mis18a to the centromere and whether the identified residues have a role in Mis18a centromeric localization. These data would provide a solid link between the Mis18 complex and how it is directly linked to new CENP-A loading.

      We agree that establishing the stoichiometry of Mis18 subunits of the Mis18 complex in vivo would be insightful. However, considering that the Mis18 complex assembles in a specific window of the cell cycle (late mitosis and early G1), we think characterising the stoichiometry in cells is technically very challenging. Nevertheless, consistent with our structural model, several lines of independent evidence (Pan et al., 2017 and Spiller et al., 2017) using different biophysical methods (analytical ultracentrifugation (Pan et al., 2017), SEC-MALS (Spiller et al., 2017)) showed that recombinantly purified Mis18 complex (irrespective of the expression host, E. coli or insect cells) is a hetero-octamer made of a hetero-hexameric Mis18a/b (4 Mis18a and 2 Mis18b) complex bound to two copies of Mis18BP1. These observations suggested that hetero-hexamerisation of the Mis18a/b complex may be needed to bind and dimerise Mis18BP1 in cells. Previously published cellular studies support the in vivo requirement of the hetero-octameric Mis18 assembly as: (i) perturbing the hetero-hexamerisation of the Mis18a/b complex (by introducing mutations at the Mis18a/b Yippee dimerisation interface, which did not disrupt Mis18a/b complex formation but perturbed its hetero-hexamerisation and resulted in a hetero-trimeric Mis18a/b complex made of 2 Mis18a and 1 Mis18b) abolished Mis18BP1 binding in vitro and in cells, and consequently abolished CENP-A deposition (Spiller et al., 2017); and (ii) artificial dimerisation of Mis18BP1, by expressing Mis18BP1 as a GST-tagged protein, enhanced the centromere localisation of Mis18BP1, highlighting the requirement of Mis18a/b hexameric assembly-mediated dimerisation of Mis18BP1 in cells (Pan et al., 2017).
While these studies highlighted the importance of maintaining the right stoichiometry (hetero-octamer of 4 Mis18a, 2 Mis18b and 2 Mis18BP1), lack of structural information on how this essential biological assembly is established remained a major knowledge gap. Our work presented here fills this critical knowledge gap by showing that a segment of Mis18BP1 (aa 20-51) also binds at the Yippee dimerisation interface. To highlight this, we have included the following statements in the introduction on page 5 and 20 "Perturbing the Yippee domain-mediated hexameric assembly of Mis18a/b (that resulted in a Mis18a/b hetero-trimer, 2 Mis18a and 1 Mis18b) abolished its ability to bind Mis18BP1 in vitro and in cells (Spiller et al., 2017), emphasising the requirement of maintaining correct stoichiometry of Mis18a/b subunits. Consistent with this, artificial dimerisation of Mis18BP1, by expressing Mis18BP1 as a GST-tagged protein, enhanced the centromere localisation of Mis18BP1 (Pan et al., 2017)." and in the Results section on page 14 line 12: "Mis18BP1 (aa 20-51) contains two short β strands that interact at the Mis18a/b Yippee interface, extending the six-stranded β sheets of both Mis18a and Mis18b Yippee domains. This provides the structural rationale for why Yippee domains-mediated Mis18a/b hetero-hexamerisation is crucial for Mis18BP1 binding (Spiller et al., 2017)."

      Regarding the question "whether 'old' centromeric CENP-A or HJURP:new CENP-A complex is needed to recruit Mis18a centromere localisation and whether identified residues have a role in Mis18a centromere localisation": According to the published literature, the Mis18 complex associates with centromeres through interaction with CCAN components CENP-C and CENP-I (Shono et al., 2015, Dambacher et al., 2012, Moree et al., 2011, Hoffmann et al., 2020). Considering CCAN assembles on CENP-A nucleosomes, and HJURP:new CENP-A centromere recruitment depends on the Mis18 complex, it is reasonable to argue that the 'old' centromeric CENP-A contributes to the centromere localisation of the Mis18 complex. Amongst the components of the Mis18 complex, Mis18BP1 and Mis18b have previously been suggested to interact with CENP-C. Within the Mis18 complex, we (Spiller et al., 2017) and others (Pan et al., 2017) have shown that Mis18a can directly interact with Mis18BP1, but it does so more efficiently when Mis18a hetero-oligomerises with Mis18b via their Yippee domains. Here, our structural analysis mapped the interaction interfaces and showed that Mis18a residues E103, D104 and T105 contribute to Mis18BP1 binding, as mutating these residues abolishes centromere localisation of Mis18a (Fig. 5c and 5d). To accentuate our findings, we have now included the following paragraph in the discussion section (page 17 line 26): "One of the key outstanding questions in the field is how does the Mis18 complex associate with the centromere. Previous studies identified CCAN subunits CENP-C and CENP-I as major players mediating the centromere localisation of the Mis18 complex mainly via Mis18BP1 (Shono et al., 2015, Dambacher et al., 2012, Moree et al., 2011), although Mis18b subunit has also been suggested to interact with CENP-C (Stellfox et al., 2016). Within the Mis18 complex, we and others have shown that the Mis18a/b Yippee hetero-dimers can directly interact with Mis18BP1.
Here our structural analysis allowed us to map the interaction interface mediating Mis18a/b-Mis18BP1 binding. Perturbing this interface on Mis18a completely abolished Mis18a centromere localisation and reduced Mis18BP1 centromere levels. These observations show that Mis18a associates with the centromere mainly via Mis18BP1, and assembly of the Mis18 complex itself is crucial for its efficient centromere association, as previously suggested. Future work aimed at characterising the intermolecular contact points between the subunits of the Mis18 complex, centromeric chromatin and CCAN components and understanding if the Mis18 complex undergoes any conformational and/or compositional variations upon centromere association and/or during CENP-A deposition process, will be crucial to delineate the mechanisms underpinning the centromere maintenance."

      Minor comments:

      • The bar graphs should ideally also show the individual data points so that readers can appreciate the spread of the data. These figures can be replicated in the Supplemental to avoid making the main figures look too busy.

      We thank the reviewers for this suggestion. Reviewer #3 made a similar comment and suggested we use Superplot, which allows visualisation of individual data points of independent experiments. We have now revised all bar graphs using Superplot to address both reviewers' suggestions.

      Reviewer #2 (Significance (Required)):

      • This study uses a broad range of structural techniques, including molecular modeling, which were subsequently validated by in vitro pull-down assays, co-IP, and IF. The combination of these techniques is important because many structural techniques cannot be performed under physiological conditions. Validating the main findings of the structural results by IF and co-IP is therefore critical.
      • This work greatly advances our structural understanding of how Mis18a, Mis18b, and Mis18BP1 form the Mis18 complex and how the critical residues, especially in Mis18a, help the Mis18 complex localize to the centromere and influence new CENP-A loading. This study also provides the first strong evidence for hierarchical assembly of the Mis18 complex.
      • How centromere identity is maintained is a critical question in chromosome biology and genome integrity. The Mis18 complex has been identified as an important complex in the process. Several structural and mutational studies (all adequately cited in this manuscript) have tried to address which residues guide the assembly and functional regions of the Mis18 complex. This work builds and expands our understanding how especially Mis18a holds a pivotal role in both Mis18 complex formation and its impact on maintaining centromeric CENP-A levels.
      • This work will be of interest to the chromosome field in general and anyone studying the mechanism of cell division.
      • Chromatin, centromere, CENP-A, cell division. This reviewer has limited expertise in structural biology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Centromere identity is defined by CENP-A loading to specific sites on genomic DNA. CENP-A loading is known to rely on the Mis18 complex, and several regulators are known; yet how the Mis18 complex achieves this complex process has remained a puzzle. By elucidating the structural basis of Mis18 complex assembly using integrative structural approaches, the authors show that multiple homo- and heterodimeric interfaces of Mis18alpha, beta and Mis18BP1 are involved in centromere maintenance. The authors show that Mis18alpha can associate with centromeres and deposit CENP-A independent of Mis18β. Mis18α functions in CENP-A deposition at centromeres independent of Mis18β. Mis18β is required for maintaining a specific level of CENP-A occupancy at centromeres. Thus, using structure-guided and separation-of-function mutants, the study reveals how the Mis18 complex ensures centromere maintenance. Major comments: This is an excellent study on centromere inheritance, combining structural and cell biology techniques. The comments here primarily refer to the cell biology aspect of the work.

      Figures show that new CENP-A deposits in Mis18βL199D/I203D mutants, but the level was reduced moderately. Based on this observation, the authors make a strong conclusion that Mis18β licenses the optimal levels of CENP-A at centromeres. Mis18α may be essential for both CENP-A incorporation and depositing a specific amount of CENP-A, as Mis18α and CENP-A levels are both reduced in Mis18βL199D/I203D mutants which failed to form the triple helical assembly with Mis18α as shown in Figure 3B and 3C. The authors may want to qualify some of these claims as preliminary or speculative.

      We thank the reviewer for this suggestion. We agree that although the reduction in CENP-A levels upon replacing WT Mis18b with Mis18b L199D/I203D is more prominent than the reduction in centromere localised Mis18a, one cannot completely rule out the contribution of reduced Mis18a on CENP-A loading. This also raises an interesting possibility where Mis18b ensures the correct amount of CENP-A deposition by facilitating the optimal level of Mis18a at centromeres. We now explicitly discuss this in the discussion as follows (page 16 line 26): "Whilst proteins involved in CENP-A loading have been well established, the mechanism by which the correct levels of CENP-A are controlled is yet to be thoroughly explored and characterised. The data presented here suggest that Mis18b mainly contributes to the quantitative control of centromere maintenance - by ensuring the right amounts of CENP-A deposition at centromeres - and may be one of several proteins that control CENP-A levels. We also note that the Mis18b mutant, which cannot interact with Mis18a, moderately reduced Mis18a levels at centromeres, and hence, it is possible that Mis18b ensures the correct level of CENP-A deposition by facilitating optimal Mis18a centromere recruitment. Future studies will focus on dissecting the mechanisms underlying the Mis18b-mediated control of CENP-A loading amounts along with any other mechanisms involved."

      This work and others show that phosphorylation of Mis18BP1 by CDK1 can interfere with complex function (Spiller et al., 2017, Pan et al., 2017). Does the structure provide any insight into PLK1-mediated phosphorylation surfaces for activation of the complex? If yes, a brief discussion linking the opposing actions mediated by CDK1 and PLK1 would strengthen the work.

      As described in our response to the first major comment of Reviewer 2, our very recent study aimed at understanding the licencing role of Plk1, independent of the work reported here, identified and evaluated the functional contribution of Plk1 phosphorylation on the subunits of the Mis18 complex (Parashara et al., bioRxiv 2024). Serendipitously, this recent work has now validated our hypothesis proposed based on the structural characterisation reported here and demonstrates that a Plk1-mediated phosphorylation cascade activates the Mis18a/b complex via a conformational switch of the N-terminal helical region of Mis18a which facilitates a robust HJURP-Mis18a/b interaction (Parashara et al. bioRxiv 2024). An independent study from the Musacchio lab (Conti et al., bioRxiv 2024) also reports similar findings, mutually strengthening our independent conclusions. Overall, these studies highlight the importance of the critical structural insights into the Mis18 complex this study reports. We now explicitly discuss the validation of our original hypothesis by citing our recent work along with that of the Musacchio lab. The corresponding section of the last paragraph now reads as follows (page 17 line 10): "Previously published work identified amino acid sequence similarity between the N-terminal region of Mis18a and R1 and R2 repeats of the HJURP that mediates Mis18a/b interaction (Pan et al., 2019). Deletion of the Mis18a N-terminal region enhanced HJURP interaction with the Mis18 complex (Pan et al., 2019). Here, we show that the N-terminal helical region of Mis18a makes extensive contact with the C-terminal helices of Mis18a and Mis18b, which had previously been shown to mediate HJURP binding by Pan et al., 2019. Collectively these observations suggest that the N-terminal region of Mis18a might directly interfere with HJURP - Mis18 complex interaction.
Two independent recent studies (Parashara et al., 2024, Conti et al., 2024) reveal that this is indeed the case and a Plk1-mediated phosphorylation cascade involving several phosphorylation and binding events of the Mis18 complex subunits relieves the intramolecular interactions between the Mis18a N-terminal helical region and the HJURP binding surface of the Mis18a/b C-terminal helical bundle. This facilitates robust HJURP-Mis18a/b interaction in vitro and efficient HJURP centromere recruitment and CENP-A loading in cells. Overall, these studies also highlight the importance of the critical structural insights into the Mis18 complex we report here."

      I am happy with the way cell biology data and the methods are presented so that they can be reproduced. The experiments are adequately replicated and the statistical analysis adequate. It will help to include sample size of cells or centromeres used for building the graphs.

      We have now included this information in figure legends of Fig. 3a, 3c, 4b, 4c, 5b, 5c and 5d.

      This is a strong interdisciplinary study using a variety of in vitro and in vivo techniques. Can the authors discuss if they expect chromatin associated Mis18 complex to host a similar structure as the soluble one? In other words, are they able to comment on any key differences between chromatin and non-chromatin associated Mis18 complexes.

      We thank the reviewer for the suggestion. We agree that one cannot rule out the possibility of the Mis18 complex undergoing compositional and/or conformational variations during the processes of CENP-A loading at centromeres. We now explicitly discuss this possibility in the last paragraph of the discussion section (page 18 line 10): "Future work aimed at characterising the intermolecular contact points between the subunits of the Mis18 complex, centromeric chromatin and CCAN components and understanding if the Mis18 complex undergoes any conformational and/or compositional variations upon centromere association and/or during CENP-A deposition process, will be crucial to delineate the mechanisms underpinning the centromere maintenance."

      Minor comments: -

      In cell biology experiments, fluorescence intensities could be presented as a superplot for added value across cells and repeats (instead of bar graphs). More on superplot: https://doi.org/10.1083/jcb.202001064.

      We thank the reviewers for this kind suggestion. We have now included graphs made using 'superplot' as suggested.

      In general, ACA levels do not appear to change significantly between WT and mutant expressing cells although new CENP-A loading is significantly absent in the presence of a few mutants - please comment if ACA used here can recognise CENP-A. Would this mean that old CENP-A remains normally?

      We thank the reviewer for this comment. While new CENP-A incorporated at centromeres is selectively labelled using the SNAP-tag, the ACA antibody used in these experiments can recognise CENP-A, CENP-B and CENP-C, with CENP-B being the primary target (Kallenberg, Clinical Rheumatology, 1990). We would also like to note that ACA has commonly been used to locate the centromere in CENP-A loading assays where new CENP-A levels are assessed via selective labelling (e.g. McKinley 2014).

      It is unclear whether any of the mutants acted in a dominant negative fashion in the presence of endogenous Mis18 proteins. It would have been useful to test this, particularly in the context of Mis18alpha mutants that seem to fully abolish new CENP-A recruitment.

      As Mis18 subunits oligomerise (homo and hetero), we thought that expressing these mutants in the presence of endogenous proteins might interfere with the endogenous proteins in a heterogeneous manner and make the interpretation difficult. Hence, we did not test this. Instead, as described in the manuscript, we have tested these mutants in siRNA rescue experiments (Fig. 3, 4 and 5).

      In figure 3a, GFP panel (input lane, 1) is shown to mark a band corresponding to GFP. Is this expected? Please comment.

      Yes, as a control, an empty vector was transfected to express just GFP along with Mis18a-mCherry. These were used to show that there was no non-specific interaction between the beads used for IP or Mis18a-mCherry and the GFP tag, and that any interaction seen was due to Mis18b. A similar control was used in S4b, where mCherry was expressed along with Mis18b-GFP. We have now clarified this in the corresponding legends of Fig. 4a and S4b.

      Would be useful to have the scale for the cropped images presented as insets. Figure 4B should read YFP and not YPF.

      We apologise for this typographical error. We have now corrected this.

      The authors may want to explain whether the tag differences matter for their study (Case in point: His-SUMO-Mis18a191-233 WT and mutant His-MBP-Mis18b188-229 proteins).

      The MBP tag was chosen to perform amylose pull-down assays, whereas the SUMO tag was chosen to increase the protein size. This is crucial as the C-terminal fragments of Mis18a and Mis18b are less than 50 amino acids long and are not easy to visualise by band intensity in Coomassie-stained SDS-PAGE gels.

      Reviewer #3 (Significance (Required)):

      This work elucidates the structural basis of Mis18 complex assembly and the intermolecular interfaces essential for Mis18 functions. This is a significant advance in the field as it helps researchers better understand CENP-A deposition and the mechanism underpinning the maintenance of centromere identity. This is a broad area of research: those studying cell division, genome stability, centromere identity and epigenetics might all be interested in and influenced by these findings. Novelty and strength lie in combining structural and cell biology work. Strengths of the work are the structural details of the Mis18 complex. A minor weakness is that the link between Mis18 structure and centromere inheritance is limited to one immunostaining assay (I have mentioned this as a minor comment because addressing this may not be within the scope of this manuscript and is likely to require a repeat of a vast majority of the work with additional reagents which may not directly add value to the current manuscript).

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We will make the suggested change of adding “sequence”. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. The example given does not help us ascertain what other changes may be necessary, since we cannot see any problem with the sentence “we assembled a chromosome-scale genome of”, as this phrase is widely used in many similar publications.

      As for panel E of figure 2, it is not missing. The panel is located on the right, just below “Target Cells”.

      The shortcomings of the manuscripts are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      • Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue ?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced, individual tissues derived from juvenile fish were used for the genome annotation and whole larval fish for the developmental analysis. We will specify in the figures and text that the results shown are from whole larvae, and add more detail to the material and methods section about which type of sample was analysed in which way.

      • The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and in-depth description of workflows and parameter settings is highly recommended. This can be made available through data store services, and documents can even benefit from DOIs. This provides others with more information to evaluate the resolution of this work. No doubt that it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We will upload the code used to assemble and annotate this genome to a public repository or add it to the supplementary material.

      The genome assembly did not use a specific workflow (e.g., Nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow; a detailed description provided by Phase Genomics can be found in the supplementary material. The annotation workflow has already been described in a previous publication, but an in-depth description can also be found in the Material and Methods section, including the parameters used for specific steps. The RNA-seq mapping and analysis has also been described in the Material and Methods section, including parameters and models for DESeq2.

      • Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      We will add a comment on this in the manuscript as suggested. Basically, the T3/T4 levels are consistent with other published work in fish. In the present manuscript, grouper larvae show a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and a comparable level of T3 compared with what was found in convict tang (Holzer et al. 2017; Figure 2), with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.
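      For readers who want to check the unit conversions in the comparison above, a minimal sketch follows. It only restates the peak values quoted in this response (1 ng = 1,000 pg); the dictionary layout and the helper name `ng_to_pg` are illustrative assumptions, not part of any published analysis.

```python
PG_PER_NG = 1000  # 1 ng = 1,000 pg


def ng_to_pg(value_ng_per_g):
    """Convert a concentration from ng/g to pg/g."""
    return value_ng_per_g * PG_PER_NG


# Peak whole-body levels as quoted in the response, expressed in pg/g.
peaks_pg_per_g = {
    "grouper (this study)": {"T4": ng_to_pg(1.2), "T3": ng_to_pg(0.06)},
    "convict tang (Holzer et al. 2017)": {"T4": 30.0, "T3": 100.0},
    "clownfish (Roux et al. 2023)": {"T4": ng_to_pg(10), "T3": ng_to_pg(2)},
}

for species, levels in peaks_pg_per_g.items():
    print(f"{species}: T4 = {levels['T4']:.0f} pg/g, T3 = {levels['T3']:.0f} pg/g")
```

      Putting all values on the same scale makes it easier to see that the grouper peaks fall between the convict tang and clownfish values for T4 and below both for T3.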

      The differences could be due to differences in the structure of fish tissues and therefore in hormone extraction efficiency, as well as to different hormone measurement protocols, fish physiology, and fish size (e.g., the weighing of tiny grouper larvae is difficult and less precise than in convict tang). What is important is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. This means that our data are internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      • Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      As the reviewer notes, there are a large number of differentially expressed genes. This is because the data come from a larval developmental transcriptome spanning one-day-old larvae to fully metamorphosed juveniles at around day 60.

      While DESeq2 indeed works on the assumption that most genes are not differentially expressed, this affects normalisation but not hypothesis testing (Wald test, LRT or ANOVA). Normalisation in DESeq2 is fairly robust to violations of this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio for normalisation, and as long as the numbers of up- and down-regulated genes are relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we consulted the MA plots, which do not show any overrepresented up or down expression patterns. Additionally, see Michael Love's comment on comparing different tissues, which is also applicable here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/): “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
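      The median-of-ratios idea referred to above can be illustrated with a small sketch. This is a simplified, standard-library Python illustration of the general technique, not the DESeq2 implementation (DESeq2 is an R package); the function name `size_factors` and the toy counts are hypothetical. It shows why a balanced set of up- and down-regulated genes leaves the median ratio, and hence the size factor, untouched.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only genes with non-zero counts in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in all samples")
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (hypothetical): sample 2 is sequenced twice as deeply as sample 1.
# g3 is 4-fold up and g4 is 4-fold down after accounting for depth.
toy_counts = [
    [100, 200],  # g1, unchanged
    [50, 100],   # g2, unchanged
    [10, 80],    # g3, up-regulated
    [80, 40],    # g4, down-regulated
    [30, 60],    # g5, unchanged
]
factors = size_factors(toy_counts)
print(factors, factors[1] / factors[0])
```

      In this toy case the ratio of the two size factors recovers the two-fold depth difference, because the up- and down-regulated genes sit on either side of the median. If nearly all genes shifted in the same direction, the median itself would move, which is the failure mode described in the quoted comment.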

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial considerations that are not proven by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments but despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods; an early activation between D1 and D10, and a second one between D32 and juvenile stage. These data are interesting as they call for further examination of 1) the possible interaction of corticoids and TH during metamorphosis, a question that is certainly not settled yet in teleost fishes, and 2) the existence of an early larval developmental step also involving TH and corticosteroids.

      Point 2) is of particular interest and importance, since this early activation (unique, to our knowledge, among the teleost fishes studied so far) raises many new questions and will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of our study. We hope that the reviewer, while disagreeing with some of our statements, will recognise that our study will be stimulating at that level and that this is what scientific studies should do.

      The consideration that cortisol is involved in metamorphosis in teleosts has never been shown, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in Senegalensis, the sole pre-otic CRH neuron number decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in flatfish metamorphosis (PMID: 25575457).

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones”, and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis, as the reviewer suggests, but simply that there is a possible involvement of cortisol together with TH in metamorphosis. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which is consistent with such an involvement. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study, as it would require setting up a full grouper life cycle in lab conditions, which is beyond the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue.

      We wrote that “there is contrasting evidence of communication between these two pathways [TH and corticosteroids] in teleost fish with some data suggesting a synergic and other an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitantly with an increase in TH levels has been observed in flatfish (ref 19), golden sea bream (ref 100) and silver sea bream (ref 101). Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (phenomenon occurring during flatfish metamorphosis) in flounder (ref 20). TH exposure increases MR and GR genes expression in zebrafish embryo (ref 55). It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner (ref 56) On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp, decreasing cortisol levels (ref 57), whereas cortisol exposure decreases TH levels in European eel (ref 58). Given this scattered evidence, the existence of a crosstalk active during teleost metamorphosis has never been formally demonstrated. The results we obtained in grouper are clearly indicating that HPI axis and cortisol synthesis are activated (i) during early development and (ii) during metamorphosis. This may suggest that in some aspect cortisol synthesis can work in concert with TH, as has been shown in several different contexts in amphibians (ref 17).” In the revised manuscript, we will also add the interesting case of the Senegal sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field.

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression observed for many genes were very intriguing to us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that because we cannot establish a full grouper life cycle under lab conditions, we carried out these experiments in the context of a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments would be a full project in themselves, requiring a rearing system compatible with both larval survival and the economic constraints related to drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are restricted to a short period each year. So even if we strongly agree with the necessity of conducting such experiments, we think that this is not within the scope of the present paper, but something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period.

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. We will make this clear in the revised manuscript.

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers coinciding with pigmentation changes and fin rays resorption, 2) that there is also evidence in numerous fish species that TH level increase is concomitant with increase of TH related genes, and 3) that we observed in our data an increase in the expression of TH related genes as well as pigmentation changes and fin rays resorption. Based on our experience in fish metamorphosis and the literature we can say confidently that those observations indicate that metamorphosis is occurring between D32 and the juvenile stage. To reinforce our point, we plan to add a figure to the revised manuscript, which puts our data in the context of earlier studies done in grouper. This will clearly show that our inference is correct. Additionally, we would like to point out that from our experience in several fish species transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in larval development (at D3), which is clearly outside of the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering whether gene activation was accompanied by a hormonal increase, the measurements we made for TH and cortisol between D1 and D10 are relevant. We will make sure to improve the clarity of the revised version of the manuscript to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down and made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of biology work against it. Moreover, in D10, the authors show the highest cortisol level and lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here.

      (1) We clearly observed an increase of TSHb (occurring between D18 and the juvenile stage) and an increase of tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase of TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and the juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998), as they occur during the increase in TH levels, and they also happen to be under the control of TH in grouper (De Jesus et al 1998). Based on this study, and on studies conducted on many other teleost species showing that the increase of TH levels is always associated with an activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and the juvenile stage. We will improve the clarity of the manuscript to make sure it is understood that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10), with a surge of trhrs, tg and tpo at D3. As this activation was very unexpected for us, we decided to focus the analysis of TH levels between D1 and D10 and, very interestingly, we observed a high level of T4 at D3, indicating that THs are instrumental very early in the larval development of the Malabar grouper, which has never been shown before. We stated at line 195 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have been clearer and explained that this early activation was very intriguing for us and that we wanted to investigate hormonal levels around that period. We never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction.

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of the corticoid pathway genes around metamorphosis (between D32 and juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test that. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and juvenile stage. We will correct this in the revised version.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages, as they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in this species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). We are planning to add a figure bringing all these data together. However, the main focus of this manuscript is the novel observation of an early activation period at D3, for which we needed TH levels to determine whether they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we will take care to improve that part too.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the methods of our paper and the quality of the results. We agree that there were several parts where our message is unclear, which we will address in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We will also make sure that our claims about TH/corticoid interaction during both periods remain hypothetical, as we cannot yet, despite trials, support them with functional experiments.

    1. Author Response

      We provide here a provisional response to the Public Comments and main issues raised by the reviewers. We appreciate the opportunity to submit a revision and will give all of the reviewers’ comments careful consideration when modifying the manuscript.

      (1) BioRxiv version history.

      Reviewer 1 correctly noted that we have posted different versions of the paper on bioRxiv and that there were significant changes between the initial version and the one posted as part of the eLife preprint process. Here we provide a summary of that history.

      We initially posted a bioRxiv preprint in November, 2021 (Version 1) that included the results of two experiments. In Experiment 1, we compared conditions in which the stimulation frequency was at 2 kHz, 3.5 kHz, or 5.0 kHz. In Experiment 2, we replicated the 3.5 kHz condition of Experiment 1 and included two amplitude-modulated (AM) conditions, with a 3.5 kHz carrier signal modulated at 20 Hz or 140 Hz. Relative to the sham stimulation, non-modulated kTMP at 2 kHz and 3.5 kHz resulted in an increase in cortical excitability in Experiment 1. This effect was replicated in Experiment 2.
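      To make the amplitude-modulated conditions concrete, the sketch below generates a textbook AM signal with a 3.5 kHz carrier modulated at 20 Hz (or 140 Hz). It is an illustration only: the exact modulation form and depth used in the experiments are not stated in this response, so the full-depth cosine envelope, the normalisation to a peak of 1, and the function name `am_sample` are all assumptions.

```python
import math


def am_sample(t, carrier_hz=3500.0, mod_hz=20.0, depth=1.0):
    """Value at time t (seconds) of a classic amplitude-modulated signal.

    The envelope is scaled so that the peak amplitude is 1 (an assumption).
    """
    envelope = (1.0 + depth * math.cos(2.0 * math.pi * mod_hz * t)) / (1.0 + depth)
    carrier = math.cos(2.0 * math.pi * carrier_hz * t)
    return envelope * carrier


# One 20 Hz modulation period (50 ms), sampled at a hypothetical 100 kHz rate.
sample_rate_hz = 100_000
samples_20hz = [am_sample(n / sample_rate_hz, mod_hz=20.0) for n in range(5000)]
samples_140hz = [am_sample(n / sample_rate_hz, mod_hz=140.0) for n in range(5000)]
print(max(samples_20hz), min(samples_20hz))
```

      Setting `depth=0.0` removes the modulation and leaves the plain 3.5 kHz carrier, which corresponds to the non-modulated condition in this simplified picture.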

      In the original posting, we reported that there was an additional boost in excitability in the 20 Hz AM condition above that of the non-modulated condition. However, in re-examining the results, we recognized that the 20 Hz AM condition included an outlier that was pulling the group mean higher. We should have caught this outlier in the initial submission given that the resultant percent change for this individual is 3 standard deviations above the mean. Given the skew in the distribution, we also performed a log transform on the MEPs (which improves the normality and homoscedasticity of MEP distributions) and repeated the analysis. However, even here the participant’s results remained well outside the distribution. As such, we removed this participant and repeated all analyses. In this new analysis, there was no longer a significant difference between the 20 Hz AM and nonmodulated conditions in Experiment 2. Indeed, all three true stimulation conditions (nonmodulated, AM 20 Hz, AM 140 Hz) produced a similar boost in cortical excitability compared to sham. Thus, the results of Experiment 2 are consistent with those of Experiment 1, showing, in three new conditions, the efficacy of kHz stimulation on cortical excitability. But the results fail to provide evidence of an additional boost from amplitude modulation.
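      The outlier check described above (a value about 3 standard deviations above the group mean, re-checked after a log transform) can be sketched as follows. This is a minimal illustration on made-up numbers, not the analysis code used for the paper; the function names, the threshold default, and the example values are hypothetical, and whether the group mean and standard deviation included the outlying participant is not stated in the response (here they do).

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Standard scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values, threshold=3.0, log_transform=False):
    """Return indices whose |z| exceeds the threshold, optionally on log-transformed data."""
    data = [math.log(v) for v in values] if log_transform else list(values)
    return [i for i, z in enumerate(z_scores(data)) if abs(z) > threshold]


# Made-up post/pre MEP ratios for 20 participants; the last one is extreme.
mep_ratios = [0.9, 1.0, 1.1, 1.2, 1.3, 1.0, 1.1, 1.2, 1.3, 1.4,
              1.1, 1.2, 1.3, 1.0, 1.2, 1.1, 1.3, 1.2, 1.1, 6.0]

print(flag_outliers(mep_ratios))                      # raw scale
print(flag_outliers(mep_ratios, log_transform=True))  # log scale
```

      In this toy data set the extreme participant is flagged on both the raw and the log scale, mirroring the situation described above in which the log transform did not bring the participant back inside the distribution.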

      We posted a second bioRxiv preprint in May, 2023 (Version 2) with the corrected results for Experiment 2, along with changes throughout the manuscript given the new analyses.

      Given the null results for the AM conditions, we decided to run a third experiment prior to submitting the work for publication. Here we used an alternative form of amplitude modulation (see Kasten et al., NeuroImage 2018). In brief, we again observed a boost in cortical excitability from non-modulated kTMP at 3.5 kHz, but no additional effect of amplitude modulation. This work is included in the third bioRxiv preprint (Version 3), the paper that was submitted and reviewed at eLife.

      (2) Statistical analysis.

      Reviewer 1 raised a concern with the statistical analyses performed on aggregate data across experiments. We recognize that this is atypical and was certainly not part of an a priori plan. Here we describe our goal with the analyses and the thought process that led us to combine the data across the experiments.

      Our overarching aim is to examine the effect on corticospinal excitability of different kTMP waveforms (carrier frequency and amplitude modulation frequency) matched at the same estimated cortical E-field (2 V/m). Our core comparison was of the active conditions relative to a sham condition (E-field = 0.01 V/m). We included the non-modulated 3.5 kHz condition in Experiments 2 and 3 to provide a baseline from which we could assess whether amplitude modulation produced a measurable difference from that observed with non-modulated stimulation. Thus, this non-modulated condition, as well as the sham condition, was repeated in all three experiments. This provided an opportunity to examine the effect of kTMP with a relatively large sample, as well as to assess how well the effects replicate, and resulted in the strategy we have taken in reporting the results.

      As a first step, we present the data from the 3.5 kHz non-modulated and sham conditions (including the individual participant data) for all three experiments in Figure 4. We used a linear mixed effect model to examine if there was an effect of Experiment (Exps 1, 2, 3) and observed no significant difference within each condition. Given this, we opted to pool the data for the sham and 3.5 kHz non-modulated conditions across the three experiments. Once data were pooled, we examined the effect of the carrier frequency and amplitude modulated frequency of the kTMP waveform.

      (3) Carry-over effects

      As suggested by Reviewer 1, we will examine in the revision if there is a carry-over effect across sessions (for the most part, 2-day intervals between sessions). For this, we will compare MEP amplitude in baseline blocks (pre-kTMP) across the four experimental sessions.

      Reviewer 1 also commented that mixing the single- and paired-pulse protocols might have impacted the results. While our a priori focus was on the single-pulse results, we wanted to include multiple probes given the novelty of our stimulation method. Mixing single- and different paired-pulse protocols has been relatively common in the noninvasive brain stimulation literature (e.g., Nitsche 2005, Huang et al, 2005, López-Alonso 2014, Batsikadze et al 2013) and we are unaware of any reports suggesting that mixed designs (single and paired) distort the picture compared to pure designs (single only).

      (4) Sensation and Blinding

      Reviewer 2 brought up concerns about the sham condition and blinding of kTMP stimulation. We do think that kTMP is nearly ideal for blinding. The amplifier does emit an audible tone (at least for individuals with normal hearing) when set to an intensity to produce a 2 V/m E-field. For this reason, the participants and the experimenter wore ear plugs. Moreover, we played a 3.5 kHz tone in all conditions, including the sham condition, which effectively masked the amplifier sound. We measured the participant’s subjective rating of annoyance, pain, and muscle twitches after each kTMP session (active and sham). Using a linear mixed effect model, we found no difference between active and sham for each of these ratings, suggesting that sensation was similar for active and sham (Fig 8). This matches our experience that kHz stimulation in the range used here produces no perceptible sensation from the coil. To blind the experimenters (and participants), we used a coding system in which the experimenter typed in a number that had been randomly paired to a stimulation condition, with the pairing varying across participants in a manner unknown to the experimenter.
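      The coding system described above can be sketched as a per-participant random pairing of numeric codes with stimulation conditions. This is a hypothetical illustration of the general approach, not the software used in the study; the condition labels follow Experiment 2 as described in this response, while the function name, participant identifiers, and use of a seed are assumptions.

```python
import random

# Condition labels follow Experiment 2 as described in the response.
CONDITIONS = ["sham", "non-modulated 3.5 kHz", "AM 20 Hz", "AM 140 Hz"]


def make_blinding_key(participant_ids, conditions, seed=None):
    """Return {participant: {code: condition}} with an independent random pairing per participant."""
    rng = random.Random(seed)
    key = {}
    for pid in participant_ids:
        codes = list(range(1, len(conditions) + 1))
        rng.shuffle(codes)  # the pairing differs from one participant to the next
        key[pid] = dict(zip(codes, conditions))
    return key


# The key would be generated and stored by someone other than the experimenter,
# who only ever sees and types the numeric code for a given session.
blinding_key = make_blinding_key(["P01", "P02", "P03"], CONDITIONS, seed=2024)
for pid, mapping in blinding_key.items():
    print(pid, sorted(mapping))  # only the codes are shown, not the conditions
```

      Because each participant gets an independent shuffle, knowing what a code meant for one participant tells the experimenter nothing about what the same code means for the next.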

      Reviewer 1 asked why we did not explicitly ask participants if they thought they were in an active or sham condition. This would certainly be a useful question. However, we did not want to alert them of the presence of a sham condition, preferring to simply describe the study as one testing a new method of non-invasive brain stimulation. Thus, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2024-02352

      Corresponding author(s): Elise, Belaidi

      1. General Statements

      We would like to thank the reviewers for their constructive suggestions and comments. We hope that the “point by point answer” and the revision plan proposed below will convince the reviewers and the editor.

      We thank Reviewer 1 for his/her comments. We would like to specify that the investigation of HIF-1 involvement in IH-induced mitochondrial remodeling was indeed initiated by RNA-seq analysis and confirmed in a cell-based model as well as in wild-type and HIF-1a+/- heterozygous mice subjected to intermittent hypoxia (IH). In vivo, we originally demonstrated that metformin reversed the IH-induced increase in myocardial infarct size through AMPKa2, and we proposed that metformin could modify HIF-1 activity. We then validated our hypothesis in an in vitro model, which allowed us to demonstrate that metformin, by increasing HIF-1a phosphorylation, decreases its activity. We acknowledge that we used several models, and this is why we detailed the Materials and Methods section as much as possible, including all models, experimental sets and methods. We hope that our point-by-point response to Reviewer 1 will increase the clarity of our work, and that the new results provided will strengthen the evidence concerning the mechanisms by which metformin can inhibit and modulate the deleterious impact of HIF-1 on the IH-induced increase in myocardial infarct size.

      We thank Reviewer 2 for his/her conclusion highlighting that “our work opens new avenues for exploring the potential effects of metformin as a modulatory of HIF-1α activity in obstructive sleep apnea syndrome”. We hope that the clarifications and/or justifications provided will convince him/her.

      We thank Reviewer 3 for having underlined that “metformin induces HIF-1α phosphorylation, decreases its nuclear localization and subsequently HIF-1 transcriptional activity are very much interesting” and for having highlighted that “our study is convincing”. We hope that the justifications and corrections provided in the point-by-point answer will convince him/her.

      Altogether, as underlined by the 3 reviewers, our study is very interesting for translational science in the fields of cardiovascular, respiratory and sleep medicine. We hope that the point-by-point answer and the proposed revision plan will allow the publication of our article in EMBO Molecular Medicine.

      2. Description of the planned revisions

      Please find below the revisions that we plan to make in response to the questions of Reviewer 1.


      Figure 1 - it would be helpful to list all of the DEGs (what genes are changed?). Including the expression of HIF-1α and PHD isoforms would be informative. If there is a robust HIF-1α signal, changes in the expression of HIF and PHD isoforms would be anticipated. Fig 1F - with regards to glycolysis and hypoxia pathway analysis, most of the DEGs are not canonical HIF-1α/hypoxia targets.

      Figure 1 aimed at better understanding and manipulating the well-recognized involvement of HIF-1 in response to our specific IH stimulus (Semenza, Physiology 2009; Belaidi, Pharmacol & Ther. 2016). The results provided by the RNA-seq analysis show that IH induces cardiac oxidative and metabolic stress, which are inter-related with HIF-1 activation. We did not claim that these genes are HIF-1 target genes. The RNA-seq analysis did not reveal HIF-1α and PHD1-3 transcripts among the most dysregulated genes of the panel. In case of publication, bulk data and DEGs will be provided in an online file. We agree with the reviewer that the list of the 40 up- and down-regulated genes would be very informative and would increase the value of the paper. Thus, we plan to add the names of the 40 up- and down-regulated genes to Figure 1B.

      Figure 5G-I, show cytoplasmic HIF1a as well as nuclear.

      Alternatively, why not use IHC for subcellular localization?

      We think that the comments of Reviewer 1 concern Fig. 4G-I and not 5G-I. In this figure, we showed that IH increases nuclear HIF-1α expression compared to the N condition and that this IH effect is abolished in mice treated with Metformin, suggesting that, upon IH, Metformin impacts HIF-1α nuclear content and, subsequently, its activity. The nuclear localization of HIF-1α is the most relevant means of indicating its activation. We agree with the reviewer that IHC also allows the nuclear localization of HIF-1α to be shown. Indeed, we previously demonstrated by IHC that IH increased HIF-1α nuclear localization, which was corroborated by Western-blot (Moulin S, TACD, 2020). Western-blot and IHC are both semi-quantitative techniques with different analysis processes. In this study, we chose Western-blot because we have the material to perform this technique and because IHC is associated with an analysis process (size of a slice, areas to analyze, colorimetry…) that is more complex than that of Western-blot (densitometry only).

      While the nuclear localization of HIF-1α is the most relevant means of indicating its activation, it could be interesting to show that HIF-1α cytosolic content was modified neither by IH nor by Metformin. This would also corroborate the results of the RNA-seq, which did not demonstrate any difference in the expression of HIF-1α or of other members of the HIF family. This would also confirm that Metformin plays a major role in the regulation of HIF-1 activity (and not transcription) in the context of IH.

      Thus, we plan to perform a Western-blot of HIF-1a on cytosolic extracts of hearts from mice exposed to N or IH and treated or not with Metformin. These extracts are already available and Western-blot would be performed and replicated in 3 weeks. We could also provide a Western-blot in order to show the purity of our extraction protocol (nucleus vs cytosol).

      Figure 5F, it would be important to show the levels of expression of HIF1a in these experiments. Are there positive and negative controls that the authors could use for HIF1a activity in this experiment?

      In our manuscript, we aimed at demonstrating that Metformin decreases HIF-1 activity in a context of strong HIF-1α expression and/or stabilization that mimics what happens after chronic IH in mice (Belaidi E, Int J Cardiol 2016, Moulin S, Ther Adv Chronic Dis 2020) and in apneic patients (Moulin S, Can J Cardiol 2020). Thus, we used a transfection allowing the overexpression of HIF-1α, which is one of the best means to increase HIF-1 activity. In Figure 1 below, co-transfection of the HA-HIF-1α-WT Addgene AmpR and 5 HRE GFP AmpR plasmids induced a decrease in H9c2 viability and an increase in GFP-positive cells that were not observed in H9c2 cells transfected with pcDNA 3.1 HA-C AmpR (negative control). This validates our in vitro model as a good positive control to mimic IH consequences. However, we agree with the reviewer that we could add a supplemental figure or a panel demonstrating that our transfection induced an increase in HIF-1α expression. Thus, we will perform a Western-blot targeting HIF-1α on H9c2 cells transfected with the control plasmid (pcDNA 3.1 HA-C AmpR) or the plasmid allowing the overexpression of HIF-1α (HA-HIF-1α-WT AmpR). This work would be performed in 2 months.

      Moreover, we already improved the legibility of Figure 5F to clarify the experimental conditions (table inserted under the graph); we also completed the Materials and Methods section to specify the plasmids used (modifications are in red in the manuscript).

      Please see the figure in the downloaded file.



      Figure 1 : GFP fluorescence in H9c2 cells transfected with pcDNA 3.1 HA-C AmpR (control condition) or HA-HIF-1α-WT AmpR (positive control, overexpression of HIF-1α) and 5 HRE-GFP AmpR plasmids and treated with CoCl2 (1mM, 2h); magnification x100.

      This paragraph concerns only point 4 of the fifth question.

      In these experiments, as well as subsequent studies, it would be very informative to use a specific AMPK activator e.g. MK-8772, to compare with metformin. It is well known that metformin has a number of other targets in addition to AMPK.

      We agree with the reviewer that metformin has pleiotropic effects. Very interestingly, we demonstrated that the reduction in infarct size is not related to the systemic metabolic effect of metformin, since it failed to improve IH-induced insulin resistance while it improved the response to insulin in normoxic mice (supplemental Figure S3B). This demonstrates that, in our model, the cardioprotective effects of Metformin are independent of a potential systemic effect. Then, we demonstrated that metformin protects the heart against ischemia-reperfusion through AMPKα2 activation by using AMPKα2KO mice exposed to IH, in which Metformin failed to decrease infarct size (Fig.4N). MK-8772 is not widely used in in vivo models. Moreover, a study indicates that chronic treatment with MK-8772 (14 days and 1 month in mice and rats, respectively) induces cardiac hypertrophy characterized by an increase in heart weight (Myers R, Science 2017). In vivo experiments with MK-8772 would not be as clinically relevant as the use of metformin, which is already used in the clinic. However, in order to improve the mechanistic investigation of the role of AMPKα2 activation in inhibiting HIF-1 activity, we propose to repeat the in vitro experiments performed in Figure 5 with a specific allosteric small-molecule activator of AMPKα2 such as 991.

      We plan to:

      -Expose H9c2 to CoCl2 and treat them with 991 in order to measure HIF-1a phosphorylation.

      -Transfect H9c2 with our plasmids HA-HIF-1α-WT AmpR and 5 HRE GFP AmpR and treat them or not with 991 in order to measure HIF-1 activity (GFP fluorescence). These experiments would be performed in 2 months.

      Please find below the revision that we plan to make in response to the second question of Reviewer 2.

      2) The WB images were cut and pasted. Please add the original images

      We acknowledge the reviewer's comment and will address it by submitting a supplementary file containing the uncropped immunoblot images. Since this file already exists, our plan is to standardize it by providing, for each slide (immunoblot), all relevant information pertaining to our experiments, including groups, molecular weight markers, cutting, membrane stripping, and other pertinent details.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please find below the answers or the revisions that have already been incorporated in response to the comments of Reviewer 1. Please note that we provide new results and new figures at the discretion of the reviewer, but we are ready to insert them as figures or supplemental figures in a new revised manuscript if the reviewers and the editor think that it would improve our message.

      Was mitochondrial content in the hearts after IH experiment measured e.g. mtDNA measurements? IH results in mitochondrial dysfunction/reduced mitochondrial content. It would have been good to show mitochondrial dysfunction by doing basic functional experiments (e.g. TMRM/MitoROS imaging etc.) by isolating cardiomyocytes from the N and IH experiments.

      We thank the reviewer for these questions about mitochondrial function and content. The impact of IH on mitochondrial function has already been demonstrated in the heart (Moulin S, Antioxidants 2022, Wei Q, Am J Physiol 2012). Indeed, we previously showed that mitochondria isolated from hearts of mice exposed to IH had a decrease in maximal respiration in complex I and II that was not observed in HIF-1α+/- mice (Moulin S, Antioxidants 2022), indicating that HIF-1 is responsible for IH-induced mitochondrial dysfunction.

      Figure 4 shows that Metformin abolished IH-induced mitochondrial remodeling, similar to what we observed in HIF-1α+/- mice (Figure 2). This means that treating with Metformin or partially deleting the gene encoding HIF-1α has the same impact on the response to IH. Then, we demonstrated that Metformin can control HIF-1 activation, and we concluded that metformin could be cardioprotective by inhibiting HIF-1 activation and the subsequent mitochondrial stress and remodeling. In this study, we focused on the effects of Metformin on HIF-1 and we did not aim to directly test the effect of metformin on mitochondrial function. Actually, metformin exhibits biphasic effects on the bioenergetics of cardiac tissue depending on the modality of administration (i.e. single injection, time of administration during an ischemia-reperfusion procedure), the dose administered and the tissue studied (i.e. hiPSC-CMs, isolated mitochondria…) (Emelyanova N, Transl Res 2021). However, we collected some data that we would like to submit at the discretion of the reviewer. Using oximetry, we measured maximal respiration in complex I and II on mitochondria isolated from hearts of mice exposed to N or IH and treated or not with metformin during the exposure. While we observed that IH decreases maximal respiration in complex I and II, we did not find any effect of metformin on the alteration of mitochondrial respiration induced by IH (Figure 2A, B). Using spectrofluorometry, we measured the mitochondrial membrane potential using TMRM; we did not find any modification of membrane potential in IH or Metformin-treated mice (Figure 2C). Because we previously did not observe any impact of IH on the mtDNA/gDNA ratio (Figure 2D), we did not test metformin on this parameter.

      To conclude, we think that these results are not directly in the scope of our work but if the reviewer thinks that they deserve to be discussed, we could add them in a supplementary figure.

      Please see the figure in the downloaded file.

      Figure 2 : Mice were exposed to 21 days of Normoxia (N) or Intermittent Hypoxia (IH) (1-min cycle of FiO2 5%-21%) and treated with vehicle (Vh, CmCNa 0.01%, 0.1ml.10g-1) or Metformin (Met, 300mg.kg-1.d-1). (A, B) Mitochondrial function was measured by oximetry with sequential addition of substrate (state 2), ADP (200mM, state 3, maximal respiration) and oligomycin (12.5 mM, state 4); quantification of O2 consumption for NADH-linked mitochondrial respiration (complex I-glutamate-malate, GM, 20mM) (A), and for FADH2-linked mitochondrial respiration (complex II-succinate, S, 5mM, in presence of complex I inhibition by rotenone, 6.25mM) (B) (n=8). (C) Mitochondrial membrane potential measured by spectrofluorometry after Tetramethylrhodamine Methyl Ester (TMRM, 0.2mM) in presence of GM (basal condition), maximal respiration (ADP) and uncoupling condition (Carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone, FCCP, 3mM); fluorescence intensity is expressed relative to fluorescence at baseline (before GM) (n=8). (D) Mitochondrial content assessed by the expression of mitochondrial DNA (mtDNA, COX1) relative to genomic DNA (gDNA, ApoB) measured after PCR (n=6); *p

      Fig 2A-D - CoCl2 is not a good model to mimic hypoxia due its effect on disrupting iron homeostasis in cells, which can mean that some of the effects are due to changes in iron levels and not HIF stabilisation.

      The capacity of CoCl2 to chelate iron is the main property that we used in order to stabilize HIF-1α. Actually, prolyl-4-hydroxylases need Fe2+ to hydroxylate HIF-1α and induce its degradation. Intermittent hypoxia (IH), for its part, is characterized by very rapid changes in PO2. This stimulus was designed to reproduce sleep apnea syndrome and its associated disorders (i.e. insulin resistance, hypertension, increase in myocardial infarct size). This model was first developed and validated in rodents (Dematteis M, ILARJ 2008, Belaidi E, Eur Resp Rev 2022, Harki, Eur Resp J 2022). Compelling evidence indicates that the involvement of HIF-1 in the deleterious consequences of IH is related to the repetitive phases of oxygenation and especially to IH-induced oxidative stress (Semenza Physiology 2009, Belaidi Pharmacol. & Ther 2016). In order to gain further mechanistic insight into HIF-1, we next attempted to optimize in vitro models. A device was developed by Minoves et al. (Minoves M, Am J Physiol, 2017) to expose endothelial and cancer cells to IH. However, as illustrated below, this device does not produce hypoxia-reoxygenation cycles that are rapid and efficient enough to induce cardiac cell death (Figure 3A). In contrast, CoCl2 decreases H9c2 viability by 60% (Figure 3B), which is associated with a sustained stabilization of HIF-1α (Figure 3C,D). Thus, we chose this in vitro model as it replicates the cardiac cell death and the HIF-1α overexpression or stabilization that we similarly observe in our in vivo model and in apneic patients (Moulin S, Can J Cardiol 2020).

      Please see the figure in the downloaded file.

      Figure 3 : (A-B) Cell viability measured by 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl-2H-tetrazolium bromide (MTT) of H9c2 cells exposed to 6 hours (h) of repetitive cycles of Intermittent hypoxia (2 minutes (min) PO2 16% - 2 min PO2 2%) (n=3) (A) or treated with CoCl2 (1mM, 2h) (n=6) (B). (C-D) Quantification of total HIF-1α expression relative to tubulin (C) and representative image of Western-blot (D) (n=2-3); *p****p*

      Fig. 2K, change in BNIP3 expression is modest, but change in Parkin is very dramatic. BNIP3 is a HIF-1α target but Parkin is not, so it is plausible that mitophagy could be occurring through a HIF-1α independent mechanism.

      Fig. 2K is a representative panel of 2-3 independent experiments. The quantification reported in Figures 2I and 2J demonstrated a significant decrease in BNIP3 and Parkin expression in HIF-1α heterozygous mice exposed to Intermittent Hypoxia (IH) compared to HIF-1α+/+ mice exposed to IH. While we acknowledge that only BNIP3 is a direct target of HIF-1, the role of HIF-1 in IH-induced auto/mitophagy is demonstrated by our experiments performed in HIF-1α heterozygous mice. This shows an important role for HIF-1 without excluding a contribution of HIF-1α-independent mechanisms.

      What are we meant to be looking at in Fig 2L?

      Figure 2L aims at illustrating mitochondrial remodeling under IH. Stars indicate mitochondria with an abnormal fate in IH conditions, and arrows point to autophagosomal membranes and their formation. This figure was magnified to be clearer (please see new Figure 2L).

      Figure 3B-C, the reduction in pT172 on AMPK is modest. It would be good to include pACC as a downstream target for AMPK.

      As recommended by the reviewer, we inserted in the manuscript the quantification of the 79Ser-P-ACC/ACC Western-blot as well as a representative image of the Western-blot (see new Figure 3). We also modified the legend of the figure. 79Ser-P-ACC is an important target of AMPK; however, in our experimental conditions, its phosphorylation is not associated with the decrease in AMPK phosphorylation. This could be explained in several ways. First, Metformin was administered every day and hearts were harvested 24 hours after the last administration. Most studies demonstrating a modification of AMPK and ACC phosphorylation are experiments performed in vitro or directly (less than 1 hour) after the administration of a single dose of Metformin. In the context of myocardial ischemia-reperfusion, Yin et al. showed an increase in P-AMPK/AMPK directly after Metformin treatment without showing any data on P-ACC/ACC (Yin M; Am J Physiol 2011); similar data were published in models of chronic cardiac diseases (Soraya H, Eur J Pharmacol, Gundewar S, Circ Res 2009). Second, in line with the previous explanation, the lack of effect of metformin on P-AMPK and/or P-ACC in rodent models could be explained by its rapid distribution (Sheleme T Clin Pharmacokinetics of Metformin 2021) and its short half-life, which is around 3.7 hours in mice (Junien N, Arch Int Pharmacodyn Ther 1979).

      To conclude, since we performed all our analyses 24h after the last treatment and exposure to hypoxia, we argue that the slight but significant decrease in AMPK phosphorylation that we observed in our study highlights a robust impact of chronic IH. However, it would be elegant to confirm this result by measuring AMPK activity through its phosphorylation capacity (Cool B, Cell Metab 2006, Ducommun S, Am J Physiol 2014). We already sent hearts from mice exposed to Normoxia or Intermittent Hypoxia to Luc Bertrand’s lab (IREC, Belgium), where this assay is routinely performed.
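      The half-life argument above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes simple first-order (single-compartment) elimination, which is an illustrative assumption and not a pharmacokinetic model taken from the manuscript. Only the 3.7-hour half-life and the 24-hour delay come from the text; the function name is hypothetical.

```python
def fraction_remaining(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of a dose left after elapsed_h hours.

    Assumes first-order elimination (an assumption made for illustration).
    """
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    return 0.5 ** (elapsed_h / half_life_h)


HALF_LIFE_H = 3.7  # metformin half-life in mice, as cited in the response
DELAY_H = 24.0     # delay between the last administration and heart harvest

n_half_lives = DELAY_H / HALF_LIFE_H
remaining = fraction_remaining(DELAY_H, HALF_LIFE_H)

print(f"{n_half_lives:.1f} half-lives elapsed, "
      f"about {remaining:.1%} of the last dose remaining")
```

      Under this simplifying assumption, about 6.5 half-lives elapse before harvest, leaving roughly 1% of the last dose, which is consistent with the argument that acute signaling effects of the drug may no longer be detectable at that time point.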

      Fig. E-G, show data for mice treated with vehicle.

      In Figure 4 I-J, we demonstrated that Metformin significantly decreases infarct size in the IH condition only, and this validates our main hypothesis regarding the specific beneficial effect of this drug in the context of chronic IH. In order to show that the cardioprotective effect of Metformin is related to AMPKα2 activation, we first showed that 79Ser-PACC/ACC, one of the main downstream targets of AMPKα2, was increased (Fig. E-G). We did not find it necessary to show these data for normoxic mice, in which Metformin does not exhibit cardioprotective effects. However, as shown in Figure 4 below, Metformin also increases 79Ser-PACC/ACC in Normoxic mice, validating the treatment. Thus, in normoxic conditions, AMPKα2 activation does not exert any cardioprotective effect. We acknowledge that this reinforces our result about the specificity of AMPKα2 activation by Metformin under chronic IH conditions. We could add this Figure to the supplemental results.

      Please see the figure in the downloaded file.

      Figure 4 : AMPK activation in Normoxic mice treated with vehicle (Vh, CmCNa 0.01%, 0.1ml.10g-1) or metformin (Met, 300mg.kg-1.d-1): 172Thr-P-AMPK/AMPK (A) and 79Ser-P-ACC/ACC (B) ratio and representative image of Western-blot (C) (n=3-6); *p

      Fig. 3K - what cre is used for the a2 KO mice?

      As written in the Materials and methods section, AMPKa2KO mice are not inducible Knock-out mice. Constitutive AMPKα2 knockout mice were kindly generated by Benoit Viollet (Viollet B, JCI, 2003).

      Include normoxia data for the a2 KO mice studies.

      The question of the reviewer concerning the cardioprotective effects of metformin is interesting but is not aligned with the objectives of the study. Indeed, we did not treat normoxic mice with Met for several reasons. First, the objective of the study was to find a cardioprotective strategy against the IH-induced increase in infarct size. Second, Fig. 3I shows that Met significantly reduced infarct size upon IH only; this suggests that AMPKα2 activation is specifically involved in the IH-induced increase in infarct size but not in reducing infarct size in normoxic mice. Moreover, the beneficial impact of metformin in standard models of myocardial ischemia-reperfusion is controversial and has been extensively discussed (Foretz et al. Cell Metab. 2014). Overall, the use of AMPKα2KO mice was justified in the context of IH only. We validated the involvement of AMPKα2 in the cardioprotective effect of metformin specifically in IH conditions.

      Figure 5G, what is the rationale for switching to CoCl2 in the mice to prove metformin reduced HIF-1α expression? Why not use reduced O2 tension in mice.

      We respectfully disagree with the reviewer, since mice were exposed to N and IH and treated or not with Metformin to demonstrate that this drug abolished the IH-induced increase in HIF-1α nuclear expression (Figure 4 H, I). The same model was used in Figure 3 to demonstrate the impact of Metformin on infarct size. The experiment in Fig. 5G was conducted to demonstrate the potential link between AMPKα2 and HIF-1α phosphorylation in basal conditions of AMPKα2 content or in the absence of AMPKα2 (AMPKα2-/- mice). In the presence of AMPKα2, HIF-1α phosphorylation increases when its stabilization is increased by CoCl2; this was not observed in AMPKα2-/- mice, highlighting that AMPKα2 plays an important role in HIF-1α phosphorylation.


      Please find below the answer to the first question asked by the reviewer 2.

      1) Why did authors choose the IH protocol illustrated in Fig. S1A

      The choice of the hypoxic stimulus was based on the literature and mainly on our recognized expertise in preclinical studies aiming at better understanding obstructive sleep apnea syndrome (OSA), a chronic pathology associated with several comorbidities such as diabetes, hypertension… We are conscious that the hypoxic stimulus used in this study is very severe, with a nadir arterial oxygen saturation (SaO2) around 60%. However, this experimental design is required to induce detrimental cardiovascular effects in the absence of any confounding factors (i.e., obesity) or genetic susceptibility to complications (i.e. genetic susceptibility to hypertension) (Dematteis M, ILARJ 2008). Especially in the context of myocardial infarction, exposing rodents to 14 to 21 days of IH at 5% and subjecting them to a myocardial ischemia-reperfusion protocol allows us to reproduce the increase in infarct size in rats (Belaidi E, J. Am. Coll. Cardiol., 2009; Bourdier G, Am J Physiol, 2016) and in mice (Belaidi E, Int J Cardiol 2016; Moulin S, Can J Cardiol 2020), similar to what has been observed in apneic patients (Buchner S, EHJ 2014). Moreover, we recently conducted a meta-analysis based on 23 preclinical studies aiming at investigating the impact of the IH pattern (duration, FiO2, repetition of cycles…) on infarct size and cardiomyocyte death (Belaidi E, Eur. Resp. J 2022). We showed that IH significantly increases infarct size when it is applied for several days (especially 14 to 21 days) and when FiO2 is around 5%, whereas IH decreases infarct size when it is applied for a single day at a FiO2 of 10%. This meta-analysis confirmed that we need to apply a chronic and severe stimulus to reproduce the increase in infarct size that is observed in apneic patients, who are exposed every day, over several days, to a decrease in SaO2. If the reviewers and the editor consider that this point should be addressed in the discussion section, we will be happy to include it.

      Please find below the answers or the revisions that have already been incorporated in response to the comments of the reviewer 3. They appear in red in the new manuscript except the modifications performed in the “references” section which appear in black.

      1 The authors used H9c2 rat cardiac cells in vitro experiments although they used mouse model in vivo experiments. Using mouse P19.CL6 cardiac cells instead of rat H9c2 cells may much clearer. Why the authors did not use P19.CL6 cells should be explained.

      We thank the reviewer for his/her suggestion. The P19CL6 cell line was isolated from pluripotent P19 embryonal carcinoma (EC) cells after long-term culture under conditions for mesodermal differentiation (Habara-Ohkubo A, Cell Struc Funct 1996). Therefore, these cells are mostly used to study the differentiation of cardiac muscle. Indeed, they were recognized to avoid the large variations in differentiation rates which were extensively reported (Mueller I, J Biomed Biotechnol 2010). To our knowledge, there is no ventricular non-beating mouse cell line. In this study, we used H9c2 cells, which are extensively used and recognized as a gold-standard cellular model to study the biology of cardiomyocytes, including mechanisms involved in cardiac ischemia-reperfusion injury (Paillard M, Circulation, 2013; Zhang G Circulation 2021), cardiac hypertrophy (Zhang N, Cell Death Diff 2020; Hu H, Cardiovasc Res 2020) and organelle calcium exchanges (Moulin S, Antioxidants, 2022, Paillard M, Circulation, 2013), as well as for drug testing (Beshay NM, J Pharm and Tox Methods, 2007). Recently, H9c2 and P19.CL6 cells were exposed to intermittent hypoxia (70 cycles of FiO2 1% (5 min) - FiO2 21% (10%)) in order to “mimic OSA” and investigate the transcription level of a pool of genes. The authors show some similarities and differences in mRNA expression between the two cell lines that, indeed, could be attributed to variations in the cell origin (Takasawa S, IJMS 2022). However, in that study, there is no experiment assessing the state of the cardiac cells (apoptosis, viability, metabolism, remodeling), which calls into question the pathophysiological transposability of the model. Moreover, the number of experiments conducted on H9c2 cells (PubMed references: 7000 vs 100 for P19.CL6) to understand the mechanisms involved in acute and chronic cardiac pathologies makes us confident that our choice is relevant.

      2 The authors described, "The protocol was approved by the French minister (APAFIS#23725-2020012111137561.v2)." in Animals (page 16) without showing approval date, the authors should clearly show the approval date together with their approval numbers.

      We added the approval date in the Materials and Methods section. The protocol was approved on February 20, 2020.

      3 In Figure 2 L, scale bar(s) should be added because figures are magnified and/or reduced by printer.

      We agree with the reviewer that scale bars were not visible; we highlighted them.

      4 In Figure 3J and 3N, scale bar(s) should be added.

      Scale bars have now been added to Figures 3J and 3N. All pictures were acquired at the maximal zoom of a camera placed at an equal distance from the slices. Then, analyses were performed, slice by slice, with ImageJ at the same zoom (x5 to get an image at 100%). In this context, the scale was added based on a photo of a slice taken next to a ruler.

      5 In introduction, "HIF-1" should be changed to "hypoxia-inducible factor 1 (HIF-1)".

      Thank you, we replaced HIF-1 by Hypoxia Inducible Factor-1 (HIF-1) in the introduction section.

      6 In Results, "Angpt1, Txnip, Nmrk2, Nuak1 or Pfkfb1" should be changed to "Angiopoietin 1 (Angpt1), Thioredoxin-interacting protein (Txnip), Nicotinamide riboside kinase 2 (Nmrk2), NUAK family SNF1-like kinase 1 (Nuak1) or 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 1 (Pfkb1)".

      We made the modification and left the abbreviations in italics since they are gene names.

      7 In Chronic intermittent hypoxia, "FiO2" should be changed to "FiO₂".

      Thank you, we changed it in the materials and methods section.

      8 In Western-blot, "Bio-Rad, California, USA" should be changed to "Bio-Rad, Hercules, CA".

      Thank you, we did it.

      9 In Western-blot, what "tubulin" (α-tubulin or β-tubulin) should be clarified.

      We agree with the reviewer that this point should be specified; α-tubulin was stained, we added the “α”.

      As mentioned below, we have done all the modifications required by the reviewer in the references section.

      10 In Ref. 8, "Antioxidants (Basel) 11 (2022)" should be changed to "Antioxidants (Basel) 11, 2326 (2022)".

      Done

      11 In Ref.10, "Pharmacol Ther (2016)" should be changed to "Pharmacol Ther 168, 1-11 (2016)".

      Done

      12 In Ref.13, "Antioxidants (Basel) 11 (2022)" should be changed to "Antioxidants (Basel) 11, 1462 (2022)".

      Done

      13 In Ref. 21, "Diabetes (2017)" should be changed to "Diabetes 66, 2942-2951 (2017)".

      Done

      14 In Ref. 28, "Adv Biol (Weinh), e2300292 (2023)" should be changed to "Adv Biol (Weinh), 8, 2300292 (2023)".

      Done

      15 In Ref. 37, "J Am Heart Assoc 6 (2017)" should be changed to "J Am Heart Assoc 6, e006680 (2017)".

      Done

      16 In Ref. 41, "Eur Respir Rev 32 (2023)" should be changed to "Eur Respir Rev 32, 230083 (2023)".

      Done

      17 In Ref. 46, "Int J Mol Sci 22 (2020)" should be changed to "Int J Mol Sci 22, 268 (2021)".

      Done

      18 In Ref. 47, "Int J Mol Sci 21 (2020)" should be changed to "Int J Mol Sci 21, 2428 (2020)".

      Done

      4. Description of analyses that authors prefer not to carry out

      Please find below the answers to the comment of the reviewer 3 that we cannot provide and that is not in the scope of the study.

      Figure 5, multiple phosphorylation sites have been identified on HIF1a. What is the nature of the Thr/SerP-HIF1a antibody? It would be far more preferable (essential?) to identify the site(s) within HIF1a that are phosphorylated by AMPK.

      The antibody was provided by Cell Signaling, ref. 9631. The Phospho-(Ser/Thr) Phe Antibody detects phospho-serine or phospho-threonine in the context of tyrosine, tryptophan or phenylalanine.

      The identification of the phosphorylation sites would require a lengthy, time-consuming phosphoproteomic analysis and subsequent functional validation in vivo and in vitro (directed mutagenesis, knock-in mice, …), which are out of the scope of our paper.

    1. The document provides a fascinating overview of the technologies available to monitor personal health and sports performance, showing how the sector is rapidly evolving towards increasingly intelligent and personalized solutions. Exploring a wide range of devices and mobile applications designed to record and analyze physiological and biomechanical parameters, we highlight how we are moving from simple measurement of standard vital signs to sophisticated predictive algorithms based on population information and artificial intelligence.

      Personally, I believe it is crucial to highlight the importance of independent validation of these technologies. It is worrying to know that many of them have not yet been adequately tested, raising questions about their effectiveness and usefulness for the average consumer. In a world where we are increasingly surrounded by data and devices, it is essential that the technologies we choose to use are reliable and based on solid scientific evidence.

      Analyzing specific devices such as the Zephyr Sensor, E4 Wristband, and Reign Active Recovery Band offers a detailed look at their capabilities and potential benefits. Furthermore, the reference to the Mio SLICE™ bracelet, with its "Personal Activity Score" calculated via an algorithm, is particularly interesting. The clinical study that demonstrated a reduction in the risk of cardiovascular disease for those with a score ≥100 is tangible evidence of the benefits these technologies can offer when tested properly.

      Finally, discussion of sleep and cardiorespiratory monitoring technologies, such as the OURA ring and Fitbit Charge2™, highlights current challenges and limitations. Personally, I think sleep tracking is one of the most interesting aspects of these technologies, as rest is critical to overall health and physical performance. However, it is important to be aware of limitations in the precision and interpretation of the data collected, as this may affect our confidence in the results provided.

      Overall, the document offers a comprehensive overview of an ever-expanding sector. As technologies for monitoring health and sports performance continue to evolve, it is essential to stay informed about their potential and limitations. Innovation is exciting, but careful research and development is essential to ensure that these technologies are truly useful and reliable for improving our lives and performance.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Public Reviewer Comments

      We again thank the reviewers for the time and effort they clearly put into reviewing our manuscript. We have revised our manuscript to take into account the majority of their suggestions, primary among them being refinements of our model and classification approach, detailed sensitivity analysis of our model, and several new simulations. Their very constructive feedback has resulted in what we feel is a much-improved paper. In what follows, we respond to each of their points.

      Reviewer #1:

      COMMENT: The reviewer suggested that our control policy classification thresholds should be increased, especially if the behavioral labels are to be subsequently used to guide analyses of neural data which “is messy enough, but having trials being incorrectly labeled will make it even messier when trying to quantify differences in neural processing between strategies.”

      REPLY: We appreciate the observation and agree with the suggestion. In the revised manuscript, we simplified the model (as another reviewer suggested), which allowed for better training of the classifier. This enabled an increase in the threshold to 95% to have more confidence in the identified control strategies. Figures 7 and 8 were regenerated based on the new threshold.

      COMMENT: The reviewer asked if we could discuss what one might expect to observe neurally under the different control policies, and also suggested that an extension of this work could be to explore perturbation trials, which might further distinguish between the two control policies.

      REPLY: It is indeed interesting to speculate what neural activity could underlie these different behavioral signatures. As this task is novel to the field, it is difficult to predict what we might observe once we examine neural activity through the lens of these control regimes. We hope this will be the topic of future studies, and one aspect worthy of investigation is how neural activity prior to the start of the movement may reflect two different control objectives. Previous work has shown that motor cortex is highly active and specific as monkeys prepare for a cued movement and that this preparatory activity can take place without an imposed delay period (Ames et al., 2014; Cisek & Kalaska, 2005; Dekleva et al., 2018; Elsayed et al., 2016; Kaufman et al., 2014; Lara et al., 2018; Perich et al., 2018; Vyas et al., 2018; Zimnik & Churchland, 2021). It seems possible that the control strategies we observed correspond to different preparatory activity in the motor cortex. We added these speculations to the discussion.

      The reviewer’s suggestion to introduce perturbations to probe sensory processing is very good and was also suggested by another reviewer. We therefore conducted additional simulations in which we introduced perturbations (Supplementary Material; Figure S10). Indeed, in these model simulations the two control objectives separated more. However, testing these predictions via experiments must await future work.

      COMMENT: “It seems like a mix of lambda values are presented in Figure 5 and beyond. There needs to be some sort of analysis to verify that all strategies were equally used across lambda levels. Otherwise, apparent differences between control strategies may simply reflect changes in the difficulty of the task. It would also be useful to know if there were any trends across time?”

      REPLY: We appreciate and agree with the reviewer’s suggestion. We have added a complementary analysis of control objectives with respect to task difficulty, presented in the Supplementary Material (Figures S7 and S8). We demonstrate that, overall, the control objectives remain generally consistent throughout trials and difficulty levels. Therefore, it can be concluded that the difference in behavior associated with different control objectives does not depend on the trial sequence or difficulty of the task. A statement to this effect was added to the main text.

      COMMENT: “Figure 2 highlights key features of performance as a function of task difficulty. …However, there is a curious difference in hand/cursor Gain for Monkey J. Any insight as to the basis for this difference?”

      REPLY: The apparently different behavior of Monkey J in the hand/cursor RMS ratio could be due to subject-to-subject variability. Given that we have data from only two monkey subjects, we examined inter-individual variations between human subjects in the Supplementary Material by presenting hand/cursor gain data for each human subject (Figure S1). As can be seen, there was indeed variability, with some subjects not exhibiting the same clear trend with task difficulty. However, on average, the RMS ratio shows a slight decrease as trials grow more difficult, as was earlier shown in Figure 2. To address the difference in behavior of Monkey J, we added a sentence about the possibility of inter-individual variations, with reference to the Supplementary Material.

      Reviewer #2:

      (Reviewer #2's original review is with the first version of the Reviewed Preprint. Below is the authors' summary of those comments.)

      COMMENT: The reviewer commends the care and effort taken to characterize control policies that may be used to perform the CST, via dual human and monkey experiments and model simulations, noting the importance of doing so as a precursor to future neural recordings or BMI experiments. But the reviewer also wondered if it is all that surprising that different subjects might choose different strategies: “... it makes sense that different subjects might choose to favor different objectives, and also that they can do so when instructed. But has this taught us something about motor control or simply that there is a natural ambiguity built into the task?”

      REPLY: The redundancy in the task, which allows different solutions to achieve the goal, was deliberate and was the motivation for choosing this task for this study. We therefore did not regard the resulting subject-to-subject variability as a finding of our study. Rather, redundancy and inter-individual variability are ubiquitous features of everyday actions, and we explicitly wanted to examine behavior that is closer to such actions. As commended by the reviewers, CST is a rich task that extends our research beyond the conventional highly-constrained reaching task. The goal of our study was to develop a computational account to identify and classify such differences to better leverage future neural analyses of such more complex behaviors. This choice of task has now been better motivated in the Introduction of the revised manuscript.

      COMMENT: The reviewer asks about our premise that subjects may use different control objectives in different trials, and whether instead a single policy may be a more parsimonious account for the different behavioral patterns in the data, given noise and instability in the system. In support of this view, the reviewer implemented a simple fixed controller and shared their own simulations to demonstrate its ability to generate different behavioral patterns simply by changing the gain of the controller. The reviewer concludes that our data “are potentially compatible with any of these interpretations, depending on which control-style model one prefers.”

      REPLY: We first address the reviewer’s concern that a simple “fixed” controller can account for the two types of behavioral patterns observed in Experiment 2 (instructed groups) by a small change in the control gain. We note that our controller is also fixed in terms of the plant, the actuator, and the sensory feedback loop; the only change we explore is in the relative weights of position vs. velocity in the Q matrix. This determines whether it is deviations in position or in velocity that predominate in the cost function. This, in turn, generates changes in the gain vector L in our model, since the optimal solution (i.e. the gains L that minimize the cost function) depends on the Q matrix as well as the dynamics of the plant (specifically, the lambda value). Hence, one could interpret the differences arising from changes in the control objective (the Q matrix) as changes in the gains of our “fixed” controller.
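
      The dependence of the gain vector L on the Q matrix described above can be illustrated with a deliberately simplified sketch. It is not the model used in the manuscript: the two-state plant (cursor position x, hand position p, hand velocity as the control u), the Euler discretization, the values of lam, dt and r, and the helper name lqr_gain are all assumptions chosen for illustration. Sensory delay, noise and state estimation are omitted.

```python
import numpy as np

def lqr_gain(A, B, Q, R, n_iter=20000):
    """Iterate the discrete-time Riccati recursion; return L for the law u = -L s."""
    P = Q.copy()
    for _ in range(n_iter):
        BtP = B.T @ P
        L = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ L)
    return L

# Illustrative values only (not taken from the manuscript).
lam, dt, r = 3.0, 0.01, 0.01

# Toy plant: dx/dt = lam * (x + p), dp/dt = u, with state s = [x, p].
A = np.eye(2) + dt * np.array([[lam, lam], [0.0, 0.0]])  # Euler discretization
B = dt * np.array([[0.0], [1.0]])
R = np.array([[r]])

Q_pos = np.diag([1.0, 0.0])        # cost on cursor position x
Q_vel = lam**2 * np.ones((2, 2))   # cost on cursor velocity lam * (x + p)

L_pos = lqr_gain(A, B, Q_pos, R)
L_vel = lqr_gain(A, B, Q_vel, R)
print("gain under position cost:", L_pos)
print("gain under velocity cost:", L_vel)
```

      In this toy setting the plant is identical in both cases and only Q differs, yet the resulting gains differ. Under the velocity cost the two gain components coincide, because the cost depends only on the sum x + p, so a pure offset of the cursor is left uncorrected.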

      More importantly, while the noise and instability in the system may indeed occasionally result in distinct behavioral patterns (and we have observed such cases in our simulations as well), these factors are far from giving an alternative account for the structural differences in the behavior that we attribute to the control objective. To substantiate this point, we performed additional simulations that are provided in the Supplementary Material (Figures S4—6). These simulations show that neither a change in noise nor in the relative cost of effort can account for the two distinct types of behavior. These differences are more consistently attributed to a change in the control objective.

      In addition, our approach provides a normative account of the control gains needed to simulate the observed data, as well as the control objectives that underlie those gains. As such, the two control policies in our model (Position and Velocity Control) resulted in control gains that captured the differences in the experimental groups (Experiment 2), both at the single trial and aggregate levels and across different task difficulties. Figure S9 in the Supplementary Material shows how the control gains differ between Position and Velocity Control in our model across different difficulty levels.

      We agree with the reviewer’s overall point that there are no doubt many models that can exhibit the variability observed in our experimental data, our simulations, or the reviewer’s simulations. Our study aimed to explore in detail not only the model’s ability to generate the variable behavior observed in experimental data, but also to match experimental results in terms of performance levels, gains, lags and correlations across a wide range of lambda values, wherein the only changes in the model were the lambda value and the control objective. Without the details of the reviewer’s model, we are unable to perform a detailed analysis of that model. Even so, we are not claiming that our model is the ‘ground truth,’ only that it is a reasonable model, adopted from the literature, that provides an intuitive and normative explanation of the performance of humans and monkeys over a range of metrics, system dynamics, and experimental conditions.

      Finally, we understand the reviewer’s concern regarding whether the trial-by-trial identification of control strategy in Figure 8 suggests that (uninstructed) subjects constantly switch control objectives between Position and Velocity. Although it is not unreasonable to imagine that individuals would intuitively try different strategies between ‘keeping the cursor still’ and ‘keeping the cursor at the center’ across trials, we agree that it is generally difficult to determine such trial-to-trial changes, especially when the behavior lies somewhere in between the two control objectives. In such cases, as we originally discussed in the manuscript, an alternative explanation could be a mixed control objective that generates behavior at the intersection of Position and Velocity Control, i.e., between the two slopes in Figure 8. We believe, however, that our modeling approach is still helpful in cases where performance is predominantly based on Position or Velocity Control. After all, the motivation for this study was to parse neural data into two classes associated with each control objective to potentially better identify structure underlying these behaviors.

      We clarified these points in the main text by adding further explanation in the Discussion section.

      COMMENT: The reviewer suggested additional experiments, such as perturbation trials, that might be useful to further explore the separability of control objectives. They also suggested that we temper our conclusion that our approach can reliably discriminate amongst different control policies on individual trials. Finally, the reviewer suggested that we modify our Introduction and/or Discussion to note past human/monkey research as well as investigations of minimization of velocity-error versus position-error in the smooth pursuit system.

      REPLY: We have expanded our simulations to investigate the effects of perturbation on the separability of different control objectives (Figure S10 in Supplementary Materials). We demonstrated that introducing perturbations more clearly differentiated between Position and Velocity Control. These results provide a good basis for further experimental verifications of the control objectives, but we defer these for future work.

      We also appreciate the additional past work that bridges human and monkey research that the reviewer highlights, including the related discussions in the eye movement literature on position versus velocity control. We have modified our Introduction and Discussion accordingly.

      Reviewer #3:

      COMMENT: The reviewer asked whether the observed differences in behavior might be due to some other factors besides the control policy, such as motor noise or effort cost, and suggested that we more systematically ruled out that possibility.

      REPLY: We appreciate and have heeded the reviewer’s suggestion. The revised manuscript now includes additional simulations in which the control objective was fixed to either Position or Velocity Control, while other parameters were systematically varied. Specifically, we examined the influence of the relative effort cost, the sensory delay, and motor noise, on performance. The results of these sensitivity analyses are presented in the Supplementary Material, Figures S4—6. In brief, we found that changing the relative effort cost, delay, or noise levels, mainly affected the success rate in performance (as expected), but did not affect the behavioral features originally associated with control objectives. We include a statement about this result in the main text with reference to the details provided in the Supplementary Material.

      COMMENT: The reviewer questioned our choice of classification features (RMS position and velocity) and wondered if other features might yield better class separation, such as the hand/cursor gain. In a similar vein, reviewer 2 suggested in their recommendations that we examine the width of the autocorrelation function as a potentially better feature.

      REPLY: We note first that our choice of cursor velocity and position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. However, we also explored other features as suggested. We found that they, too, exhibited overlap between the two different control objectives, and did not provide any significant improvement in classification performance (Figures S2 and S3; Supplementary Materials). Of course, that is not to say that a more exhaustive examination of features may not find ones that yield better classification performance than those we investigated, but that is beyond the scope of our study. We refer to this consideration of alternative metrics in the discussion.

      COMMENT: The reviewer notes that “It seems that the classification problem cannot be solved perfectly, at least on a single-trial level.” To address this point, the reviewer suggested that we conduct additional simulations under the two different control objectives, and quantify the misclassifications.

      REPLY: We appreciate the reviewer’s suggestion, and have conducted the additional simulations as suggested, the results of which are included in the revised manuscript.

      COMMENT: “The problem of inferring the control objective is framed as a dichotomy between position control and velocity control. In reality, however, it may be a continuum of possible objectives, based on the relative cost for position and velocity. How would the problem differ if the cost function is framed as estimating a parameter, rather than as a classification problem?”

      REPLY: A blended control strategy, formulated as a cost function that is a weighted combination of position and velocity costs, is indeed a possibility that we briefly discussed in the original manuscript. This possibility arises particularly for individuals whose performance metrics lie somewhere between the purely Position or purely Velocity Control. While our model allows for a weighted cost function, which we will explore in future work, we felt in this initial study that it was important to first identify the behavioral features unique to each control objective.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      None beyond those stated above.

      Reviewer #2 (Recommendations For The Authors):

      COMMENT: Line 166 states "According to equation (1), this behavior was equivalent to reducing the sum (𝑝 + 𝑥) when 𝜆 increased, so as to prevent rapid changes in cursor velocity". This doesn't seem right. In equation 1, velocity (not acceleration) depends on p+x. So a large p+x doesn't create a "rapid change in cursor velocity", but rather a rapid change in cursor position.

      REPLY: The reviewer is correct and we have corrected this misworded sentence; thank you for catching that.

      COMMENT: The reviewer points out the potential confusion readers may have, given our unclear use of ‘control strategy’ vs. ‘control policy’ vs. ‘control objective’. The reviewer suggests that “It would be helpful if this could be spelled out early and explicitly. 'Control strategy' seems perilously close to 'control policy', and it would be good to avoid that confusion. The authors might prefer to use the term 'cost function', which is really what is meant. Or they might prefer 'control objective', a term that they introduce as synonymous with 'control strategy'.”

      REPLY: We thank the reviewer for noting this ambiguity. We have clarified the language in the Introduction to explicitly note that by strategy, we mean the objective or cost function that subjects attempt to optimize. We then use ‘control objective’ consistently and removed the term ‘policy’ from the paper to avoid confusion. We also now use Position Control and Velocity Control as the labels for our two control objectives.

      COMMENT: The reviewer notes that in Figure 2B and the accompanying text in the manuscript, we need to be clearer about what is being correlated; namely, cursor and hand position.

      REPLY: Thank you for pointing out this lack of clarity, which we have corrected as suggested.

      COMMENT: The reviewer questions our attribution of decreasing lag with task difficulty as a consequence of subjects becoming more attentive/responsive when the task is harder, and points out that our model doesn’t include this possible influence yet the model reproduces the change in lag. The reviewer suggests that a more likely cause is due to phase lead in velocity compared to position, with velocity likely increasing with task difficulty, resulting in a phase advance in the response.

      REPLY: Our attribution of the decrease in lag with task difficulty being due to attention/motivation was a recapitulation of this point made in the paper by Quick et al. [2018]. But as noted by the reviewer, this potential influence on lag is not included in our model. Accordingly, the change in lag is more likely a reflection of the phase response of the closed loop system, which does change with task difficulty since the optimal gains depend upon the plant dynamics (i.e., the value of lambda). We have, therefore, deleted the text in question.

      COMMENT: “The Methods tell us rather a lot about the dynamics of the actual system, and the cost functions are also well defined. However, how they got from the cost function to the controller is not described. I was also a bit confused about the controller itself. Is the 50 ms delay assumed when deriving the controller or only when simulating it (the text seems to imply the latter, which might make sense given that it is hard to derive optimal controllers with a hard delay)? How similar (or dissimilar) are the controllers for the two objectives? Is the control policy (the matrix that multiplies state to get u) quite different, or only subtly?”

      REPLY: Thanks for pointing this out. For brevity, we had omitted the details and referred readers to the original paper (Todorov, 2005). However, we have now revised the manuscript to include all the details in the Methods section. Hence, the entire section on the model is new. This also necessitated updating all data figures (Figures 3, 4, 5, 6, 7, 8) as they contain modeling results.

      COMMENT: “Along similar lines, I had some minor to moderate confusions regarding the OFC model as described in the main text. Fig 3 shows a model with a state estimator, but it isn't explained how this works. …Here it isn't clear whether there is sensory noise, or a delay. The methods say a delay was included in the simulation (but perhaps not when deriving the controller?). Noise appears to have been added to u, but I'm guessing not to x or x'? The figure legend indicates that sensory feedback contains only some state variables, and that state estimation is used to estimate the rest. Presumably this uses a Kalman filter? Does it also use efference copy, as would be typical? My apologies if this was stated somewhere and I missed it. Either way, it would be good to add a bit more detail to the figure and/or figure legend.”

      REPLY: As the lack of detail evidently led to some confusion, we now more clearly spell out the details of the model in the Methods, including the state estimation procedure.

      COMMENT: The reviewer wondered why we chose to plot mean velocity vs. mean position as in Figure 5, noting that, “ignoring scale, all scatter plots would be identical if the vertical axis were final position (because mean velocity determines final position). So what this plot is really examining is the correlation between final position and average position. Under position control, the autocorrelation of position is short, and thus final position tends to have little to do with average position. Under velocity control, the autocorrelation of position is long, and thus final position tends to agree with average position. Given this, why not just analyze this in terms of the autocorrelation of position? This is expected to be much broader under velocity control (where they are not corrected) than under position control (where they are, and thus disappear or reverse quickly). To me, thinking of the result in terms of autocorrelation is more natural.”

      REPLY: The reviewer is correct that the scatter plots in Fig. 5 would be the same (to within a scale factor of the vertical axis) had we plotted final position vs. mean position instead of mean velocity vs. mean position as we did. Our preference for mean velocity vs. mean position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. We now mention these perspectives in the revised manuscript for the benefit of the reader.

      As suggested, we also investigated the width of the (temporal) autocorrelation function (acf) of cursor position for 200 simulated position control trials and 200 simulated velocity control trials, at four different lambda values (50 simulated trials per lambda). Figs. S2A and B (Supplementary Materials) show example trials and histograms of the acf width, respectively. As the reviewer surmised, velocity control trials tend to have wider acfs than position control trials. However, as with the metrics we chose to analyze, there is overlap and there is no visible benefit for the classification.
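
      As a rough illustration of the acf-width metric discussed above, the following sketch computes a width for two surrogate signals. The definition of width (the first lag at which the normalized autocorrelation drops below 0.5), the 100 Hz sampling rate, and the surrogate signals themselves are assumptions for illustration; they are not the manuscript's simulations or its exact metric.

```python
import numpy as np

def acf_width(x, level=0.5):
    """First lag at which the normalized autocorrelation of x falls below `level`."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")
    acf = full[len(x) - 1:] / full[len(x) - 1]  # lags 0..N-1, acf[0] == 1
    below = np.flatnonzero(acf < level)
    return int(below[0]) if below.size else len(x)

rng = np.random.default_rng(0)
n = 600  # a 6 s trial at an assumed 100 Hz sampling rate
noise = rng.standard_normal(n)

# Surrogate for uncorrected position errors: deviations accumulate over time.
drifting = np.cumsum(noise)

# Surrogate for corrected position errors: deviations decay quickly.
corrected = np.zeros(n)
for t in range(1, n):
    corrected[t] = 0.8 * corrected[t - 1] + noise[t]

print("acf width, drifting signal: ", acf_width(drifting))
print("acf width, corrected signal:", acf_width(corrected))
```

      The drifting signal yields the wider autocorrelation, matching the qualitative expectation stated by the reviewer. Whether the metric separates real trials is an empirical question, which is why the overlap reported above matters.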

      COMMENT: “I think equation ten is incorrect, but would be correct if the identity matrix were added? Also, why is the last term of B set to 1/(Tau*M). What is M? Is it mass (which above was lowercase m)? If so, mass should also be included in A (it would be needed in two places in the last column). Or if we assume m = 1, then just ignore mass everywhere, including here and equation 5. Or perhaps I'm confused, and M is something else?”

      REPLY: Thanks for pointing this out. The matrix A shown in the paper is for the continuous-time representation of the model. However, as the reviewer correctly noted, the discrete-time implementation requires adding the identity matrix, which we did in our simulations. We have now clarified this in the Methods section of the revised manuscript. Also, as correctly pointed out, M is the mass of the hand; whether it appears in the A matrix depends on whether hand acceleration (d^2 p/dt^2) or hand force (F) is taken as a state. In our case, the A matrix is modified according to the state vector. Similarly, the B matrix is also modified. This is now clarified in the Methods section of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      COMMENT: “Equations 4-8 are written in continuous time, but Equation 9 is written in discrete time. Then Equation 10 is in discrete time. This needs to be tidied up. … I would suggest being more detailed and systematic, perhaps formulating the control problem in continuous time and then converting to discrete time.”

      REPLY: Thank you for this helpful suggestion. The model section in the Methods has been expanded to provide further details of the equation of motion, the discretization process, the control law calculation and the state estimation process.

      COMMENT: “It seems slightly odd for the observation to include only position and velocity of the cursor. Presumably participants can also observe the state of their own hand through proprioception (even if it were occluded). How would it affect the model predictions if the other states were observable?”

      REPLY: Thanks for pointing this out. We initially included only cursor position and velocity since we felt that was the most prominent state feedback, and the system is observable in that case. Nevertheless, we revised the manuscript and repeated all simulations using a full observability matrix. Our findings and conclusions remain unchanged. With the changes in the modeling, the figures were also updated (Figs. 3, 4, 5, 6, 7, 8).

      COMMENT: “It seems unnecessary to include the acceleration of the cursor in the formulation of the model. …the acceleration is not even part of the observed state according to line 668… I think the model could therefore be simplified by omitting cursor acceleration from the state vector.”

      REPLY: We agree. We have simplified the model, and generated new simulations and figures. Our results and conclusions were unchanged by this modification. With the changes in the modeling, the figures were also updated (Figs. 3, 4, 5, 6, 7, 8).

      COMMENT: “In the cost function, it's not clear why any states other than position and velocity of the cursor need to have non-zero values. …The choice to have the cost coefficient for these other states be 1 is completely arbitrary… If the point is that the contribution of these other costs should be negligible, then why not just set them to 0?”

      REPLY: We agree, and have made this change in the Methods section. Our findings and conclusions were unaffected.

      COMMENT: “It seems that the cost matrices were specified after transforming to discrete-time. It is possible however (and perhaps recommended) to formulate in continuous time and convert to discrete time. This can be done cleanly and quite straightforwardly using matrix exponentials. Depending on the discretization timestep, this can also naturally lead to non-zero costs for other states in the discrete-time formulation even if they were zero under continuous time. … A similar comment applies to discretization of the noise.”

      REPLY: Thanks for the suggestion. We have expanded on the discretization process in our Methods section, which uses a common approximation of the matrix exponentiation method.
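
      For readers unfamiliar with the two routes, the sketch below compares an exact zero-order-hold discretization, obtained from a matrix exponential, with the first-order approximation A_d = I + A·dt and B_d = B·dt. The reply does not state here which approximation was used, so the first-order form is shown only as one common choice. The two-state toy plant, the values of lam and dt, and the helper names are assumptions for illustration.

```python
import numpy as np

def expm_series(M, terms=30):
    """Matrix exponential by truncated Taylor series (adequate when ||M|| is small)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def discretize_exact(A, B, dt):
    """Zero-order-hold discretization via the exponential of an augmented matrix."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm_series(M * dt)
    return E[:n, :n], E[:n, n:]   # A_d and B_d

# Toy continuous-time plant (illustrative values): dx/dt = lam * (x + p), dp/dt = u.
lam, dt = 3.0, 0.01
A = np.array([[lam, lam], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])

A_exact, B_exact = discretize_exact(A, B, dt)
A_euler, B_euler = np.eye(2) + A * dt, B * dt   # first-order approximation

err_A = np.max(np.abs(A_exact - A_euler))
err_B = np.max(np.abs(B_exact - B_euler))
print("max |A_exact - A_euler| =", err_A)
print("max |B_exact - B_euler| =", err_B)
```

      The discrepancy scales with dt squared, so it is small at fine time steps. The exact form also produces a nonzero entry in B_d for the cursor state, which echoes the reviewer's remark that discretization can introduce terms that are zero in continuous time.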

      COMMENT: “Most of the parameters of the model seem to be chosen arbitrarily. I think this is okay as the point is to illustrate that the kinds of behaviors observed are within the scope of the model. However, it would be helpful to provide some rationale as to how the parameters were chosen. e.g. Were they taken directly from prior literature, or were they hand-tuned to approximately match observed behavior?”

      REPLY: We have revised the manuscript to more clearly note that the noise parameters, as well as parameters of the mechanical system (mass, muscle force, time scale, etc.) in our model were taken from previous publications (Todorov, 2005; Cluff et al., 2019). As described in the manuscript, the parameter values of the cost function (Q matrix) were obtained by tuning the parameters to achieve a similar range of success rate with the model as observed in the experimental data. This is now clarified in the Methods section.

      COMMENT: “The ‘true’ cost function for this task is actually a 'well' in position space - zero cost within the screen and very high cost elsewhere. In principle, it might be possible to derive the optimal control policy for this more veridical cost function. It would be interesting to consider whether or not this model might reproduce the observed behaviors.”

      REPLY: This is indeed a very interesting suggestion, but it is difficult to implement within the current optimal feedback control framework. It is worth considering in future work.

      Minor Comments:

      COMMENT: “In Figs 4 and 5, the data points are drawn from different conditions with varying values of lambda. How did the structure of this data depend on lambda? Might it be possible to illustrate in the figure (e.g. the shade/color of each dot) what the difficulty was for each trial?”

      REPLY: We performed additional analyses to show the effects of task difficulty on the choice of control objective. Overall, we found that the main behavioral characteristics of the control objective remained fairly unchanged across different task difficulties or across time. The results of this analysis are included in Fig. S7 and S8 of the Supplementary Materials.

      COMMENT: “Should mention trial duration (6s) in the main narrative of the intro/results.”

      REPLY: We now mention this detail when we describe the task for the first time.

      COMMENT: “As an alternative to training on synthetic data (which might not match behavior that precisely, and was also presumably fitted to subject data at some level) it might be worth considering to do a cross-validation analysis, i.e. train the classifier on subsets of the data with one participant removed each time, and classify on the held-out participant.”

      REPLY: This is indeed a valid point. The main reason to train the classifier based on model simulations was two-fold: first, to have confidence in the training data, as the experimental data was limited and noisy, which would result in less reliable classifications; and second, the model simulations are available for different contexts and conditions, where experimental data is not necessarily available. The latter is a more practical reason to be able to identify control objectives for any subject (who received no instructions), without having to collect training data from matching control subjects who received explicit instructions. Nonetheless, we appreciate the reviewer’s recommendation and will consider that for our future studies.
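
      For reference, the cross-validation scheme the reviewer proposes can be sketched as follows. This is a minimal illustration only, assuming a simple nearest-centroid classifier and made-up two-dimensional features; the participant IDs, the labels "A" and "B", and all function names are hypothetical and do not come from the manuscript.

```python
def leave_one_participant_out(samples):
    """Yield (held_out_id, train, test) splits.

    samples: list of (participant_id, feature_vector, label) tuples.
    """
    participants = sorted({pid for pid, _, _ in samples})
    for held_out in participants:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def fit_centroids(train):
    """Compute the mean feature vector of each label."""
    sums, counts = {}, {}
    for _, x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def cross_validate(samples):
    """Accuracy on each held-out participant, training on all the others."""
    accuracy = {}
    for held_out, train, test in leave_one_participant_out(samples):
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, x) == y for _, x, y in test)
        accuracy[held_out] = correct / len(test)
    return accuracy


# Hypothetical toy data: three participants, two control objectives.
samples = [
    ("p1", [0.1, 0.2], "A"), ("p1", [0.9, 1.0], "B"),
    ("p2", [0.2, 0.1], "A"), ("p2", [1.1, 0.9], "B"),
    ("p3", [0.0, 0.3], "A"), ("p3", [1.0, 1.2], "B"),
]
scores = cross_validate(samples)
```

      Because each fold trains only on the other participants, the held-out accuracy estimates how well the classifier generalizes to a new subject.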

      COMMENT: “line 690 - Presumably the optimal policy was calculated without factoring in any delay (this would be tricky to do), but the 50ms delay was incorporated at the time of simulation?”

      REPLY: The discretization of the system equations allowed us to incorporate the delay in the system dynamics and solve for the optimal controller with the delay present. This was done simply by system augmentation (e.g., Crevecoeur et al., 2019), where the states of the system in the current time-step were augmented with the states from the 5 preceding time-steps to form the new state vector x(t)_aug = [x(t), x(t-1), …, x(t-d)]. Similarly, the matrices A, B, and H from the system dynamics could be expanded accordingly to form the new dynamical system:

      $$x(t+1)_{aug} = A_{aug} \, x(t)_{aug} + B_{aug} \, u$$

      Then, the optimal control was implemented on the new (augmented) system dynamics.

      We have revised the manuscript (Methods) to clarify this issue.
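
      The augmentation described above can be sketched in a few lines. This is an illustrative construction under stated assumptions, not the code used for the manuscript: it uses the standard block-shift layout, in which the augmented state stacks the current state and its d delayed copies and the observation matrix reads only the oldest copy. The function names and the scalar example system are hypothetical.

```python
def augment_for_delay(A, B, H, d):
    """Build A_aug, B_aug, H_aug for x_aug(t) = [x(t), x(t-1), ..., x(t-d)].

    A is n x n, B is n x m, H is p x n (lists of rows); d is the delay in time steps.
    """
    n, m, p = len(A), len(B[0]), len(H)
    size = n * (d + 1)
    A_aug = [[0.0] * size for _ in range(size)]
    B_aug = [[0.0] * m for _ in range(size)]
    H_aug = [[0.0] * size for _ in range(p)]
    # Top-left block: the original dynamics act on the current state.
    for i in range(n):
        for j in range(n):
            A_aug[i][j] = A[i][j]
        for j in range(m):
            B_aug[i][j] = B[i][j]
    # Shift blocks: delayed copy k at t+1 equals delayed copy k-1 at t.
    for k in range(1, d + 1):
        for i in range(n):
            A_aug[k * n + i][(k - 1) * n + i] = 1.0
    # Observation reads only the oldest copy, x(t-d).
    for i in range(p):
        for j in range(n):
            H_aug[i][d * n + j] = H[i][j]
    return A_aug, B_aug, H_aug


def step(A_aug, B_aug, x_aug, u):
    """One noise-free update: x_aug(t+1) = A_aug x_aug(t) + B_aug u."""
    return [sum(a * x for a, x in zip(a_row, x_aug)) +
            sum(b * v for b, v in zip(b_row, u))
            for a_row, b_row in zip(A_aug, B_aug)]


# Hypothetical scalar system with a delay of 5 time steps.
A_aug, B_aug, H_aug = augment_for_delay([[0.5]], [[1.0]], [[1.0]], d=5)
x_aug = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(5):
    x_aug = step(A_aug, B_aug, x_aug, [0.0])
delayed_observation = sum(h * x for h, x in zip(H_aug[0], x_aug))
```

      After five steps the observation equals the initial state, i.e. the controller sees the state as it was five time steps earlier, while the first entry of the augmented state holds the current value.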

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study identifies the gene mamo as a new regulator of pigmentation in the silkworm Bombyx mori, a function that was previously unsuspected based on extensive work on Drosophila where the mamo gene is involved in gamete production. The evidence supporting the role of Bm-mamo in pigmentation is convincing, including high-resolution linkage mapping of two mutant strains, expression profiling, and reproduction of the mutant phenotypes with state-of-the-art RNAi and CRISPR knock-out assays. While the discussion about genetic changes being guided or accelerated by the environment is extremely speculative and has little relevance for the findings presented, the work will be of interest to evolutionary biologists and geneticists studying color patterns and evolution of gene networks.

      Response: Thank you very much for your careful work. In the revised version, we conducted a comparative genomic analysis of the upstream regions of the Bm-mamo gene in 51 wild silkworms and 171 domesticated local silkworms. Analyses of nucleotide diversity (pi) and the fixation index (FST) of the Bm-mamo genome sequences in the wild and domesticated silkworm populations were also performed. The results showed that the Bm-mamo genome sequence of local silkworms was relatively conserved, while the upstream sequence of wild silkworms exhibited high nucleotide diversity. This finding suggested a high degree of variability in the regulatory region of the Bm-mamo gene in wild strains. Additionally, the sequence in this region may have been fixed by domestication selection. We have optimized the description in the discussion section.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allelic version, bdf, narrowing down the causal interval to a small region containing a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument.

      The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) to link mamo to pigmentation, is extremely convincing.

      This revision is a much improved manuscript and I commend the authors for many of their edits.

      Response: Thank you very much for your careful work. With the help of reviewers and editors, we have revised the manuscript to improve its readability.

      I find the last part of the discussion, starting at "It is generally believed that changes in gene expression patterns are the result of the evolution of CREs", to be confusing.

      In this section, I believe the authors sequentially:

      • emphasize the role of CRE in morphological evolution (I agree)

      • emphasize that TF, and in particular their own CRE, are themselves important mutational targets of evolution (I agree, but the phrasing needs to make clear that the authors are here talking about the CRE found at the TF locus, not the CRE bound by the TF).

      • use the stickleback Pel enhancer as an example, which I think is a good case study, but the authors also then make an argument about DNA fragility sites, which is hard to connect with the present study.

      • then continue on "DNA fragility" using the peppered moth and butterfly cortex locus. There is no evidence of DNA fragility at these loci, so the connection does not work. "The cortex gene locus is frequently mutated in Lepidoptera", the authors say. But a more accurate picture would be that the cortex locus is repeatedly involved in the generation of color pattern variants. Unlike for the fragile Pel enhancer, we don't know if the causal mutations at this locus are repeatedly the same, and the haplotypes that have been described could be collateral rather than causal. Overall, it is important to clarify the idea that mutation bias is a possible factor explaining "genetic hotspots of evolution" (or genetic parallelism sensu 10.1038/nrg3483), but it is also possible that many genetic hotspots are repeated mutational targets because of their "optimal pleiotropy" (e.g. hub position in GRNs, such as mamo might be), or because of particularly modular CRE regions that allow fine-tuning. Thus, I find the "fragility" argument misleading here. In fact, the finding that the "bd" and "bdf" alleles are different in nature is against the idea of a fragility bias (unless the authors can show increased mutation rates at this locus in a wild silkmoth species?). These alleles are also artificially selected, i.e., they increased in frequency by breeding rather than natural selection in the wild, so while interesting for our understanding of the genotype-phenotype map, they are not necessarily representative of the mutations that may underlie evolution in the wild.

      Response: Thank you very much for your careful work. DNA fragility is an interesting topic, but some explanations for DNA fragility are confusing. One study measured the rate of DNA double-strand breaks (DSBs) in yeast artificial chromosomes (YACs) and found that YACs containing marine Pel broke ~25 to 50 times more frequently than did the control. These authors believe that the increase in the mutation rate is caused by DNA sequence characteristics, particularly TG-dinucleotide repeats. Moreover, they found that adding a replication origin on the opposite side of Pel caused the fragility to switch, making the forward sequence stable and the reverse complement fragile. Thus, Pel fragility is also dependent on the direction of DNA replication. In summary, they suggested that the special DNA sequence is the cause of DNA fragility. In addition, the sequence features associated with DNA fragility in the Pel region are also found in thousands of other positions in the stickleback and human genomes (Xie KT et al, 2019, Science).

      In yeast artificial chromosomes (YACs), the characteristics of DNA sequences, such as TG-dinucleotide repeat sequences, may be important reasons for DNA fragility, and these breaks occur during DNA replication. However, the inserted sequence of YAC often undergoes deletion or recombination during cultivation and passage. In addition, yeast is a single-celled organism. Therefore, the results in yeast cannot represent the situation in multicellular organisms. If multicellular organisms are like this, there are several issues as follows:

      (1) The DNA replication process occurs separately in the different cells of a multicellular organism. Because DNA breakage and repair are independent in each cell, they can lead to the presence of different alleles in different cells. This can potentially lead to the occurrence of extensive chimeric organisms. However, we have not found such a situation in the genome sequencing of many multicellular organisms.

      (2) If the DNA sequence, TG-dinucleotide repeats, is the determining factor, the mutations near the sequence lose their strong correlation with environmental changes. The researchers conducted yeast artificial chromosome experiments in the same environment and found that the frequency of DNA breaks containing TG dinucleotide repeat sequences was 25 to 50 times greater than that of the control group. This means that, whether in the marine population or the lake population, this part of the sticklebacks’ genome has undergone frequent mutations. However, according to related research, populations of lake sticklebacks, rather than marine populations, often exhibit a decrease in the pelvic phenotype.

      (3) Researchers have found thousands of loci in the genome of sticklebacks and humans that contain such sequences (TG-dinucleotide repeats). This means that thousands of sites undergo frequent mutations during DNA replication. Unless these sites do not possess functionality, they will have some impact on the organism, even causing damage. Even if they are not functional sequences, these sequences will gradually be discarded or replaced during frequent mutations rather than being present in large quantities in the genome.

      Therefore, the study of DNA fragility in yeast cannot explain the situation in multicellular organisms.

      As you noted, we want to express that the frequent variation in the cortex gene should be regulated by targeted regulation involving the GRN in Lepidoptera. In addition, studies on specific epigenetic modifications discovered through the referenced fragile DNA sites suggest that DNA fragility is not determined by the DNA sequence (Ji F, 2020, Cell Res) but rather by other factors, such as epigenetic factors. The sequence features discovered at fragile DNA sites are traces of frequent mutations, not causes.

      In this revision, we analyzed the nucleotide diversity of the mamo genome in 51 wild and 171 domestic silkworms. We found high nucleotide diversity from the third exon to the upstream region of this gene in wild silkworms. We randomly selected 12 wild silkworms and 12 domestic silkworms and compared approximately 1 kb of their upstream sequences. In wild silkworms, there is significant diversity in these upstream sequences. In domestic silkworms, the sequences are highly conserved, but in some silkworms, a long interspersed nuclear element (LINE) is inserted. This finding suggested that there is frequent variation in the sequence of this region in wild silkworms, while fixation occurs in domesticated silkworms. These genomic data are sourced from the pangenome of silkworms (Tong X, 2022, Nat Commun.). In the pangenomic research, 1078 strains (205 local strains, 194 improved strains, 632 mutant strains, and 47 wild silkworms), which included 545 third-generation sequencing genomes, were obtained. An online website was built to utilize these data (http://silkmeta.org.cn/). We warmly welcome you to use these data.
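
      To make the two statistics concrete, the sketch below computes nucleotide diversity (pi) as the mean number of pairwise differences per site, and a simple FST as the excess of between-population over within-population diversity. It is a minimal illustration only: the sequences are invented, the function names are hypothetical, and this particular FST estimator is an assumption, since the response does not state which estimator or software was used.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site within one population (pi)."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    return sum(pairwise_diff(a, b) for a, b in pairs) / (len(pairs) * length)


def fst(pop1, pop2):
    """(pi_between - pi_within) / pi_between for two populations."""
    length = len(pop1[0])
    pi_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between = [pairwise_diff(a, b) for a in pop1 for b in pop2]
    pi_between = sum(between) / (len(between) * length)
    return (pi_between - pi_within) / pi_between


# Invented aligned sequences: a variable "wild" sample and a fixed "domestic" one.
wild = ["ACGTACGT", "ACGTTCGT", "ACCTACGA"]
domestic = ["TCGTACGT", "TCGTACGT", "TCGTACGT"]

pi_wild = nucleotide_diversity(wild)
pi_domestic = nucleotide_diversity(domestic)
fst_value = fst(wild, domestic)
```

      In this toy example the wild sample has non-zero diversity while the domestic sample has none, which is the qualitative pattern described above for the upstream region.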

      In summary, for clearer expression, we have rewritten this section.

      Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl ADC, Schluter D, Bell MA, Vasquez KM, Kingsley DM. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science. 2019 Jan 4;363(6422):81-84. doi: 10.1126/science.aan1425.

      Ji F, Liao H, Pan S, Ouyang L, Jia F, Fu Z, Zhang F, Geng X, Wang X, Li T, Liu S, Syeda MZ, Chen H, Li W, Chen Z, Shen H, Ying S. Genome-wide high-resolution mapping of mitotic DNA synthesis sites and common fragile sites by direct sequencing. Cell Res. 2020 Nov;30(11):1009-1023. doi: 10.1038/s41422-020-0357-y.

      Tong X, Han MJ, Lu K, Tai S, Liang S, Liu Y, Hu H, Shen J, Long A, Zhan C, Ding X, Liu S, Gao Q, Zhang B, Zhou L, Tan D, Yuan Y, Guo N, Li YH, Wu Z, Liu L, Li C, Lu Y, Gai T, Zhang Y, Yang R, Qian H, Liu Y, Luo J, Zheng L, Lou J, Peng Y, Zuo W, Song J, He S, Wu S, Zou Y, Zhou L, Cheng L, Tang Y, Cheng G, Yuan L, He W, Xu J, Fu T, Xiao Y, Lei T, Xu A, Yin Y, Wang J, Monteiro A, Westhof E, Lu C, Tian Z, Wang W, Xiang Z, Dai F. High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation. Nat Commun. 2022 Sep 24;13(1):5619. doi: 10.1038/s41467-022-33366-x.

      Lu K, Pan Y, Shen J, Yang L, Zhan C, Liang S, Tai S, Wan L, Li T, Cheng T, Ma B, Pan G, He N, Lu C, Westhof E, Xiang Z, Han MJ, Tong X, Dai F. SilkMeta: a comprehensive platform for sharing and exploiting pan-genomic and multi-omic silkworm data. Nucleic Acids Res. 2024 Jan 5;52(D1):D1024-D1032. doi: 10.1093/nar/gkad956.

      Curiously, the last paragraph ("Some research suggests that common fragile sites...") elaborates on the idea that some sites of the genome are prone to mutation. The connection with mamo and the current article is extremely thin. There is here an attempt to connect meiotic and mitotic breaks to Bm-mamo, but this is confusing: it seems to propose Bm-mamo as a recruiter of epigenetic modulators that may drive higher mutation rates elsewhere. Not only am I not convinced by this argument without actual data, but this would not explain how the mutations at Bm-mamo itself evolved.

      Response: Thank you very much for your careful work. This section mainly illustrates that DNA fragility is not determined by sequence but is regulated by other factors in animals. In fruit flies, they found that mamo is an important candidate gene for recombination hotspot setting in meiosis. First, we evaluated PRDM9, which plays an important role in setting recombination hotspots during meiosis. Our purpose in mentioning this information is to illustrate that chromosome recombination is a process of programmed double strand breaks and to answer another reviewer's question about programmed events in the genome. In summary, we suggest that some variations in DNA sequences are procedural results. We have optimized the description of this section in this version.

      On a more positive note, I find it fascinating that the authors identified a TF that clearly orchestrates larval pattern development and whose deletion can still yield healthy individuals. In other words, while it is a TF with many targets, it is not too pleiotropic. This idea, that the genetically causal modulators of developmental evolution are regulatory genes, has been described elsewhere (e.g. Fig 4c in 10.1038/s41576-020-0234-z, and associated refs). To me, the beautiful findings about Bm-mamo make sense in the general, existing framework that developmental processes and regulatory networks "shape" the evolutionary potential and trajectories of organisms. There is a degree of "programmability" in the genomes, because some loci are particularly prone to modulate a given type of trait. Here, Bm-mamo, as a potential regulator of both CPs and melanin pathway genes, appears to be a potent modulator of epithelial traits. Claiming that there are inherent mutational biases behind this is unwarranted.

      Response: Thank you very much for your careful work. I completely agree with your statement that the genome exhibits a certain degree of programmability. On the one hand, some transcription factors can precisely control the spatiotemporal expression levels of some structural genes (such as pigment synthesis genes). On the other hand, these transcription factors are also subject to strict expression regulation. Because the color pattern is complex, changes in single or minority structural genes result in incomplete or imprecise changes in coloring patterns. Nevertheless, several regulatory factors can regulate multiple downstream target genes. Changes in their expression patterns can lead to holistic and significant changes in color patterns. There are long intergenic regions upstream of many important transcription factors, dozens of kilobase pairs (Kb) to hundreds of Kb, which may contain many different regulatory elements for better control of their expression patterns. Therefore, gene regulatory networks can directly regulate transcription factors to modulate a given type of trait. Transcription factors and their downstream target genes can form a functional module, which is similar to a functional module in software or operating systems. This regulation of transcription factors is simpler in terms of steps, which are similar to a single click switch button. The gene regulatory network regulates these modules in response to environmental changes and is widely recognized.

      Some people do not agree that genetic variations can also be regulated. They claim that this is completely random. The infinite monkey theorem (Félix-Édouard-Justin-Émile Borel, 1909) states that if an infinite number of monkeys were given typewriters and an infinite amount of time, they would eventually produce the complete works of Shakespeare. Although this theory advocates randomness on the surface, its conclusions are full of inevitability (tail event). In nature, some things we observe do not have obvious regularity because they involve relatively complex factors, and the underlying logic is obscure and difficult to understand. We often name them random. However, as we gradually understand the logic behind this complex event, we can also recognize the procedural nature of this randomness.

      Previously, chromosomal recombination during meiosis was believed to be a random event. However, currently, it is believed that the process is procedural. The occurrence of meiotic recombination mentioned earlier indicates that the genome has the ability to self-set the position of double-strand breaks to form new allelic forms. Because meiotic recombination is programmed, and because transcription factors that recognize DNA sites, enzymes that cleave double strands, and DNA repair systems all exist, programming can also introduce genetic variation. A study in plants has provided insights into this programmed mutation (Monroe JG, 2022, Nature). Frequent changes in the expression patterns of some transcription factors occur between and/or within species. In this article, we only discuss the possible reasons for variations in the expression patterns of some transcription factors in a general manner and with simple reasoning. We have added an analysis of the response of wild silkworms and improved the relevance of the discussion.

      Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D, Weng ML, Imbert E, Ågren J, Rutter MT, Fenster CB, Weigel D. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105. doi: 10.1038/s41586-021-04269-6. Epub 2022 Jan 12. Erratum in: Nature. 2023 Aug;620(7973):

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Please structure your Discussion with section headers.

      Response: Thank you very much for your careful work. We have added relevant section headers.

      • As explained in my public review, I found the two last sections of the Discussion to be dispersed and confusing. I also must say that I carefully read the Response to Reviewers on this, which helped me to better understand the authors' intentions here. Please consider revising this part of the Discussion, as it feels extremely speculative and difficult to connect with Bm-mamo.

      Response: Thank you very much for your careful work. We have rewritten this part of the content.

      • typo: were found near the TTS of yellow --> TSS

      Response: Thank you very much for your careful work. We have made these modifications.

      • l. 234 :"expression level of the 18 CP genes in the integument". Consider adding a mention of Figure 7 here, as only Fig. S10 is cited here.

      Response: Thank you very much for your careful work. We have made these modifications.

      • Editorial comment on the second half of the Abstract:

      Wu et al : "We found that Bm-mamo can comprehensively regulate the expression of related pigment synthesis and cuticular protein genes to form color patterns. This indicates that insects have a genetic basis for coordinate regulation of the structure and shape of the cuticle, as well as color patterns. This genetic basis provides the possibility for constructing the complex appearances of some insects. This study provides new insight into the regulation of color patterns."

      I respectfully suggest a more accurate rephrasing, where the methods are mentioned, and where the logical argument is more straightforward. For example:

      "Using RNAi and CRISPR we show that Bm-mamo is a repressor of dark melanin patterns in the larval epithelium. Using in-vitro binding assays and gene expression profiling in wild-type and mutant larvae, we also show that Bm-mamo likely regulates the expression of related pigment synthesis and cuticular protein genes in a coordinated manner to mediate its role in color pattern formation. This mechanism is consistent with a dual role of this transcription factor in regulating both the structure and shape of the cuticle and the pigments that are embedded within it. This study provides new insight into the regulation of color patterns as well as into the construction of more complex epithelial features in some insects."

      I hope this lets the ideas of the original version transpire as the authors intended.

      Response: Thank you very much for your careful work. We have made these modifications.

    1. Author Response

      Public Reviews

      We thank both reviewers for taking the time and effort to think critically about our paper and point out areas where it can be improved. In this document, we do our best to clarify any misunderstandings with the hope that further consideration about the strengths and weaknesses of our approach will be possible. Our responses are in bold.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      The environments we study represent 12 different concentrations or combinations of two drugs, radicicol and fluconazole. Our hope is that this large dataset (774 mutants x 12 environments) will be useful, both to scientists who are generally interested in the genetic and phenotypic underpinnings of adaptation, and to scientists specifically interested in the evolution of drug resistance.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements.

      This is a misunderstanding that we will work to clarify in the revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons.

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we will explicitly state that these 21,000 isolated lineages do not all represent unique, adaptive lineages. In figure 2 and all associated text, we will change the word “lineages” to “isolates,” where relevant.

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype, and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Note that 774 adaptive lineages is more than most previous studies. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 161 - 162).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance.

      The word “briefly” feels a bit unfair because we discuss this bias on 3 separate occasions (on lines 146 - 147, 260 - 264, and in more detail on 706 - 714). We even walk through an example of a class of mutants that our study misses. We say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we will add more text to the first mention of these missing mutants (lines 146 - 147) so that the implications are more immediately made apparent.

      While we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs.

      We discussed these implications in some detail in the 16 lines mentioned above (146 - 147, 260 - 264, 706 - 714). To add to this discussion, we will also add the following sentence to the end of the paragraph on lines 697 - 714: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”.

      We will also add a new paragraph that discusses these implications earlier in our manuscript. This paragraph will highlight the strengths of our method (e.g., that we “catch” classes of mutants that are often overlooked) while being transparent about the weaknesses of our approach (e.g., that we “miss” mutants with strong tradeoffs).

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult.

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system).

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      We understand how the reviewer came to this misunderstanding and will adjust our revised manuscript accordingly. Previous work has demonstrated that, in this particular evolution platform, most of the mutations actually occur during the transformation that introduces the DNA barcodes (PMID25731169). In other words, these mutations do not accumulate during the 40 generations of evolution; they are already there. So the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.

      This concern, and all subsequent concerns, seem to be driven by either (a) general concerns about the noisiness of fitness measurements obtained from large-scale barcode fitness assays or (b) general concerns about whether the clusters obtained from our dimensional reduction approach capture this noise as opposed to biologically meaningful differences.

      We will respond to each concern point-by-point, but want to start by generally stating that (a) our particular large-scale barcode fitness assay has several features that diminish noise, and (b) we devote 4 figures and 200 lines of text to demonstrating that these clusters capture biologically meaningful differences between mutants (and not noise).

      In terms of this specific concern, we performed an analysis of noise in the submitted manuscript: Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing noise (Figure S7 panel B). But we agree with the reviewer that this analysis alone is not sufficient to conclude that the clusters distinguish groups of mutants with unique fitness tradeoffs.
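      The coverage filter described above can be illustrated with a minimal sketch. This is not the pipeline used in the manuscript: the function name, the toy barcodes, and the choice to define coverage as the total read count summed across timepoints are all assumptions made for illustration.

```python
def filter_by_coverage(read_counts, min_reads):
    """Keep barcodes whose summed read count meets the threshold.

    read_counts: dict mapping barcode -> list of read counts per timepoint.
    Assumption: "coverage" means total reads across all timepoints.
    """
    return {
        barcode: counts
        for barcode, counts in read_counts.items()
        if sum(counts) >= min_reads
    }


# Hypothetical toy data: three barcodes, four timepoints each.
toy_counts = {
    "bc_low": [120, 150, 130, 140],        # 540 reads in total
    "bc_mid": [900, 1100, 1000, 1200],     # 4,200 reads in total
    "bc_high": [2000, 2500, 3000, 3500],   # 11,000 reads in total
}

lenient = filter_by_coverage(toy_counts, 500)
strict = filter_by_coverage(toy_counts, 5000)
```

      Re-running the clustering on the `strict` subset and checking whether clusters merge is the kind of comparison the response refers to.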

      Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      To evaluate the strength of the clustering, we performed numerous analyses including whole genome sequencing, growth experiments, reclustering, and tracing the evolutionary origins of each cluster (Figures 5 - 8). All of these analyses suggested that our clusters capture groups of mutants that have different fitness tradeoffs. We will adjust our revised manuscript to make clear that we do not rely on the results of a clustering algorithm alone to draw conclusions about phenotypic convergence.

      We are also grateful to the reviewer for helping us realize that, as written, our manuscript is not clear with regard to how we perform clustering. We are not using UMAP to decide which mutant belongs to which cluster. Recent work highlights the importance of using an independent clustering method (PMID37590228). Although this recent work addresses the challenge of clustering much higher dimensional data than we survey here, we did indeed use an independent clustering method (Gaussian mixture model). In other words, we use UMAP for visualization but not clustering. We also confirm our clustering results using a second independent method (hierarchical clustering; Figure S8). And in our revised manuscript, we will confirm with a third method (PCA, see below). We will adjust the main text and the methods section to make these choices clearer.

      This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      The salient question is whether the clusters are so “fuzzy” that they are not meaningful. That interpretation seems unreasonable. Our clusters group mutants with similar genotypes, evolutionary histories, and fitness tradeoffs (Figures 5 - 8). Clustering mutants with similar behaviors is important and useful. It improves phenotypic prediction by revealing which mutants are likely to have at least some phenotypic effects in common. And it also suggests that the phenotypic space is constrained, at least to some degree, which previous work suggests is helpful in predicting evolution (PMID33263280, PMID37437111, PMID22282810, PMID25806684).

      (4) The authors make the decision to use UMAP and a gaussian mixed model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axis does not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components.

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent some intuitive phenotype, like resistance to fluconazole.

      Moreover, we see many non-linearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, plus these nonlinearities can be missed by PCA, thus we prefer other clustering methods.

      We will adjust our revised manuscript to explain these reasons why we chose UMAP and GMM over PCA.

      Also, we will include PCA in the supplement of our revised manuscript. Please find below PC1 vs PC2, with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Author response image 1.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages.

      We worry that the idea stems from a priori notions of what the important dimensions should be. It also seems like this would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Also, we believe the reviewer meant “fitness profile” and not “fitness landscape”. A fitness landscape imagines a walk where every “step” is a mutation. Most lineages in barcoded evolution experiments possess only a single adaptive mutation. A single-step walk is not enough to build a landscape, though others are expanding barcoded evolution experiments beyond the first step (PMID34465770, PMID31723263), so maybe one day this will be possible.

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered.

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. We thank the reviewer for pointing out this gap in our writing. We will adjust our revised manuscript to explain that we ultimately chose to describe 6 clusters that we were able to validate with follow-up experiments. In figures 5, 6, 7, and 8, we use external information to validate the clusters that we report in figure 4. And in lines 697 – 714, we explain that there may be additional clusters beyond those we tease apart in this study.
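      For readers less familiar with BIC, the sketch below shows how the criterion trades off fit against model size for a Gaussian mixture. It is an illustration only: the full-covariance parameter count is an assumption (the response does not state the covariance structure), and the function names are hypothetical.

```python
import math


def gmm_param_count(n_components, n_dims):
    """Free parameters of a full-covariance Gaussian mixture (assumed form)."""
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    weights = n_components - 1  # mixing weights sum to one
    return means + covariances + weights


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def extra_component_pays_off(loglik_k, loglik_k_plus_1, k, n_dims, n_samples):
    """True if adding one component lowers BIC despite the larger penalty."""
    bic_k = bic(loglik_k, gmm_param_count(k, n_dims), n_samples)
    bic_k1 = bic(loglik_k_plus_1, gmm_param_count(k + 1, n_dims), n_samples)
    return bic_k1 < bic_k
```

      When BIC "levels off", successive components improve the log-likelihood by roughly the size of the added penalty, which is why BIC alone may not single out one cluster count.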

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even if mixed Gaussian modeling is appropriate for this dataset.

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise.

      More generally, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays.

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously (PMID37237236).
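      The difference between a two-timepoint fold change and a multi-timepoint estimate can be made concrete with a small sketch. This is not the fitness inference software used in the manuscript; it assumes, for illustration, that relative fitness is approximated by the least-squares slope of log frequency over generations, measured against a reference lineage. All names and numbers are hypothetical.

```python
import math


def log_slope(times, freqs):
    """Least-squares slope of ln(frequency) against time (in generations)."""
    logs = [math.log(f) for f in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def relative_fitness(times, mutant_freqs, reference_freqs):
    """Slope of the mutant's log frequency minus that of a reference lineage."""
    return log_slope(times, mutant_freqs) - log_slope(times, reference_freqs)


# Hypothetical example: a mutant growing exponentially relative to a flat reference.
times = [0, 8, 16, 24]
mutant = [0.001 * math.exp(0.1 * t) for t in times]
reference = [0.01, 0.01, 0.01, 0.01]
estimate = relative_fitness(times, mutant, reference)
```

      Because the slope uses every timepoint, sampling noise at any single timepoint has less influence than it would on a fold change computed from two timepoints.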

      The main assay we use to measure fitness has been previously validated (PMID27594428). No subsequent study using this assay validates using the methods suggested by the reviewer (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203).

      More to the point, bar-seq has been used, without the reviewer’s suggested validation, to demonstrate that the way some mutant’s fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate.

      For all of these reasons, we are hesitant to confirm bar-seq itself as a valid way to infer fitness. It seems this is already accepted as a standard in our field.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.

      We don’t agree that fitness measurements obtained from this bar-seq assay generally require validation. But we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways, in particular, in that they have different fitness tradeoffs. We have four figures (5 - 8) and 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. Happily, one of these figures (Fig 7) includes growth curves, which are exactly the type of validation experiment asked for by the reviewer.

      Below, we walk through the different types of validation experiments that are present in our original manuscript, and additional validation experiments that we plan to include in the revised version. We are hopeful that these validation experiments are sufficient, or at the very least, that this list empowers reviewers to point out where more work is needed.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S10).

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. Indeed they often do (see pie charts in Figures 6, 7, 8). This method also provides evidence supporting each cluster’s differing fitness tradeoffs.

      For example, mutants in cluster 5 appear to have a tradeoff in a double drug condition (described above). They rarely originate from that evolution condition, unlike mutants in nearby cluster 4 (see Figure 7).

      (3) Mutants from each cluster often fall into different genes: In our original manuscript, we sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6).

      (4) Mutants from each cluster have behaviors previously observed in the literature: In our original manuscript, we compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 457 - 462). Previous work suggests that some mutations to PDR have different tradeoffs than others, which is what we see (lines 540 - 542). IRA1 mutants were previously observed to have high fitness in our “no drug” condition, and are found in the cluster that has the highest fitness in the “no drug” condition (lines 642 - 646). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 652 - 657).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we performed various different reclustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S9). The clusters of mutants that we observe in figure 4 do not change substantially when we recluster the data. We will add PCA (see above) to these analyses in our revised manuscript.

      (6) We will include additional data showing that mutants in different clusters have different evolutionary origins: Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole (see Fig 4E and Fig 5C). In our revised manuscript, we will show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see figure panel A below). No other cluster’s evolutionary history shows this pattern (figures 6, 7, and 8).

      (7) We will include additional data showing that mutants in different clusters have different growth curves: Cluster 1 lineages are unique in that their fitness advantage is specific to low flu and trades off in higher concentrations of fluconazole. We obtained growth curves for three cluster 1 mutants (2 SUR1 mutants and 1 UPC2 mutant). We compared them to growth curves for three PDR mutants (from clusters 2 and 3). Cluster 1 mutants appear to have the highest growth rates and reach a higher carrying capacity in low fluconazole (see red and green lines in Author response image 2 panel B below). But the cluster 1 mutants are negatively affected by higher concentrations of fluconazole, much more so than the mutants from clusters 2 and 3 (see Author response image 2 panel C below). This is consistent with the different fitness tradeoffs we observe for each cluster (figures 4 and 5). We will include a more detailed version of this analysis and the figures below in our revised manuscript.

      Author response image 2.

      Validation experiments demonstrate that cluster 1 mutants have uniquely high fitness in only the lowest concentration of fluconazole. (A) The mutant lineages in cluster 1 were largely sampled from evolution experiments performed in low flu. This is not true of other clusters (see pie charts in main manuscript). (B) In low flu (4 𝜇g/ml), Cluster 1 lineages (red/UPC2 and green/SUR1) grow faster and achieve higher density than lineages from clusters 2 and 3 (blue/PDR). This is consistent with barseq measurements demonstrating that cluster 1 mutants have the highest fitness in low flu. (C) Cluster 1 lineages are sensitive to increasing flu concentrations (SUR1 and UPC2 mutants, middle and rightmost graphs). This is apparent in that the gray (8 𝜇g/ml flu) and light blue (32 𝜇g/ml flu) growth curves rise more slowly and reach lower density than the dark blue curves (4 𝜇g/ml flu). But this is not the case for the PDR mutants from clusters 2 and 3 (leftmost graph). These observations are consistent with the bar-seq fitness data presented in the main manuscript (Fig 4E).

      With all of these validation efforts combined, we are hopeful that the reviewer is now more convinced that our clusters capture groups of mutants with different fitness tradeoffs (as opposed to noise). We want to conclude by saying that we are grateful to the reviewer for making us think deeply about areas where we can include additional validation efforts as well as areas where we can make our manuscript clearer.

      Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are very grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      We think that phrasing the “jump” as a question might help lay readers get from point A to point B. So, in the introduction of our revised manuscript, we will add a paragraph roughly similar to this one: “If two groups of drug-resistant mutants have different fitness tradeoffs, does it mean that they provide resistance through different underlying mechanisms? Alternatively, it could mean that both provide drug resistance via the same mechanism, but some mutations come with a cost that others don’t pay. However, another way to phrase this alternative is to say that both groups of mutants affect fitness through different suites of mechanisms that are only partially overlapping. And so, by identifying groups of mutants with different fitness tradeoffs, we argue that we will be uncovering sets of mutations that impact fitness through different underlying mechanisms. The ability to do so would be useful for genotype-phenotype mapping endeavors.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      In our revised manuscript, we will carefully review all citations. The issue may stem from our attempt to reach two different groups of scientists. We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). Though the 3 papers the reviewer mentions on lines 132 - 133 all pertain to yeast, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should apply broadly, beyond yeast. Similarly, the reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the genotype-phenotype-fitness map should apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So we cited papers from across the tree of life to support this sentence.

      On the other hand, because we study drug resistant mutations, we also hope that our work is of use to scientists studying the evolution of resistance. We agree with the reviewer that in this regard, some of our findings may be especially pertinent to the evolution of resistance to antifungal drugs. We will consider this when reviewing the citations in our revised manuscript and add some text to clarify these points.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae).

      In the revised manuscript, we will make clear that we study S. cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance. Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper.
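      To illustrate why noise complicates this analysis, here is a minimal sketch of a depth-ratio screen. It is not an analysis performed in the manuscript: the function names, the toy depths, and the 0.25 tolerance are hypothetical, and a real analysis would also need to account for coverage biases such as GC content and mappability.

```python
from statistics import median


def chromosome_depth_ratios(depth_by_chrom):
    """Median read depth of each chromosome relative to the median across chromosomes."""
    chrom_medians = {chrom: median(d) for chrom, d in depth_by_chrom.items()}
    baseline = median(chrom_medians.values())
    return {chrom: m / baseline for chrom, m in chrom_medians.items()}


def flag_candidates(ratios, tolerance=0.25):
    """Chromosomes whose depth ratio deviates from 1 by more than a tolerance.

    The tolerance is an arbitrary illustrative value; if chance variation in
    coverage is comparable to it, true copy-number changes cannot be
    separated from noise.
    """
    return sorted(c for c, r in ratios.items() if abs(r - 1.0) > tolerance)


# Hypothetical windowed depths for three chromosomes.
toy_depths = {
    "chrI": [100, 102, 98],
    "chrII": [101, 99, 100],
    "chrIII": [200, 198, 205],
}
ratios = chromosome_depth_ratios(toy_depths)
candidates = flag_candidates(ratios)
```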

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

      Perhaps because our background lies in general study of the genotype-phenotype map, we did not want to make bold assertions about how our work might apply to pathogenic yeasts. But we see how this could be helpful and will add some discussion points about this. Specifically, we will discuss which of the genes and mutants we observe are also found in Candida. We will also investigate whether our observation that low fluconazole represents a seemingly unique challenge, not just a milder version of high fluconazole, has any corollary in the Candida literature.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      • Despite the link between CSA and NE integrity, the work is in my eyes too preliminary. Although the data presented are well done and carefully evaluated, they mostly (except Fig 1A) rely on direct comparisons of one patient cell line (CS-A or CS-B) to the same cells expressing the wildtype protein. It thus remains open whether the effects seen on LEM2 expression, LEM2-LaminA/C interaction, stress fibre formation, and cGAS/STING signaling pathway activation in the CS-A cells are representative of a number of different CS patient-derived cells. This is especially important given the small changes observed. Please note that there are also clear differences between the CSA-wt and CSB-wt cells. Would the HAP1 CSA KO cells show, in addition to NE irregularities (not even quantified), the same phenotypes, and can they be reverted by re-expression of the wildtype CSA protein?

      We would like to thank the reviewer for this comment. Indeed we have previously observed nuclear circularity defects in CSA KO HAP1 cells but we haven’t investigated the other phenotypes in this cell line. One of the main reasons behind this is the technical difficulties associated with performing immunofluorescence with HAP1 cells, which are very small and tend to grow in aggregates.

      Proposed experimental plan: To address this reviewer’s comment, we will attempt again to use HAP1 cells (WT and CSA KO) and look at nuclear circularity (with quantification), stress fiber formation and cGAS foci. If we don’t succeed, we will use an alternative isogenic cell model consisting of fibroblasts in which we will knock out CSA using CRISPR/Cas9. We will then repeat the same experiments as proposed above for the HAP1 cells.

      • The link between CSA and SUN1 is not well worked out. What is the effect of SUN2 downregulation and that of nesprins? It remains unclear whether the observed effects are indeed LINC mediated.

      Proposed experimental plan: To address this point, we will downregulate SUN2 and nesprins using siRNAs in two different cell models (as described above) and assess nuclear shape as well as cGAS/STING pathway activation.

      Minor comments:

      Fig 1B: Why is HA-tagged CSA not shown on the CSA western? This would be helpful to compare to the endogenous levels at least in CSB cells. A western showing a housekeeping marker would allow better comparison. Judging from the protein markers, HA-tagged CSA seems much larger than endogenous CSA (first versus second row). Again, less cropped western blots would help.

      We are sorry for the confusion and we have realised that the molecular weights on the western blots were incorrectly labeled on this figure. This will be modified, and the full, uncropped WB will be provided as a supplementary file.

      Fig 3A: Is CSA FLAG or HA-tagged? Or both? If both are expressed, the question arises of why the CSA-LEM2 interaction is only seen in an overexpression situation.

      CSA is HA tagged. To address this reviewer’s comment we will try performing immunoprecipitation on endogenous proteins.

      Fig 5: Inconsistency between figure and figure legend: 20 vs 25 nM Jasplakinolide. I assume Latrunculin A should read Cytochalasin D?

      Thank you for pointing this out, and yes, this is indeed a mistake in our labeling. We will rectify this in the figure legend.

      Fig 5B: Not clear why in "CSA-WT cells" Cytochalasin D and Jasplakinolide have the same effect on nuclear envelope shape yet only Jasplakinolide increases the number of blebs.

      Cytochalasin D inhibits actin polymerization while jasplakinolide increases polymerization. Actin polymerization likely increases blebs through extra force being put on the nucleus through actin cables and actin-based motility. Both drugs decrease nuclear roundness because they disrupt the normal actin network, leading to worsening of nuclear shape through different mechanisms.

      Page 10: Method for IF: 4% (v/v) paraformaldehyde and 2% (v/v) should likely read (w/v). Page 19: replace "withl" by "with".

      We will rectify both these points.

      Reviewer #2

      The paper is well-written and for the most part, the data support the conclusions of the authors. Some minor caveats could be addressed to improve the quality of the manuscript.

      We would like to thank the reviewer for their positive feedback on our manuscript.

      • The phenotype of decreased LEMD2 incorporation into the NE in CS-A cells is minor. Only ~20% and thus, it is not clear whether this is causal of any of the NE abnormalities. It should be better explained how these data add to the story.

      To address this point, we will overexpress LEMD2 in CS-A cells and assess whether the NE phenotype can be significantly rescued. This will add value to this part of the story.

      • Inducing actin polymerization and depolymerization impacts nuclear morphological abnormalities and nuclear blebbing. Do these treatments impact nuclear fragility and cGAS accumulation at NE break sites?

      This is a good point indeed. To address this question, we will include cGAS foci staining and quantification upon treatment with these chemicals.

      • Depletion of SUN1 in CS-A cells increased nuclear circularity, decreased blebbing, and phosphorylation of TBK1. The impact of SUN1 depletion in cGAS foci formation at NE break sites and phosphorylation of STING is not shown. Such experiments will provide stronger evidence that CS-A activates the cGAS-STING pathway in a SUN1 (mechanical stress)-dependent manner.

      We will address this question by analysing cGAS foci and cGAS-STING pathway activation upon SUN1 depletion by siRNA.

      Reviewer #3

      The data are generally clear, well performed and well interpreted with some exceptions:

      1) I appreciate the use of isogenic cell lines (a big plus when dealing with patient-derived cell lines). However, these lines were established 30 years ago and the reported phenotypes might be due to genetic drift. To exclude this, I suggest complementing the HAP-1 ERCC8 KO cell line with exogenously expressed CSA and assessing whether this rescues the phenotypes reported. Validation of the KO in these lines, either by western blotting or sequencing, is needed.

      This point has also been raised by the first reviewer, and will be addressed as described above (and pasted below):

      We would like to thank the reviewer for this comment. Indeed, we have previously observed nuclear circularity defects in CSA KO HAP1 cells, but we haven’t investigated the other phenotypes in this cell line. One of the main reasons is the technical difficulty of performing immunofluorescence with HAP1 cells, which are very small and tend to grow in aggregates.

      Proposed experimental plan: To address this reviewer’s comment, we will attempt again to use HAP1 cells (WT and CSA KO) and look at nuclear circularity (with quantification), stress fiber formation and cGAS foci. If we don’t succeed, we will use an alternative isogenic cell model consisting of fibroblasts in which we will knock out CSA using CRISPR/Cas9. We will then repeat the same experiments as proposed above for the HAP1 cells.

      2) Related to the complementation of patient cell lines, the exogenous HA-CSA is not recognised by the anti-CSA in the CSA-null patient cell lines (Fig 1B, second blot). Shouldn't you be able to see this exogenous protein? HA-GFP-CSB in the complemented CSB-null patient cell line runs at the same weight as endogenous CSB (Fig 1B, fourth blot). This is also unexpected. I think you need better characterisation of your cell lines and need to demonstrate the level of exogenous transgenes that have been used to complement the cells and that they localise appropriately, presumably to the nucleus. You should also make sure to cite the paper where they were isolated and describe that they were immortalised (Troelstra et al., 1992) and the paper in which transgenes were stably overexpressed (Qiang et al., 2021).

      As mentioned above, we have realised that we made some mistakes with the labeling of the molecular weight on this western blot. This will be corrected and the full uncropped western blot will be provided as a supplementary figure.

      We will also cite the suggested papers accordingly.

      3) Immunolocalisation of INM proteins is notoriously tricky and the permeabilisation steps include only 0.2% Tx100, which can be insufficient to permeabilise the INM. I appreciate the Emerin and Lamin immunostaining seems to have worked, but in many cases successful immunostaining can be antibody-specific. Can you try harsher permeabilisation to expose LEM2 epitopes? I'm somewhat uncomfortable with the suggestion that there is a cytosolic (ER?) pool of endogenous LEM2 as this runs counter to the literature and feel that your antibody or fixation conditions are illuminating a non-specific protein. The WB in Fig 2E shows that there is virtually no LEM2 in the "soluble" fraction. I would be more cautious on this cytoplasmic/nuclear pool interpretation. Biochemical nuclear and cytoplasmic fractionation would help clarify the signal in a NE vs a non-NE pool.

      As suggested by the reviewer, we will try harsher permeabilisation conditions to test the LEMD2 antibody. As we suggest in the manuscript, however, we think that the “cytoplasmic” LEMD2 pool we observed by IF in the absence of pre-extraction is indeed non-specific. This is why we performed the rest of the experiments with a pre-extraction step, which we have shown to give a specific LEMD2 signal that disappears upon depleting LEMD2 by siRNA.

      4) Page 15: "Using a Proximity Ligation Assay (PLA), we showed a significant reduction in the number of PLA foci in CS-A cells compared to the WT(HA-CSA) cells, reflecting a reduced number of LEMD2-lamin A/C complexes (Figure 2G, 2H). This data suggests defects in the incorporation of LEMD2 into the NE and lamin protein complexes in CS-A cells". If you have less LEM2 in the NE, it is quite expected that you will have less "LEM2-laminA/C" complexes. To me the logic doesn't hold and this data does not suggest that there is an underlying defect in LEM2-lamin interaction. To ascertain whether there is such a defect one could perform an IP against LEM2 and quantify laminA/C, normalizing by the amount of LEM2 in the input.

      We feel we may not have been clear in how we interpreted this data. What we mean is that in each individual cell, the number of Lamin-LEMD2 complexes is decreased, probably indeed due to the fact that there is less LEMD2 altogether within the nucleus in the absence of CSA. We will clarify this in the text.

      5) "We overexpressed LEMD2-GFP and Flag-CSA constructs, followed by GFP pulldown in WT(HA-CSA) cells". Since the co-IP data are obtained in overexpression conditions (of both HA-CSA and Flag-CSA?), the authors should validate the interaction between LEM2 and CSA using an orthogonal approach. Perhaps anti-HA capture of the WT(HA-CSA) cells would allow you to immunoblot for endogenous LEM2?

      To address this point, we will try to immunoprecipitate HA-CSA and look at endogenous LEMD2.

      6) Related to the CSA-LEM2 binding in the above experiment, the procedure involves combining a native detergent-extracted cytoplasmic pool with a denatured (RIPA-extracted) nuclear pool for performing the GFP-trap. In which pool was the tagged CSA bound to LEM2?

      We are sorry about the confusion. We didn’t try to run the IP from the different pools but instead from the combined pools, to ensure we were looking in the whole cell extract. We would expect however that the interaction occurs in the nuclear pool as both CSA and LEMD2 are nuclear proteins.

      7) "The absence of CSA in CS-A patient cells does not affect the mobility of LEMD2 at the NE but instead decreases its interaction with A-type lamins". To me the fact that loss of CSA decreases LEM2-lamin interaction is not well supported (see point 3).

      See our response to point 4.

      8) "Here, we showed by immunoprecipitation that LEMD2 also interacts with CSA. This suggests that the recruitment and stabilization of LEMD2 to the NE is mediated by an interaction with CSA, although the mechanism remains unclear". I think this is an overstatement: there are no data suggesting that CSA recruits or stabilises LEM2 at the NE.

      We will tone down this statement in the text.

      9) As the authors suggest in the discussion, it would be worth checking whether LEM2 overexpression is able to rescue some of the NE defects reported, strengthening the hypothesis that LEM2 levels are at least in part responsible for the phenotypes reported.

      To address this point, we will overexpress LEMD2 in the CS-A cells and analyse the nuclear envelope defects and ruptures (shape and cGAS foci quantification).

      10) To me it is not clear how the reported phenotypes are interrelated. The first part of the manuscript shows that CSA interacts with LEM2, and that loss-of-function CSA impacts on LEM2 levels and LEM2-lamin interaction, suggesting a direct role for CSA at the nuclear envelope. The second part of the manuscript shows that cells with defective CSA have more actin stress fibres and releasing the cytoskeleton-nuclear tethering is able per se to rescue the nuclear membrane and cGAS phenotypes. How do the authors reconcile these two parts? Is CSA directly involved in both inner nuclear membrane homeostasis and actin cytoskeleton modulation or is this latter role upstream and the NE defects a mere consequence of increased cytoskeleton rigidity?

      At this point, we indeed cannot draw definitive conclusions as to whether the two described phenotypes are interrelated. However, we hope that addressing the other points raised by the reviewers will help clarify the mechanism.

      11) It is not clear how or why actin stress fibres are elevated in the CS-A cells. Can the authors provide any insight based on their RNAseq analysis? Demonstrating a link to ROCK, LIMK or Rho signalling would be interesting and verifying ppMLC2 levels would help explain why contractility is enhanced. Additionally, is the increase in contractility dependent upon any of the genes identified as up- or downregulated in RNAseq? Presently, the manuscript is missing a link between its two halves.

      We would like to reiterate that the RNAseq analysis we performed was done on previously published data from another group (as described in the text). To address the point raised by the reviewer, we will examine our analysis more specifically for ROCK, LIMK or Rho signalling to see whether any of these pathways appear to be modulated by the absence of CSA.

      12) Related to point 1, the RNAseq comparison was performed on patient cells lacking CS-A and patient cells lacking CS-A and later over-expressing HA-CSA, and this comparison is used extensively for phenotype description in the manuscript. It isn't clear to me that this is the most insightful comparison to make; the rescue by overexpression is not as elegant as CRISPR reversion and the ko fibroblasts have presumably been surviving well in culture without CS-A before this protein was overexpressed. Can you validate the differential expression of any identified proteins in the acute HAP1 ko? Can you validate any of the differentially expressed proteins in comparison to normal fibroblasts (e.g., 13O6, as per Qiang et al., 2021)?

      As we will validate our experiments in an additional cell model (as described above), we will also indeed validate the level of expression of cytoskeletal proteins upon CSA KO/rescue.

      Minor comments

      - Page 14: "To characterize the NE phenotypes further, we obtained CS patient-derived cell lines carrying loss-of-function mutations in CSA (CS-A cells) or CSB (CS-B cells), and their respective isogenic control cell lines (WT(HA-CSA) and WT(HA-GFP-CSB))." What type of loss-of-function? Is the mutant protein still produced? In Fig 6A there seems to be a band in the CS-A blot (second lane), but in Fig 1B, there isn't. I think this is important to know to interpret the phenotype related to LEM2 interaction.

      We will clarify this in the text. Indeed, the loss-of-function mutation leads to the absence of CSA protein.

      - Figure 1B is poorly annotated. What do - and + stand for? In general, I find a bit confusing how the WB are presented throughout the manuscript, specifically how the antibodies are reported (e.g., HA-CSA instead of HA). Please mark up all western blots with antisera used. Please make sure all expected bands are within the crops - e.g., Fig 3B, the anti-LEM2 blot should be expanded vertically to show the LEM2-GFP relative to endogenous LEM2.

      We will correct these points in the figures.

      - From the methods, it appears that you obtained a Please provide clarity on which construct was used in which figure, and verify that an N-terminally tagged LEM2 still localises to the NE.

      We actually cloned LEMD2 into an empty pEGFP vector but still maintained LEM2-GFP. To avoid confusion, we will remove the C1 plasmid from the methods: we removed the MCS and GFP, used just the blank vector, and inserted LEM2-GFP as we obtained it.

      - Fig 1I: there is some text on top of the upper panels (DAPI, cGAS, Merge).

      - "Through gene ontology analysis, we found that genes involved in endoplasmic reticulum (ER) stress were differentially expressed (Figure 4B)". I don't think that the way data are shown in Fig 4B is effective. Since GO has been performed, I would replace the table with a GO enrichment analysis graph. Ensure to report all the data in a supplementary .xls so that others can see and reuse it. Is there a mandated repository that accepts RNAseq data?

      The RNAseq experiment was performed by another group and the data were reported in a previous study, as referenced in the main text of the manuscript (Epanchintsev A, Costanzo F, Rauschendorf MA, Caputo M, Ye T, Donnio LM, et al. Cockayne’s Syndrome A and B Proteins Regulate Transcription Arrest after Genotoxic Stress by Promoting ATF3 Degradation. Mol Cell. 2017 Dec;68(6):1054-1066.e6.). Here, we only re-analysed their data using STRING pathway analysis, as detailed in the Material and Methods. However, as suggested by the reviewer, we will replace the table with a GO enrichment graph.

      - The volcano plot looks weird with many values at the maximum log10 (P-value) - is the data processed appropriately?

      As mentioned above, the RNAseq analysis was performed and published in a different study. We think the plateau at the maximum value arises because the Y axis shows adjusted P values.
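
      The effect described above can be illustrated with a minimal sketch, assuming a Benjamini-Hochberg adjustment (the adjustment method used in the original study is not stated here, so this is an assumption). The function name and the P values are hypothetical. Because the adjustment enforces monotonicity, several of the most significant genes can end up sharing one adjusted value, which would appear as a flat line at the top of a volcano plot.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that each
    # adjusted value never exceeds the one of the next-larger p-value.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values: three very small ones and two larger ones.
raw = [1e-10, 1.5e-10, 2e-10, 0.03, 0.5]
adj = benjamini_hochberg(raw)

# The three smallest raw p-values collapse onto one adjusted value,
# so they share the same -log10 height on the Y axis.
heights = [-math.log10(p) for p in adj]
print(adj)
print(heights)
```

      In this toy input the three smallest raw values are distinct but receive the same adjusted value, which is one way a plateau can arise without any error in processing. P values that underflow to zero and are clipped would give a similar picture, so the sketch does not distinguish between the two.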

      - Figure 5B: the legend says "Latrunculin A". Please correct.

      We will correct this.

      - For a Wellcome funded researcher, I'm surprised that the mandated OA statement and RRS is absent from the acknowledgements.

      We will of course comply with the open access policy of the Wellcome Trust. However, and based on the WT requirements detailed on their website, we believe the acknowledgement section complies with the funder’s policy: “All research publications must acknowledge Wellcome's support and list the grant reference number which funded the research reported.”

      Maybe mention that the changes in nuclear shape are not causative of nuclear blebbing, but do not say that the two phenotypes are completely mutually exclusive.

      Maybe say that we will overexpress LEMD2 in CS-A cells and show that the NE phenotype can be significantly rescued. This will add value to this part of the story. I remember that when I did the FRAP experiment, CS-A cells expressing LEMD2-GFP (that doesn’t form aggregates) looked better in terms of shape.

      Anne, please check the plasmid map. According to the lab inventory (Plasmids Anne), it is LEMD2-GFP, so GFP is probably at the C-terminus.

      I think there was a part in the discussion where LEMD2-GFP was mistakenly written as GFP-LEMD… But I am sure I used LEMD2-GFP throughout the work.

      We cloned it into an empty pEGFP vector but still maintained LEM2-GFP. Maybe remove the C1 in the methods to avoid confusion, as we removed the MCS and GFP, used just the blank vector, and inserted LEM2-GFP as we obtained it.

      The same construct was used for the GFP pulldown and for FRAP, and we can see in FRAP that it localises to the NE, so it should localise to the NE. Maybe mention that we will do a LEMD2-GFP overexpression experiment in CS-A cells and show that it does localise to the NE.

      I don’t fully remember whether Denny did this and what came out. I thought he did, and that ER stress and cytoskeleton regulation came out as enriched terms?

      Denny will have to check this, but I think this is because the Y axis shows adjusted P values? I have the same in my data, and Jack told me this is an artefact of the analysis if you adjust for multiple comparisons and is something more often seen in mass spec data.